NGS scrap: 2015

Nov 9, 2015

How To Download All SRA Samples At Once

Using SQLite3
https://edwards.sdsu.edu/research/getting-data-from-the-sra/

Using Linux wget
Details: http://seqanswers.com/forums/archive/index.php/t-30625.html

wget -r -nd -nH ftp://file_address

For example:
wget -r -nd -nH ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP103/SRP103124

Edit (2021-07-20): the address has changed. Runs are now served from URLs like:
https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR11780909/SRR11780909

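To download specific SRA files from a list

Store the run accessions you want to download (e.g. from the range SRR096001-SRR096999) in a file, one per line.
For example, run the following, type the accessions, and press Ctrl-D to finish:
cat > SRR_2_download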
SRR096023
SRR096072
SRR096074

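# old FTP path (before the 2021 address change); the URL is quoted so wget, not the shell, expands the *
for i in $(cat SRR_2_download); do wget -r -nd -nH "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/litesra/SRR/SRR096/$i/*"; done

# current S3 path
for i in $(cat SRR_2_download); do wget "https://sra-pub-run-odp.s3.amazonaws.com/sra/$i/$i"; done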

This will download the specific SRA files that are listed in the file SRR_2_download
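The one-liners above can be turned into a slightly sturdier script. This is a minimal sketch, not a tested pipeline: the function names `sra_url` and `download_list`, the `DRY_RUN` switch, and saving each run as `<ACC>.sra` are my own choices; only the S3 URL pattern comes from the 2021 edit.

```shell
#!/usr/bin/env bash
# Sketch: fetch every run listed in SRR_2_download from the S3 address in the
# 2021 edit. Assumptions (not from the original note): one accession per line,
# blank lines and lines starting with '#' are ignored, files are saved as
# <ACC>.sra, and DRY_RUN=1 (a made-up switch) only prints the URLs.

sra_url() {
  # URL pattern from the edit above: .../sra/<ACC>/<ACC>
  printf 'https://sra-pub-run-odp.s3.amazonaws.com/sra/%s/%s\n' "$1" "$1"
}

download_list() {
  local list_file=$1 acc cr
  cr=$(printf '\r')
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while read -r acc || [ -n "$acc" ]; do
    acc=${acc%"$cr"}            # drop a Windows-style carriage return
    case $acc in ''|'#'*) continue ;; esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
      sra_url "$acc"
    else
      wget -O "$acc.sra" "$(sra_url "$acc")"
    fi
  done < "$list_file"
}

# usage: DRY_RUN=1 download_list SRR_2_download   (check the URLs first)
#        download_list SRR_2_download             (then download)
```

Reading the file with `while read` instead of `$(cat ...)` keeps stray blank lines or Windows line endings from turning into broken URLs.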

Using R
Details: https://www.biostars.org/p/93494/

In R:
# biocLite() is no longer available in current Bioconductor; BiocManager replaces it
if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')
BiocManager::install(c('SRAdb', 'DBI'))
library(SRAdb)
library(DBI)

srafile <- getSRAdbFile() # downloads the latest SRAmetadb.sqlite.gz to the working directory and unzips it

# Once SRAmetadb.sqlite has been downloaded, just point srafile at the existing file.
# The file is large, so re-download it only when you need an updated version.

srafile <- 'SRAmetadb.sqlite'

con <- dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197', con)                   # list the .sra files for the runs in study SRP026197
getSRAfile('SRP026197', con, fileType = 'sra')  # download them to the working directory
## Convert an SRA file to fastq.gz (requires the SRA Toolkit)
Run in a Linux shell:
fastq-dump -O /output/dir --gzip ./SRR2047462.sra # writes /output/dir/SRR2047462.fastq.gz
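To convert a whole directory of downloaded runs, the same fastq-dump call can go in a loop. A rough sketch: `dump_all` and the `DRY_RUN` switch are hypothetical names, and only the `-O` and `--gzip` options shown above are used.

```shell
#!/usr/bin/env bash
# Sketch: run fastq-dump on every .sra file in a directory.
# Only the -O and --gzip options from the example above are used.
# DRY_RUN=1 (a made-up switch) prints the commands instead of running them.

dump_all() {
  local in_dir=$1 out_dir=$2 f
  [ "${DRY_RUN:-0}" = 1 ] || mkdir -p "$out_dir"
  for f in "$in_dir"/*.sra; do
    [ -e "$f" ] || continue     # no match: the pattern itself is left in $f
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf 'fastq-dump -O %s --gzip %s\n' "$out_dir" "$f"
    else
      fastq-dump -O "$out_dir" --gzip "$f"
    fi
  done
}

# usage: dump_all ./sra_files /output/dir
```

For paired-end runs, fastq-dump's `--split-files` option writes the two mates to separate `_1` and `_2` files; add it to the command if you need that.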

## Get sample information for a GEO series with GEOquery (run in R)
library(GEOquery)
gse <- getGEO('GSE48138') # downloads the GEO series; returns a list of ExpressionSet objects
## see what is in there:
show(gse)
# There are 2 sets of samples for this series
## what you want is a table with the SRR accessions to download and some sample information.
## Let's see what the first set contains:
df <- as.data.frame(gse[[1]])
head(df)

Sep 2, 2015

R: "non-zero exit status" error when installing 'XML' and 'RCurl'

See details:
http://stackoverflow.com/questions/20671814/non-zero-exit-status-r-3-0-1-xml-and-rcurl

In short, you need to install the curl and libxml2 development libraries on your OS.

On Ubuntu, run:

sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev

On CentOS 6+, run:

sudo yum -y install curl
sudo yum -y install libcurl libcurl-devel
sudo yum -y install libxml2 libxml2-devel
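If you set up both Ubuntu and CentOS machines, the two sets of commands above can be wrapped in one script that picks whichever package manager it finds. A sketch under these assumptions: finding `apt-get` means the Ubuntu package names apply and finding `yum` means the CentOS names apply; `dev_packages`, `install_r_deps` and the `DRY_RUN` switch are made up for this note.

```shell
#!/usr/bin/env bash
# Sketch: install the curl/libxml2 packages needed by the R packages RCurl and XML.
# Package names are the ones listed above for Ubuntu (apt-get) and CentOS (yum).
# DRY_RUN=1 (a made-up switch) prints the command instead of running it.

dev_packages() {
  case $1 in
    apt-get) echo "libcurl4-openssl-dev libxml2-dev" ;;
    yum)     echo "curl libcurl libcurl-devel libxml2 libxml2-devel" ;;
    *)       return 1 ;;
  esac
}

install_r_deps() {
  local pm
  for pm in apt-get yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sudo $pm install -y $(dev_packages "$pm")"
      else
        # the package list is left unquoted on purpose so it splits into words
        sudo "$pm" install -y $(dev_packages "$pm")
      fi
      return 0
    fi
  done
  echo "neither apt-get nor yum found" >&2
  return 1
}

# usage: DRY_RUN=1 install_r_deps   (check), then: install_r_deps
```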

Aug 5, 2015

add swap space

If you run software that needs a large amount of memory, you may need to add more swap space.

You have three options: create a new swap partition, create a new swap file, or extend swap on an existing LVM2 logical volume. The guide below recommends extending an existing logical volume.

https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-swap-adding.html
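For the "new swap file" option, the usual steps are: create the file, restrict its permissions, format it with mkswap, enable it with swapon, and add it to /etc/fstab so it survives a reboot. The sketch below only prints those commands so you can review them before running them as root; the `/swapfile` path, the 1 MiB block size, and the function name `swapfile_cmds` are illustrative assumptions, not values from the guide.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) the commands for adding a swap file.
# Assumptions: size is given in whole GiB, the file path defaults to /swapfile,
# and dd writes it in 1 MiB blocks. Review the output, then run it as root.

swapfile_cmds() {
  local size_gb=$1 path=${2:-/swapfile}
  case $size_gb in
    ''|*[!0-9]*|0*)
      echo "size must be a positive whole number of GB (no leading zeros)" >&2
      return 1 ;;
  esac
  cat <<EOF
dd if=/dev/zero of=$path bs=1M count=$((size_gb * 1024))
chmod 600 $path
mkswap $path
swapon $path
echo '$path swap swap defaults 0 0' >> /etc/fstab
EOF
}

# usage: swapfile_cmds 8
```

`swapfile_cmds 8` prints the commands for an 8 GB swap file; afterwards, `swapon -s` or `free -h` shows whether the new swap is active.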

Aug 4, 2015

tar extract file

http://www.cyberciti.biz/faq/tar-extract-linux/

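tar -xvf file.tar         # x = extract, v = verbose (list files), f = archive file name

tar -xzvf file.tar.gz     # z = decompress with gzip

tar -xjvf file.tar.bz2    # j = decompress with bzip2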
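If you don't want to remember which flag goes with which extension, a small helper can pick it for you. A sketch that covers the three cases above plus the `.tgz`/`.tbz2` short forms (those two are my addition); `tar_flags` and `extract` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: choose tar flags from the file extension, following the cases above.
# The .tgz/.tbz2 aliases are an addition; anything else is rejected.

tar_flags() {
  case $1 in
    *.tar.gz|*.tgz)   echo "-xzvf" ;;
    *.tar.bz2|*.tbz2) echo "-xjvf" ;;
    *.tar)            echo "-xvf" ;;
    *)                return 1 ;;
  esac
}

extract() {
  local archive=$1 dest=${2:-.} flags
  flags=$(tar_flags "$archive") || { echo "unsupported archive: $archive" >&2; return 1; }
  mkdir -p "$dest"
  tar -C "$dest" "$flags" "$archive"   # -C: extract into $dest
}

# usage: extract file.tar.gz ./target_dir
```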

things to install for NGS analysis

ncurses - programming library for text-based user interfaces
http://codybonney.com/installing-the-ncurses-library-in-centos-6-5/

sudo yum install ncurses-devel

Cutadapt - adapter sequence trimming
http://cutadapt.readthedocs.org/en/latest/index.html

Trimmomatic: A flexible read trimming tool for Illumina NGS data
http://www.usadellab.org/cms/?page=trimmomatic

FastQC - quality checks for FASTQ files
http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Bowtie - short read aligner
http://bowtie-bio.sourceforge.net/index.shtml
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

TopHat - splice junction mapper for RNA-Seq reads (requires Bowtie)
https://ccb.jhu.edu/software/tophat/index.shtml

samtools - tools for manipulating alignments in the SAM (Sequence Alignment/Map) format
http://samtools.sourceforge.net/

picard - tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF
http://broadinstitute.github.io/picard/

HTSeq - Python package that provides infrastructure to process data from high-throughput sequencing assays
http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html

MACS - Model-based Analysis for ChIP-Seq
http://liulab.dfci.harvard.edu/MACS/

PeakSplitter - Subdivision of ChIP-seq/ChIP-chip regions into discrete signal peaks
http://www.ebi.ac.uk/research/bertone/software

R - free software environment for statistical computing and graphics
https://www.r-project.org/

IGV - Integrative Genomics Viewer
https://www.broadinstitute.org/igv/
(log in required)
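After installing, a quick way to see which tools are actually reachable is to check each command name on `PATH`. A sketch; the default command names are assumptions (names differ between versions, e.g. MACS2 installs as `macs2`, and Trimmomatic, Picard and IGV are usually started through Java or a launcher script), so pass your own list if needed.

```shell
#!/usr/bin/env bash
# Sketch: report tools that are not on PATH. The default names below are
# assumptions; call missing_tools with your own list to override them.
# Returns 0 only if every tool was found.

missing_tools() {
  local tool missing=0
  [ "$#" -gt 0 ] || set -- fastqc cutadapt bowtie bowtie2 tophat samtools R
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# usage: missing_tools                  (default list)
#        missing_tools samtools macs2   (your own list)
```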

CentOS 7 installation

See more details in:
http://linoxide.com/how-tos/centos-7-step-by-step-screenshots/

1. Download the CentOS 7 ISO file
http://www.centos.org/download/
(download the DVD ISO)

* check the MD5 checksum of the downloaded ISO to verify that the download is complete

2. Download Win32 Disk Imager (to make a bootable USB drive)
http://sourceforge.net/projects/win32diskimager/

3. Write the ISO image to a USB drive with Win32 Disk Imager (use an 8 GB drive)

4. Boot from the USB drive (enter the BIOS and select USB as the boot device)

5. In "Software Selection",
select "GNOME Desktop", and "Legacy X Window System" under "Add-Ons for Selected Environment"

6. In "Installation Destination",
select "I will configure partitioning" and set up the partitions

7. Set the root password and create a user account
* remember the root password!

8. Complete the installation and reboot (remove the USB drive before rebooting)
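The checksum check in step 1 can be scripted on a Linux machine. A sketch: `verify_iso` is a hypothetical helper, and the expected value is whatever the mirror publishes for your ISO; it defaults to `md5sum` as in step 1, and you can pass `sha256sum` if the mirror lists SHA-256 sums instead.

```shell
#!/usr/bin/env bash
# Sketch: compare a downloaded ISO with the checksum published on the mirror.
# verify_iso is a hypothetical helper; the hash tool defaults to md5sum.
# Returns 0 on match, 1 on mismatch, 2 if the file cannot be read.

verify_iso() {
  local iso=$1 expected=$2 tool=${3:-md5sum} actual
  [ -r "$iso" ] || { echo "cannot read $iso" >&2; return 2; }
  actual=$("$tool" "$iso" | awk '{print $1}')
  # lower-case both sides, since published sums are sometimes upper-case
  actual=$(printf '%s' "$actual" | tr 'A-F' 'a-f')
  expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $iso"
  else
    echo "MISMATCH: $iso (expected $expected, got $actual)" >&2
    return 1
  fi
}

# usage: verify_iso your.iso <published-md5>
#        verify_iso your.iso <published-sha256> sha256sum
```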