
Nov 9, 2015

How To Download All SRA Samples At Once

using SQLite3
https://edwards.sdsu.edu/research/getting-data-from-the-sra/



using Linux wget
details in http://seqanswers.com/forums/archive/index.php/t-30625.html

wget -r -nd -nH ftp://file_address

for example,
wget -r -nd -nH ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP103/SRP103124

Edit 2021-07-20: the address has changed. Run files are now served from URLs of this form:
https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR11780909/SRR11780909


To download specific SRA files in the list

Store the SRR accession numbers (here from the range SRR096001-SRR096999) that you want to download in a file, one per line.
For example (finish the input with Ctrl-D):
cat > SRR_2_download
SRR096023
SRR096072
SRR096074

# old FTP address (before the 2021 change noted above)
for i in $(cat SRR_2_download); do wget -r -nd -nH "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/litesra/SRR/SRR096/$i/*"; done

# address after the 2021 change
for i in $(cat SRR_2_download); do wget "https://sra-pub-run-odp.s3.amazonaws.com/sra/$i/$i"; done

 
This downloads only the SRA files that are listed in the file SRR_2_download.
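
The same idea can be written so that blank lines and stray whitespace in the list do no harm. This is only a minimal sketch: it assumes the amazonaws URL layout from the 2021 edit above, the function names sra_url and download_sra_list are made up for this note, and saving each run as <accession>.sra is my own naming choice.

```shell
# Build the download URL for one run accession (layout from the 2021 edit).
sra_url() {
    printf 'https://sra-pub-run-odp.s3.amazonaws.com/sra/%s/%s\n' "$1" "$1"
}

# Download every accession listed in a file, one accession per line.
download_sra_list() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')   # strip spaces and CR
        [ -z "$acc" ] && continue                        # skip blank lines
        # -c resumes a partial download, -O names the output file
        wget -c -O "$acc.sra" "$(sra_url "$acc")"
    done < "$1"
}

# Usage: download_sra_list SRR_2_download
```

Nothing is downloaded until download_sra_list is called, so the URLs can be checked first with sra_url.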

using R
details in www.biostars.org/p/93494/

Run R
# biocLite() has been replaced by BiocManager (Bioconductor 3.8 and later)
install.packages('BiocManager')
BiocManager::install(c('SRAdb', 'DBI'))
library(SRAdb)
library(DBI)

srafile = getSRAdbFile() # download and unzip the latest SRAmetadb.sqlite.gz from the server into the working directory

# Once SRAmetadb.sqlite has been downloaded, you can skip the step above and just point srafile at the existing file
# SRAmetadb.sqlite.gz is big; re-download it only when you need an updated version

srafile <- 'SRAmetadb.sqlite'

con = dbConnect(RSQLite::SQLite(), srafile)
listSRAfile('SRP026197',con)
getSRAfile('SRP026197',con,fileType='sra')




## dump SRA file to fastq.gz (requires the SRA Toolkit)
Run in Linux
fastq-dump -O /output/dir --gzip ./SRR2047462.sra # dumps the sra file to SRR2047462.fastq.gz
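
When there are many .sra files, the same command can be run in a loop. A small sketch, using only the -O and --gzip options shown above; the function name dump_all_sra and the directory arguments are made up for illustration.

```shell
# Convert every .sra file in a directory to gzipped FASTQ.
dump_all_sra() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.sra; do
        [ -e "$f" ] || continue     # no .sra files: the pattern stays unexpanded
        fastq-dump -O "$out_dir" --gzip "$f"
    done
}

# Usage: dump_all_sra . /output/dir
```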


library(GEOquery)
gse <- getGEO('GSE48138') # retrieves a list of data sets for the GEO series ID
## see what is in there:
show(gse)
# There are 2 sets of samples for that ID
## what you want is a table with the SRR accessions to download and some sample information
## let's see what the first set contains:
df <- as.data.frame(gse[[1]])
head(df)

Sep 2, 2015

R : “Non Zero Exit Status” error in installation of 'XML' and 'RCurl'

see details
http://stackoverflow.com/questions/20671814/non-zero-exit-status-r-3-0-1-xml-and-rcurl

In short, you need to install the curl and libxml2 development libraries on your OS.


To install them on Ubuntu, run:
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev

On CentOS 6+, use:
sudo yum -y install curl
sudo yum -y install libcurl libcurl-devel
sudo yum -y install libxml2 libxml2-devel
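
After installing, a quick way to see whether the development files are visible is to look for the curl-config and xml2-config helper scripts, which the libcurl and libxml2 development packages normally provide and which the errors in the linked question complain about. A small sketch; the function name have_tool is made up here.

```shell
# Report whether a helper program is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

for tool in curl-config xml2-config; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing (install the matching -dev / -devel package)"
    fi
done
```

If both are reported as found, retry the installation of 'XML' and 'RCurl' in R.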

Aug 5, 2015

add swap space

If you run software that requires a large amount of memory, you may need to add more swap space.

You have three options: create a new swap partition, create a new swap file, or extend swap on an existing LVM2 logical volume. It is recommended that you extend an existing logical volume.

https://www.centos.org/docs/5/html/Deployment_Guide-en-US/s1-swap-adding.html
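
For the swap file option (not the recommended LVM route), the usual sequence is dd, mkswap and swapon. The sketch below only prints the commands instead of running them, because they need root; the function name swap_file_commands, the path /swapfile and the 2048 MiB size are just example values.

```shell
# Print (not run) the commands for creating a swap file of a given size in MiB.
swap_file_commands() {
    path=$1
    size_mib=$2
    blocks=$((size_mib * 1024))     # dd is called with 1024-byte blocks
    echo "dd if=/dev/zero of=$path bs=1024 count=$blocks"
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}

swap_file_commands /swapfile 2048
```

Run the printed commands as root after checking them. To keep the swap file after a reboot, an /etc/fstab entry is also needed; see the guide linked above.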

Aug 4, 2015

tar extract file

http://www.cyberciti.biz/faq/tar-extract-linux/

tar -xvf file.tar
tar -xzvf file.tar.gz 
tar -xjvf file.tar.bz2
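
Two related options are handy: -t lists the contents of an archive without extracting, and -C extracts into a chosen directory. The sketch below builds a tiny demo archive in a temporary directory first, so it can be run as-is; the names demo and a.txt are made up.

```shell
# Build a tiny demo archive so the commands below can be run as-is.
work=$(mktemp -d)
mkdir "$work/demo"
echo "hello" > "$work/demo/a.txt"
tar -czf "$work/demo.tar.gz" -C "$work" demo

# List the contents without extracting anything.
listing=$(tar -tzf "$work/demo.tar.gz")
echo "$listing"

# Extract into a chosen directory instead of the current one.
mkdir "$work/out"
tar -xzf "$work/demo.tar.gz" -C "$work/out"
cat "$work/out/demo/a.txt"
```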

things to install for NGS analysis

ncurses - programming library for text-based user interfaces
 http://codybonney.com/installing-the-ncurses-library-in-centos-6-5/

 sudo yum install ncurses-devel


Cutadapt - adapter sequence trimming
 http://cutadapt.readthedocs.org/en/latest/index.html

Trimmomatic: A flexible read trimming tool for Illumina NGS data
 http://www.usadellab.org/cms/?page=trimmomatic

FastQC - quality check of fastq file
 http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Bowtie - short read aligner
 http://bowtie-bio.sourceforge.net/index.shtml
 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Tophat - splice junction mapper for RNA-Seq reads (need Bowtie)
 https://ccb.jhu.edu/software/tophat/index.shtml

samtools - tools for manipulating alignments in the SAM (Sequence Alignment/Map) format
 http://samtools.sourceforge.net/

picard - tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF
 http://broadinstitute.github.io/picard/

HTSeq - Python package that provides infrastructure to process data from high-throughput sequencing assays
 http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html

MACS - Model-based Analysis for ChIP-Seq
 http://liulab.dfci.harvard.edu/MACS/

PeakSplitter - Subdivision of ChIP-seq/ChIP-chip regions into discrete signal peaks
 http://www.ebi.ac.uk/research/bertone/software

R - free software environment for statistical computing and graphics
 https://www.r-project.org/

IGV - Integrative Genomics Viewer
 https://www.broadinstitute.org/igv/
 (log in required)

CentOS 7 installation

see more details in
http://linoxide.com/how-tos/centos-7-step-by-step-screenshots/

1. download the CentOS 7 ISO file
http://www.centos.org/download/
download the DVD ISO

 * check the MD5 checksum of the downloaded ISO file to verify that the download is complete

2. download Win32 Disk Imager (to make a bootable USB disk)
http://sourceforge.net/projects/win32diskimager/

3. write the ISO image to the USB disk with Win32 Disk Imager (use an 8 GB USB disk)

4. boot from the USB disk (go into the BIOS and select the USB disk as the boot device)

5. in "software selection"
 select "GNOME Desktop" and "legacy X window system" in add-ons for selected environment

6. in "installation destination"
 select "I will configure partitioning" and set partitioning

7. set the root password & create a user account
 * remember the root password !!!

8. complete the installation and reboot (remove the USB boot disk before rebooting)
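
If the ISO was downloaded on a Linux machine, the checksum check from step 1 can be done with md5sum. A minimal sketch; the function name verify_md5 is made up, and the expected digest has to be copied from the download page (sha256sum can be used in the same way if only a SHA-256 digest is published).

```shell
# Compare a file's MD5 digest with the published value.
# Returns success (0) only when the two digests match.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage (file name and digest are placeholders):
#   verify_md5 centos7.iso <digest from the download page> && echo "download OK"
```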

Dec 4, 2014

FTP command

ascii (as) - set transfer mode to ASCII
binary (bi) - set transfer mode to binary
bye - exit ftp
chmod - change file permissions
close - close the connection
delete - delete a file
get - download a file
hash - display file transfer progress with '#'
help - show help
lcd - change the local working directory
ls - show the file list of the directory
mdelete - delete multiple files
mget - download multiple files
mput - upload multiple files
open - open a connection to a remote host
prompt - turn interactive prompting on/off
put - upload a file
pwd - show the current working directory
quit - exit ftp
rstatus - show the current status of the remote connection
rename - change a file name
rmdir - remove a directory
size - display file size in bytes
status - show the current connection status
trace - turn packet tracing on/off
type - set the file transfer type
verbose - turn detailed output on/off
? - show help
! - escape to a shell on the local computer without disconnecting from the remote computer; type 'exit' to return to the ftp session
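
These commands can also be fed to ftp from a script instead of being typed in. A rough sketch under some assumptions: the server allows anonymous login, ftp.example.org and the e-mail address are placeholders, and the function name ftp_fetch_script is made up. It also uses the user and cd commands, which are not in the list above, and the ftp options -n (no auto-login) and -i (no prompting during mget).

```shell
# Print an ftp command script that downloads all files matching a pattern.
ftp_fetch_script() {
    remote_dir=$1
    pattern=$2
    cat <<EOF
user anonymous anonymous@example.com
binary
cd $remote_dir
mget $pattern
bye
EOF
}

# Usage (placeholder host):
#   ftp_fetch_script /pub/data '*.gz' | ftp -n -i ftp.example.org
```

Printing the script first makes it easy to check the commands before they are sent to a server.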