
Jun 19, 2017

SRA Toolkit

Update (2022-09-16): update your SRA Toolkit to the latest release.

https://github.com/ncbi/sra-tools/wiki

 

 

https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc

Frequently Used Tools:

fastq-dump: Convert SRA data into fastq format
prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data
sam-dump: Convert SRA data to sam format
sra-pileup: Generate pileup statistics on aligned SRA data
vdb-config: Display and modify VDB configuration information
vdb-decrypt: Decrypt non-SRA dbGaP data ("phenotype data")

Additional Tools:

abi-dump: Convert SRA data into ABI format (csfasta / qual)
illumina-dump: Convert SRA data into Illumina native formats (qseq, etc.)
sff-dump: Convert SRA data to sff format
sra-stat: Generate statistics about SRA data (quality distribution, etc.)
vdb-dump: Output the native VDB format of SRA data.
vdb-encrypt: Encrypt non-SRA dbGaP data ("phenotype data")
vdb-validate: Validate the integrity of downloaded SRA data
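
Before relying on any of the tools listed above, it can help to check which ones are actually installed. A minimal sketch, assuming Python 3.3+ (for `shutil.which`) and that the toolkit's `bin` directory is on `PATH`; `find_sra_tools` is a hypothetical helper name, not part of the toolkit:

```python
import shutil

# Tool names copied from the lists above.
SRA_TOOLS = [
    "fastq-dump", "prefetch", "sam-dump", "sra-pileup",
    "vdb-config", "vdb-decrypt", "vdb-validate",
]

def find_sra_tools(tools=SRA_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}

if __name__ == "__main__":
    for name, path in find_sra_tools().items():
        print("%-14s %s" % (name, path or "NOT FOUND"))
```

Any tool reported as NOT FOUND usually means the toolkit is not installed or its `bin` directory is missing from `PATH`.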

SRA Toolkit - fastq-dump

https://edwards.sdsu.edu/research/fastq-dump/

A good review of how to use the fastq-dump options.


https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump

fastq-dump: Convert SRA data into fastq format


Usage:
fastq-dump [options] <path/file> [<path/file> ...]
fastq-dump [options] <accession>
Frequently Used Options:
General:
-h|--help: Displays ALL options, general usage, and version information.
-V|--version: Display the version of the program.
Data formatting:
--split-files: Dump each read into separate file. Files will receive suffix corresponding to read number.
--split-spot: Split spots into individual reads.
--fasta <[line width]>: FASTA only, no qualities. Optional line wrap width (set to zero for no wrapping).
-I|--readids: Append read id after spot id as 'accession.spot.readid' on defline.
-F|--origfmt: Defline contains only original sequence name.
-C|--dumpcs <[cskey]>: Formats sequence using color space (default for SOLiD). "cskey" may be specified for translation.
-B|--dumpbase: Formats sequence using base space (default for other than SOLiD).
-Q|--offset <integer>: Offset to use for ASCII quality scores. Default is 33 ("!").
Filtering:
-N|--minSpotId <rowid>: Minimum spot id to be dumped. Use with "X" to dump a range.
-X|--maxSpotId <rowid>: Maximum spot id to be dumped. Use with "N" to dump a range.
-M|--minReadLen <len>: Filter by sequence length >= <len>
--skip-technical: Dump only biological reads.
--aligned: Dump only aligned sequences. Aligned datasets only; see sra-stat.
--unaligned: Dump only unaligned sequences. Will dump all for unaligned datasets.
Workflow and piping:
-O|--outdir <path>: Output directory, default is current working directory ('.').
-Z|--stdout: Output to stdout, all split data become joined into single stream.
--gzip: Compress output using gzip.
--bzip2: Compress output using bzip2.
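
The -Q offset is easier to see with a small worked example. In a fastq quality line each character stores one score as the ASCII code of the character minus the offset, so with the default offset of 33 the character "!" (ASCII 33) means a score of 0. A minimal sketch of that arithmetic; the function names are my own, not part of the toolkit:

```python
def decode_qualities(quality_string, offset=33):
    """Convert an ASCII quality string into a list of integer scores."""
    return [ord(ch) - offset for ch in quality_string]

def encode_qualities(scores, offset=33):
    """Convert integer scores back into an ASCII quality string."""
    return "".join(chr(score + offset) for score in scores)

print(decode_qualities("!5I"))                 # [0, 20, 40]
print(encode_qualities([0, 20, 40], offset=64))  # @Th
```

The same scores look different under offset 64 ("@Th" instead of "!5I"), which is why a downstream tool has to be told which offset a file uses.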
Use examples:
fastq-dump -X 5 -Z SRR390728
Prints the first five spots (-X 5) to standard out (-Z). This is a useful starting point for verifying other formatting options before dumping a whole file.
fastq-dump -I --split-files SRR390728
Produces two fastq files (--split-files) containing ".1" and ".2" read suffixes (-I) for paired-end data.
fastq-dump --split-files --fasta 60 SRR390728
Produces two (--split-files) fasta files (--fasta) with 60 bases per line ("60" included after --fasta).
fastq-dump --split-files --aligned -Q 64 SRR390728
Produces two fastq files (--split-files) that contain only aligned reads (--aligned; note: only for files submitted as aligned data), with a quality offset of 64 (-Q 64). Please see the documentation on vdb-dump if you wish to produce fasta/qual data.
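
When many accessions need dumping, the command lines above can be assembled from a script. A minimal sketch, assuming fastq-dump is on `PATH`; it uses only flags from the option list in this post (-X, --split-files, --gzip, -Z, -O), and `build_fastq_dump_cmd` and `run_fastq_dump` are hypothetical helper names:

```python
import subprocess

def build_fastq_dump_cmd(accession, max_spot=None, split_files=False,
                         gzip=False, to_stdout=False, outdir=None):
    """Build the fastq-dump argument list for one accession."""
    cmd = ["fastq-dump"]
    if max_spot is not None:
        cmd += ["-X", str(max_spot)]   # maximum spot id to dump
    if split_files:
        cmd.append("--split-files")    # one file per read
    if gzip:
        cmd.append("--gzip")           # compress the output
    if to_stdout:
        cmd.append("-Z")               # write to standard out
    if outdir is not None:
        cmd += ["-O", outdir]          # output directory
    cmd.append(accession)
    return cmd

def run_fastq_dump(accession, **options):
    """Run fastq-dump and return its exit status (0 normally means success)."""
    return subprocess.call(build_fastq_dump_cmd(accession, **options))

if __name__ == "__main__":
    # Only prints the command; call run_fastq_dump(...) to execute it.
    print(" ".join(build_fastq_dump_cmd("SRR390728", max_spot=5, to_stdout=True)))
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.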
Possible errors and their solution:
fastq-dump.2.x err: item not found while constructing within virtual database module - the path '<path/SRR*.sra>' cannot be opened as database or table
This error indicates that the .sra file cannot be found. Confirm that the path to the file is correct.
fastq-dump.2.x err: name not found while resolving tree within virtual file system module - failed SRR*.sra
The data are likely reference compressed and the toolkit is unable to acquire the reference sequence(s) needed to extract the .sra file. Please confirm that you have tested and validated the configuration of the toolkit. If you have elected to prevent the toolkit from contacting NCBI, you will need to manually acquire the reference(s) here
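
For the first error above (file not found), a quick pre-check before calling fastq-dump can save a confusing message. A minimal sketch, assuming that any argument ending in ".sra" is meant as a local file and everything else is an accession; that rule is my own convention for illustration, not something the toolkit defines, and `missing_sra_files` is a hypothetical helper:

```python
import os

def missing_sra_files(arguments):
    """Return the arguments that look like .sra file paths but do not exist."""
    return [arg for arg in arguments
            if arg.endswith(".sra") and not os.path.isfile(arg)]

if __name__ == "__main__":
    for path in missing_sra_files(["SRR390728", "/no/such/dir/SRR390728.sra"]):
        print("missing file: %s" % path)
```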

Jun 3, 2017

python - make a time delay

https://stackoverflow.com/questions/510348/how-can-i-make-a-time-delay-in-python


import time
time.sleep(5) # delays for 5 seconds
Here is another example where something is run once a minute:
import time
while True:
    print("This prints once a minute.")  # print() with parentheses works in Python 2 and 3
    time.sleep(60)  # Delay for 1 minute (60 seconds)
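
The loop above never stops on its own. A variation that repeats a task a fixed number of times, sleeping only between runs, might look like the sketch below; `repeat_with_delay` is a hypothetical helper name, not something from the linked answer:

```python
import time

def repeat_with_delay(task, times, delay_seconds):
    """Call task() `times` times, sleeping between calls; return the results."""
    results = []
    for i in range(times):
        results.append(task())
        if i < times - 1:          # no need to sleep after the last run
            time.sleep(delay_seconds)
    return results

if __name__ == "__main__":
    repeat_with_delay(lambda: print("tick"), times=3, delay_seconds=0.5)
```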

python - os.system, subprocess to spawn new processes

https://docs.python.org/2/library/subprocess.html



https://stackoverflow.com/questions/18739239/python-how-to-get-stdout-after-running-os-system?noredirect=1&lq=1

https://stackoverflow.com/questions/3791465/python-os-system-for-command-line-call-linux-not-returning-what-it-should

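import os
os.system('ls')  # runs the command through a shell; returns the exit status, not the command's output
from subprocess import call
call('ls')  # runs the command without a shell; also returns only the exit status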
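
Both os.system and call return only an exit status, which is the problem the linked questions are about. To get the text a command prints, `subprocess.check_output` can be used instead. A minimal sketch; `run_and_capture` is a hypothetical helper name, and the child command is the current Python interpreter so the example does not depend on `ls` being available:

```python
import subprocess
import sys

def run_and_capture(args):
    """Run a command given as an argument list and return its stdout as text."""
    output = subprocess.check_output(args)  # raises CalledProcessError on a non-zero exit status
    return output.decode("utf-8")

if __name__ == "__main__":
    text = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(text.strip())
```

On a Unix-like system, `run_and_capture(['ls'])` would return the directory listing as a string.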