Dec 22, 2022

HowTo: Access SRA Data

https://github.com/ncbi/sra-tools/wiki/HowTo:-Access-SRA-Data 

 

To download SRA data, use the prefetch tool included in the SRA Toolkit.

 

Example of prefetch usage:

$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB

2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...

As can be seen from the output above, prefetch performs several steps:

  1. check the size of the file being downloaded
    If the file is larger than the default download limit (20,971,520KB in the output above), give prefetch a higher limit with --max-size, e.g.:
    $ prefetch --max-size 100000000 SRR1482462

  2. download the requested file
    The file is downloaded using Aspera if available on your system, or HTTPS otherwise.

  3. put the file into its proper place
    The file is downloaded into your designated cache area, where VDB name resolution expects to find it, so other SRA Toolkit tools can locate it by accession.

  4. recursively download missing external reference sequences
    Most SRA files require additional sequence files in order to reconstruct the original reads. prefetch ensures that you download not only the main file but also all of its dependencies.

  5. access dbGaP encrypted data
    In addition to performing all of the steps above, prefetch uses the download and decryption keys that have been added to the SRA Toolkit configuration to obtain authorization for the download. (N.B. To access dbGaP data, you need to change directory ("cd") into the dbGaP project's workspace.)
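
Because prefetch carries out all of these steps for each accession it is given, fetching several runs mostly comes down to calling it in a loop. A minimal sketch, assuming prefetch is on your PATH and that the --max-size value from the example above suits your files; the function name fetch_runs is hypothetical and not part of the SRA Toolkit:

```shell
# Hypothetical helper (not part of the SRA Toolkit): run prefetch on each
# accession given as an argument, using the raised size limit from the
# --max-size example above. Stops at the first accession that fails.
fetch_runs() {
    for acc in "$@"; do
        # prefetch also downloads the run's external reference sequences
        if ! prefetch --max-size 100000000 "$acc"; then
            echo "prefetch failed for $acc" >&2
            return 1
        fi
    done
}
```

After defining the function in your shell, fetch_runs SRR1482462 runs the same download as the --max-size example above. For dbGaP data, call it from inside the project's workspace, as noted in step 5.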

prefetch can also be run on previously downloaded files, in which case it recursively downloads any missing external reference sequences.
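
One way to put this to use is to save prefetch's output, check it for items that were started but never reported as downloaded, and then run prefetch on the same accession again. A minimal sketch, assuming the message format of the prefetch 2.4.4 output shown earlier (other versions may word their messages differently); the function name missing_downloads is hypothetical:

```shell
# Hypothetical helper: read a saved prefetch log on stdin and print every item
# that was started ("N) Downloading 'X'...") but never reported as
# "N) 'X' was downloaded successfully". The message format is assumed from
# the prefetch 2.4.4 output shown earlier.
missing_downloads() {
    awk '
        / [0-9]+\) Downloading / {
            # the item name sits between the first pair of single quotes (octal 047)
            split($0, parts, "\047"); started[parts[2]] = 1
        }
        /was downloaded successfully/ {
            split($0, parts, "\047"); delete started[parts[2]]
        }
        END { for (name in started) print name }
    '
}
```

For example, run prefetch SRR1482462 2>&1 | tee prefetch.log, then missing_downloads < prefetch.log; if it prints anything, run prefetch SRR1482462 again to fetch what is missing.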