data visualization with ggplot2
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
Aug 8, 2017
Aug 3, 2017
nohup - running a program in the background on Linux
nohup ./workflow.sh & # run in the background
Output is saved to nohup.out.
To terminate the nohup job:
kill -9 <PID> # replace <PID> with the process ID; you can find it with the 'top' command
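The same start-then-terminate pattern can be sketched from Python. This is only a rough sketch, not a replacement for nohup: it assumes Python 3 on a POSIX system, and the helper names `start_background` and `stop_background`, the log file name, and the sleeping child used in place of ./workflow.sh are all hypothetical.

```python
import os
import signal
import subprocess
import sys


def start_background(cmd, log_path):
    """Start cmd in its own session and append its output to log_path."""
    log = open(log_path, "ab")
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=log,
        stderr=subprocess.STDOUT,  # send stderr to the same log file
        start_new_session=True,    # detach from the terminal's session
    )
    log.close()  # the child keeps its own copy of the file descriptor
    return proc


def stop_background(proc):
    """Send SIGTERM to the process and wait for it to exit."""
    os.kill(proc.pid, signal.SIGTERM)
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical stand-in for ./workflow.sh: a child that just sleeps.
    job = start_background(
        [sys.executable, "-c", "import time; time.sleep(30)"], "job.log"
    )
    print("PID:", job.pid)
    stop_background(job)
```

SIGTERM is used here instead of the -9 (SIGKILL) shown above so that the job gets a chance to clean up; SIGKILL remains the fallback if the process ignores it.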
Installing tensorflow - pip upgrade
for details, check https://www.tensorflow.org/install/install_linux
If step 4 fails, try upgrading pip:
(tensorflow)$ pip install --upgrade pip
Then try step 4 again.
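The TensorFlow instructions quoted in this post name a pip version lower than 8.1 as the typical cause of a step 4 failure. The sketch below checks for that case before retrying. It assumes that `pip --version` prints text beginning with "pip X.Y"; the helper names `parse_pip_version` and `needs_pip_upgrade` are hypothetical.

```python
import re
import subprocess
import sys


def parse_pip_version(text):
    """Extract (major, minor) from text such as 'pip 9.0.1 from ...'."""
    match = re.search(r"pip (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized pip version output: %r" % text)
    return int(match.group(1)), int(match.group(2))


def needs_pip_upgrade(text, minimum=(8, 1)):
    """Return True when the reported pip version is lower than minimum."""
    return parse_pip_version(text) < minimum


if __name__ == "__main__":
    try:
        out = subprocess.check_output([sys.executable, "-m", "pip", "--version"])
    except (subprocess.CalledProcessError, OSError):
        print("pip is not available for this interpreter")
    else:
        if needs_pip_upgrade(out.decode()):
            print("pip is older than 8.1; run: pip install --upgrade pip")
        else:
            print("pip is recent enough")
```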
Installing with virtualenv
Take the following steps to install TensorFlow with Virtualenv:
1. Install pip and virtualenv by issuing one of the following commands:
2. Create a virtualenv environment by issuing one of the following commands:
   where targetDirectory specifies the top of the virtualenv tree. Our instructions assume that targetDirectory is ~/tensorflow, but you may choose any directory.
3. Activate the virtualenv environment by issuing one of the following commands:
   The preceding source command should change your prompt to the following:
4. Issue one of the following commands to install TensorFlow in the active virtualenv environment:
   If the preceding command succeeds, skip Step 5. If the preceding command fails, perform Step 5.
5. (Optional) If Step 4 failed (typically because you invoked a pip version lower than 8.1), install TensorFlow in the active virtualenv environment by issuing a command of the following format:
   where tfBinaryURL identifies the URL of the TensorFlow Python package. The appropriate value of tfBinaryURL depends on the operating system, Python version, and GPU support. Find the appropriate value for tfBinaryURL for your system here. For example, if you are installing TensorFlow for Linux, Python 2.7, and CPU-only support, issue the following command to install TensorFlow in the active virtualenv environment:
Jun 19, 2017
SRA Toolkit
Update (2022-09-16): update your SRA Toolkit.
https://github.com/ncbi/sra-tools/wiki
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc
Frequently Used Tools:
fastq-dump: Convert SRA data into fastq format
prefetch: Allows command-line downloading of SRA, dbGaP, and ADSP data
sam-dump: Convert SRA data to sam format
sra-pileup: Generate pileup statistics on aligned SRA data
vdb-config: Display and modify VDB configuration information
vdb-decrypt: Decrypt non-SRA dbGaP data ("phenotype data")
Additional Tools:
abi-dump: Convert SRA data into ABI format (csfasta / qual)
illumina-dump: Convert SRA data into Illumina native formats (qseq, etc.)
sff-dump: Convert SRA data to sff format
sra-stat: Generate statistics about SRA data (quality distribution, etc.)
vdb-dump: Output the native VDB format of SRA data.
vdb-encrypt: Encrypt non-SRA dbGaP data ("phenotype data")
vdb-validate: Validate the integrity of downloaded SRA data
SRAtoolkit - fastq-dump
https://edwards.sdsu.edu/research/fastq-dump/
A good review of how to use the fastq-dump options.
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
fastq-dump: Convert SRA data into fastq format
Usage:
fastq-dump [options] <path/file> [<path/file> ...]
fastq-dump [options] <accession>
Frequently Used Options:
Option | Description
General:
-h, --help | Displays ALL options, general usage, and version information.
-V, --version | Display the version of the program.
Data formatting:
--split-files | Dump each read into separate file. Files will receive suffix corresponding to read number.
--split-spot | Split spots into individual reads.
--fasta <[line width]> | FASTA only, no qualities. Optional line wrap width (set to zero for no wrapping).
-I, --readids | Append read id after spot id as 'accession.spot.readid' on defline.
-F, --origfmt | Defline contains only original sequence name.
-C, --dumpcs <[cskey]> | Formats sequence using color space (default for SOLiD). "cskey" may be specified for translation.
-B, --dumpbase | Formats sequence using base space (default for other than SOLiD).
-Q, --offset <integer> | Offset to use for ASCII quality scores. Default is 33 ("!").
Filtering:
-N, --minSpotId <rowid> | Minimum spot id to be dumped. Use with "X" to dump a range.
-X, --maxSpotId <rowid> | Maximum spot id to be dumped. Use with "N" to dump a range.
-M, --minReadLen <len> | Filter by sequence length >= <len>
--skip-technical | Dump only biological reads.
--aligned | Dump only aligned sequences. Aligned datasets only; see sra-stat.
--unaligned | Dump only unaligned sequences. Will dump all for unaligned datasets.
Workflow and piping:
-O, --outdir <path> | Output directory, default is current working directory ('.').
-Z, --stdout | Output to stdout, all split data become joined into single stream.
--gzip | Compress output using gzip.
--bzip2 | Compress output using bzip2.
Usage examples:
fastq-dump -X 5 -Z SRR390728
Prints the first five spots (-X 5) to standard out (-Z). This is a useful starting point for verifying other formatting options before dumping a whole file.
fastq-dump -I --split-files SRR390728
Produces two fastq files (--split-files) containing ".1" and ".2" read suffixes (-I) for paired-end data.
fastq-dump --split-files --fasta 60 SRR390728
Produces two (--split-files) fasta files (--fasta) with 60 bases per line ("60" included after --fasta).
fastq-dump --split-files --aligned -Q 64 SRR390728
Produces two fastq files (--split-files) that contain only aligned reads (--aligned; note: only for files submitted as aligned data), with a quality offset of 64 (-Q 64). Please see the documentation on vdb-dump if you wish to produce fasta/qual data.
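When fastq-dump is called from a script, the options above can be assembled into an argument list. The sketch below covers only a few of the listed flags and assumes fastq-dump is on the PATH when the command is actually run; the helper names `build_fastq_dump_cmd` and `run_fastq_dump` are hypothetical and not part of the SRA Toolkit.

```python
import subprocess


def build_fastq_dump_cmd(accession, split_files=False, read_ids=False,
                         max_spot_id=None, to_stdout=False, outdir=None,
                         gzip=False):
    """Assemble a fastq-dump argument list from a subset of its options."""
    cmd = ["fastq-dump"]
    if max_spot_id is not None:
        cmd += ["-X", str(max_spot_id)]  # maximum spot id to dump
    if to_stdout:
        cmd.append("-Z")                 # write to stdout
    if read_ids:
        cmd.append("-I")                 # append read id on the defline
    if split_files:
        cmd.append("--split-files")      # one file per read
    if outdir is not None:
        cmd += ["-O", outdir]            # output directory
    if gzip:
        cmd.append("--gzip")             # compress output
    cmd.append(accession)
    return cmd


def run_fastq_dump(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.check_call(cmd)


if __name__ == "__main__":
    # Print the commands instead of running them, so no download is triggered.
    print(" ".join(build_fastq_dump_cmd("SRR390728", max_spot_id=5, to_stdout=True)))
    print(" ".join(build_fastq_dump_cmd("SRR390728", split_files=True, read_ids=True)))
```

Passing the arguments as a list avoids shell quoting problems when an output directory contains spaces.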
Possible errors and their solution:
fastq-dump.2.x err: item not found while constructing within virtual database module - the path '<path/SRR*.sra>' cannot be opened as database or table
This error indicates that the .sra file cannot be found. Confirm that the path to the file is correct.
fastq-dump.2.x err: name not found while resolving tree within virtual file system module - failed SRR*.sra
The data are likely reference compressed and the toolkit is unable to acquire the reference sequence(s) needed to extract the .sra file. Please confirm that you have tested and validated the configuration of the toolkit. If you have elected to prevent the toolkit from contacting NCBI, you will need to manually acquire the reference(s) here
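In a batch script, the two messages can be told apart by matching the captured error text. The sketch below assumes the wording of the messages stays as quoted in this post (newer toolkit versions may word them differently); the function name `diagnose_fastq_dump_error` is hypothetical.

```python
def diagnose_fastq_dump_error(stderr_text):
    """Map a fastq-dump error message to a short hint, or None if unknown."""
    if "item not found while constructing within virtual database module" in stderr_text:
        return "The .sra file cannot be found; check the path to the file."
    if "name not found while resolving tree within virtual file system module" in stderr_text:
        return ("Reference sequence(s) could not be acquired; "
                "check the toolkit configuration.")
    return None


if __name__ == "__main__":
    message = ("fastq-dump.2.x err: name not found while resolving tree "
               "within virtual file system module - failed SRR*.sra")
    print(diagnose_fastq_dump_error(message))
```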
Jun 3, 2017
python - make a time delay
https://stackoverflow.com/questions/510348/how-can-i-make-a-time-delay-in-python
import time
time.sleep(5) # delays for 5 seconds
Here is another example where something is run once a minute:
import time
while True:
    print("This prints once a minute.")
    time.sleep(60) # Delay for 1 minute (60 seconds)
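A plain sleep(60) loop runs slightly less often than once a minute, because the time the task itself takes is added to every cycle. One possible variation, sketched below, schedules each run against a monotonic clock and stops after a fixed number of runs. It assumes Python 3.3 or later (for time.monotonic); the function name `run_periodically` is hypothetical.

```python
import time


def run_periodically(task, interval, repeats):
    """Call task() `repeats` times, starting one call every `interval` seconds."""
    next_start = time.monotonic()
    for i in range(repeats):
        task()
        if i == repeats - 1:
            break  # no need to wait after the final run
        next_start += interval
        delay = next_start - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # sleep only for the time left in this cycle


if __name__ == "__main__":
    # Short interval so the demonstration finishes quickly.
    run_periodically(lambda: print("tick"), interval=0.5, repeats=3)
```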
python - os.system, subprocess to spawn new processes
https://docs.python.org/2/library/subprocess.html
https://stackoverflow.com/questions/18739239/python-how-to-get-stdout-after-running-os-system?noredirect=1&lq=1
https://stackoverflow.com/questions/3791465/python-os-system-for-command-line-call-linux-not-returning-what-it-should
import os
os.system('ls')
from subprocess import call
call('ls')
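Both os.system and call return only the exit status, which is the problem the Stack Overflow questions linked above are about. To get the command's output as a string, the subprocess module can capture stdout. This is a short sketch; the helper name `run_and_capture` is hypothetical, and a Python child process stands in for 'ls' so the example behaves the same on any system.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (exit_status, stdout_text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # waits for the process and reads all of stdout
    return proc.returncode, out.decode()


if __name__ == "__main__":
    status, text = run_and_capture([sys.executable, "-c", "print('hello')"])
    print(status, repr(text))
```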