NGS scrap

Mar 21, 2023

How to verify a cuDNN installation?

The installation of CuDNN is just copying some files. Hence to check if CuDNN is installed (and which version you have), you only need to check those files.

Install CuDNN

Step 1: Register for an NVIDIA developer account and download cuDNN from the NVIDIA developer site (about 80 MB). Run nvcc --version first to find your CUDA version, since the cuDNN package you download has to match it.

Step 2: Check where your cuda installation is. For most people, it will be /usr/local/cuda/. You can check it with which nvcc.

Step 3: Copy the files:

$ cd folder/extracted/contents
$ sudo cp include/cudnn*.h /usr/local/cuda/include
$ sudo cp lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

Check version

You might have to adjust the path. See step 2 of the installation.

$ grep CUDNN_MAJOR -A 2 /usr/local/cuda/include/cudnn.h
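
The same check can be wrapped in a small shell function. This is a minimal sketch, not an official tool: the function name cudnn_version is made up for illustration, and the default include directory is the one assumed in step 2. It also looks in cudnn_version.h, because cuDNN 8 and later keep the version macros there instead of in cudnn.h.

```shell
#!/usr/bin/env bash
# Print the cuDNN version found under a CUDA include directory.
# cudnn_version is a hypothetical helper name, not part of CUDA or cuDNN.
# Usage: cudnn_version [include_dir]   (default: /usr/local/cuda/include)
cudnn_version() {
    local include_dir=${1:-/usr/local/cuda/include}
    local header
    # cuDNN 8+ stores the macros in cudnn_version.h, older releases in cudnn.h.
    for header in "$include_dir/cudnn_version.h" "$include_dir/cudnn.h"; do
        [ -f "$header" ] || continue
        # Read the values from lines such as "#define CUDNN_MAJOR 8".
        awk '
            $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
            $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
            $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
            END {
                if (major == "") exit 1
                print major "." minor "." patch
            }
        ' "$header" && return 0
    done
    echo "no cuDNN version macros found in $include_dir" >&2
    return 1
}
```

Calling cudnn_version with no argument checks /usr/local/cuda/include; pass another directory if your CUDA installation lives elsewhere.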

Notes

When you get an error like

F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM

with TensorFlow, you might consider using CuDNN v4 instead of v5.

Ubuntu users who installed it via apt: https://askubuntu.com/a/767270/10425

Dec 22, 2022

HowTo: Access SRA Data

https://github.com/ncbi/sra-tools/wiki/HowTo:-Access-SRA-Data

Use the prefetch tool included in the SRA Toolkit.

Example of prefetch usage:

$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB

2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...

As can be seen from the output above, prefetch performs several steps:

1. Check the size of the file being downloaded.
   If the file is very large, prefetch must be given a higher download limit, e.g.:
   $ prefetch --max-size 100000000 SRR1482462
2. Download the requested file.
   The file is downloaded using Aspera if available on your system, or HTTPS otherwise.
3. Put the file into its proper place.
   The file is downloaded into your designated cache area. This permits VDB name resolution to work as designed.
4. Recursively download missing external reference sequences.
   Most SRA files require additional sequence files in order to reconstruct original reads. prefetch ensures that you not only download the main file but all of its dependencies.
5. Access dbGaP encrypted data.
   prefetch will make use of download and decryption keys that have been added to SRA Toolkit configuration to obtain authorization for the download in addition to performing all of the steps above. (N.B. In order to access dbGaP data, you will need to change directory or "cd" to the dbGaP project's workspace.)

prefetch will also operate on existing, previously downloaded files to recursively download any missing external reference sequences.
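
The size limits above are given in KB, which is hard to read at a glance. A minimal sketch for converting them to GiB, assuming KB here means 1024 bytes (the default of 20,971,520 KB is exactly 20 x 1024 x 1024, which suggests binary units). The helper name kb_to_gib is made up for illustration and is not part of the SRA Toolkit.

```shell
#!/usr/bin/env bash
# Convert a size in KB to GiB, assuming 1 KB = 1024 bytes.
# kb_to_gib is a hypothetical helper, not an SRA Toolkit command.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

kb_to_gib 20971520     # default limit shown in the log above
kb_to_gib 100000000    # value passed to --max-size in step 1
```

Under that assumption the default limit is 20 GiB, and --max-size 100000000 raises it to roughly 95 GiB.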

Nov 28, 2022

Awk If Statement Examples

https://www.thegeekstuff.com/2010/02/awk-conditional-statements/

if

$ awk '{
if ($3 =="" || $4 == "" || $5 == "")
	print "score of the student",$1,"is missing";
}' student-recort

if else

$ awk '{
if ($3 >=80 && $4 >= 80 && $5 >= 80)
	print $0,"=>","Pass";
else
	print $0,"=>","Fail";
}' student-recort

else if

$ cat calc_grade.awk
{
total=$3+$4+$5;
mean=total/3;
if ( mean >= 90 ) grade="A";
else if ( mean >= 80) grade ="B";
else if (mean >= 70) grade ="C";
else grade="D";

print $0,"=>",grade;
}

$ awk -f calc_grade.awk student-recort
AAA 2111 70 80 75 => C
BBB 2123 60 55 40 => D
CCC 2212 40 42 => D
DDD 2313 88 98 91 => A
EEE 2411 30 45 => D

Jul 1, 2022

UMI : Unique Molecular Identifier, What and Why?

What are UMIs and why are they used in high-throughput sequencing?

https://dnatech.genomecenter.ucdavis.edu/faqs/what-are-umis-and-why-are-they-used-in-high-throughput-sequencing/

Software:
UMI-Tools: https://github.com/CGATOxford/UMI-tools
zUMIs: https://github.com/sdparekh/zUMIs
fastp: https://github.com/OpenGene/fastp (transfer of UMIs into read IDs)

Fu, Y., Wu, PH., Beane, T. et al. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics 19, 531 (2018).

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4933-1

A higher number of unique combinations can be achieved simply by increasing the number of random-nucleotide positions. The number of UMI combinations must be sufficiently large because as mentioned above, the chance that two cDNA molecules with identical sequences in the starting pool are tagged with the same UMI combination needs to be infinitesimally small.
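
To make the scaling concrete: assuming each random position is drawn independently from the four bases A, C, G and T, a UMI with n random positions has 4^n possible sequences. The sketch below prints that count for a few lengths. The function name umi_combinations and the chosen lengths are illustrative and not taken from the paper.

```shell
#!/usr/bin/env bash
# Number of distinct UMI sequences for a given number of random-nucleotide
# positions, assuming four equally possible bases per position.
# umi_combinations is a hypothetical helper name.
umi_combinations() {
    local positions=$1
    echo $(( 4 ** positions ))
}

for n in 4 6 8 10 12; do
    printf '%2d positions: %d combinations\n' "$n" "$(umi_combinations "$n")"
done
```

Each extra position multiplies the number of possible UMIs by four, so going from 8 to 10 positions raises it from 65,536 to 1,048,576.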

cyvcf2 : cython + htslib built for fast parsing of Variant Call Format (VCF)

https://github.com/brentp/cyvcf2

cyvcf2 is a cython wrapper around htslib built for fast parsing of Variant Call Format (VCF) files.

Mar 31, 2022

Download FASTQ files from European Nucleotide Archive (ENA)

https://github.com/wwood/ena-fast-download

Requirements

aspera client : https://downloads.asperasoft.com/en/downloads/8?list
curl
Python 3

# set path for aspera. check your aspera directory path
PATH=$PATH:/home/lee/.aspera/connect/bin
export PATH

usage: ena-fast-download.py [-h] [--output_directory OUTPUT_DIRECTORY]
[--ssh_key SSH_KEY ( for OSX) ]
run_identifier

ena-fast-download.py --output_directory /output/directory ERR1739691

Babyplots : interactive 3D graphs

Babyplots Documentation

Babyplots is an easy-to-use library for creating interactive 3D graphs for exploring and presenting data.

Babyplots is available as a JavaScript library, as an R package, as a Python package, and as an add-in for Microsoft PowerPoint. While the R package, Python package and JavaScript library allow the creation of new plots, the PowerPoint add-in can only be used to display exported plots. This website also provides an interactive node-based editor for creating babyplots visualizations called NPC (node plot creator) or simply Creator.

Find the individual documentation pages through the links below:

https://bp.bleb.li/documentation/