
Mar 26, 2024

HiSeq 4000, NovaSeq multiplex sample issues

https://med.stanford.edu/gssc/hiseq4000issue.html

https://enseqlopedia.com/2016/12/index-mis-assignment-between-samples-on-hiseq-4000-and-x-ten/

 

If free barcoded adapters or index primers are present in a multiplexed pool, they can prime and extend library molecules in the same lane during the clustering step. The result is index swapping: reads from one sample can be demultiplexed into the FASTQ files of a different sample. The HiSeq 2000/2500 and MiSeq are less affected because of their clustering chemistry and flow cell geometry.

 

The rate of mis-assignment can vary significantly and depends on the following factors:

  • Amount of free adapter present in library
  • Storage conditions of library
  • Application or library prep workflow

 

How much sample mis-assignment matters depends on the experimental design and the library prep workflow. Illumina has been investigating the issue internally and suggests the following strategies to reduce index swapping:

During Library Construction:

  • Optimize your PCR or ligation step to avoid an excess of adapters or index primers.
  • For PCR-based indexing, dilute the index primers to adjust the insert-to-adapter/primer ratio.
  • Perform extra clean-ups after this step.
  • PAGE purification appears to be effective at removing free index primers.
  • Purification columns are also an option.
  • Do extra clean-ups of each individual library before pooling.
  • Use single-use aliquots of adapters and primers.
  • Freeze individual libraries and pool them just prior to sequencing.

Pooling suggestions:

  • Use dual indexing strategies with unique barcodes on both ends. (Swapping would have to occur at both ends for a read to be mis-assigned.)
  • Sequence or freeze pooled libraries as soon as possible.
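To see why unique dual indexes help, the awk sketch below classifies observed (i7, i5) index pairs against the pool's expected pairs. A read whose two indices are each valid but do not belong to the same sample is counted as "swapped" instead of being assigned to a sample; with single indexing that read would have landed silently in the wrong sample's FASTQ. The function name classify_pairs, the tab-separated sample-sheet layout (sample, i7, i5) and the one-pair-per-read input are assumptions for illustration. Real demultiplexers usually tolerate a small number of index mismatches, which this sketch ignores.

```shell
# classify_pairs SHEET < observed
# Hypothetical helper (not part of any Illumina tool).
#   SHEET:    tab-separated "sample<TAB>i7<TAB>i5" lines (expected dual-index pairs)
#   observed: tab-separated "i7<TAB>i5" lines on stdin, one per read
# Prints "<category><TAB><count>" sorted by category, where category is a
# sample name, "swapped" (both indices valid but not an expected pair) or
# "undetermined" (at least one index not in the sheet). Exact matching only.
classify_pairs() {
  awk -F'\t' '
    NR == FNR {                           # first input: the sample sheet
      sample[$2 "\t" $3] = $1             # expected i7+i5 combination -> sample
      valid_i7[$2] = 1
      valid_i5[$3] = 1
      next
    }
    {                                     # second input: observed pairs (stdin)
      key = $1 "\t" $2
      if (key in sample)
        count[sample[key]]++
      else if (($1 in valid_i7) && ($2 in valid_i5))
        count["swapped"]++                # index hopping between two samples
      else
        count["undetermined"]++
    }
    END { for (c in count) print c "\t" count[c] }
  ' "$1" - | LC_ALL=C sort
}
```

Dividing the "swapped" count by the total number of reads gives a rough estimate of the index-hopping rate for the pool.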

Sequencing suggestions:

  • Spike in third-party PhiX with unique index barcodes to measure the swap frequency. (We have begun to introduce PhiX with unique barcodes from SeqMatic for HiSeq 4000 runs.)
  • For methods highly sensitive to mis-assignment, use HiSeq 2000/2500 or MiSeq instruments.

 

MiSeq, HiSeq, NovaSeq read output tables

 https://med.stanford.edu/gssc/services/sequencing1.html

 

Illumina Sequencing Services

Instrument       | Run time      | Maximum output | Average read output  | Maximum read length
MiSeq            | 4-56 hours    | 15 Gb          | 22-25 million        | 2 x 300 bp
MiSeq Micro      | 24 hours      | 1.2 Gb         | 4 million            | 2 x 150 bp
HiSeq 4000       | 2-4 days      | 1500 Gb        | 250-400 million      | 2 x 150 bp
iSeq 100         | 9-17.5 hours  | 1.2 Gb         | 4 million            | 2 x 150 bp
NovaSeq 6000 SP  | 13-38 hours   | 325-400 Gb     | 325-400 million      | 2 x 250 bp
NovaSeq 6000 S1  | 13-25 hours   | 400-500 Gb     | 750-800 million      | 2 x 150 bp
NovaSeq 6000 S2  | 16-36 hours   | 1000-1250 Gb   | 1,650-2,050 million  | 2 x 150 bp
NovaSeq 6000 S4  | 36-44 hours   | 2400-3000 Gb   | 2,000-2,500 million  | 2 x 150 bp
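The average read output above can be turned into a rough per-sample depth by dividing it by the number of libraries pooled on the lane or flow cell. The helper below is a hypothetical sketch that assumes perfectly balanced pooling; real pools are never perfectly even, and whether a figure refers to a single lane or a whole flow cell should be checked against the provider's table.

```shell
# reads_per_sample TOTAL_READS N_SAMPLES
# Hypothetical helper: average reads per sample when N_SAMPLES libraries are
# pooled evenly on a lane/flow cell that yields TOTAL_READS reads.
reads_per_sample() {
  awk -v total="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "N_SAMPLES must be positive" > "/dev/stderr"; exit 1 }
    printf "%.0f\n", total / n
  }'
}
```

For example, 96 libraries at the NovaSeq 6000 S4 lower bound of 2,000 million reads: reads_per_sample 2000000000 96 prints 20833333, i.e. about 21 million reads per sample.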

UMI (Unique Molecular Identifier)

 https://dnatech.genomecenter.ucdavis.edu/faqs/what-are-umis-and-why-are-they-used-in-high-throughput-sequencing/

UMIs, also known as 'molecular barcodes', are used in:

  • Quantitative sequencing analysis, e.g. removal of PCR duplicates
  • Genomic variant detection
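The idea behind UMI-based duplicate removal: reads that map to the same position and strand but carry different UMIs came from different original molecules, while reads that share position, strand and UMI are presumably PCR copies of one molecule. The awk function below is a minimal sketch of that rule under an assumed tab-separated input (read ID, chromosome, position, strand, UMI) that you would first extract from aligned reads. It does exact UMI matching only; dedicated tools such as UMI-tools also merge UMIs that differ because of sequencing errors.

```shell
# dedup_by_umi [FILE...]
# Hypothetical helper: exact-match UMI deduplication.
# Input: tab-separated "read_id<TAB>chrom<TAB>pos<TAB>strand<TAB>umi" lines.
# Output: the read_id of the first read seen for each (chrom, pos, strand, umi)
# key; later reads with the same key are treated as PCR duplicates and dropped.
dedup_by_umi() {
  awk -F'\t' '!seen[$2 FS $3 FS $4 FS $5]++ { print $1 }' "$@"
}
```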

Mar 21, 2023

How to verify cuDNN installation?



Installing cuDNN is just a matter of copying some files, so to check whether cuDNN is installed (and which version you have), you only need to check those files.

Install cuDNN

Step 1: Register an NVIDIA developer account and download cuDNN from the NVIDIA developer site (about 80 MB). Pick the build that matches your CUDA version, which nvcc --version reports.

Step 2: Find your CUDA installation directory. For most people it is /usr/local/cuda/. You can check it with which nvcc.

Step 3: Copy the files:

$ cd folder/extracted/contents
$ sudo cp include/cudnn*.h /usr/local/cuda/include
$ sudo cp -P lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

Check version

You might have to adjust the path. See step 2 of the installation.

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
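The grep above works for older releases, but from cuDNN 8 on the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros are defined in cudnn_version.h rather than cudnn.h. The function below is a hypothetical sketch that tries both headers and prints the version as major.minor.patch; the name cudnn_version and the default include path /usr/local/cuda/include are assumptions, so pass your own path if Step 2 found CUDA elsewhere.

```shell
# cudnn_version [INCLUDE_DIR]
# Hypothetical helper. INCLUDE_DIR defaults to /usr/local/cuda/include.
# cuDNN 8+ keeps the version macros in cudnn_version.h; older releases keep
# them in cudnn.h, so the headers are tried in that order.
cudnn_version() {
  dir="${1:-/usr/local/cuda/include}"
  for header in "$dir/cudnn_version.h" "$dir/cudnn.h"; do
    [ -f "$header" ] || continue
    # Read the numbers from lines such as "#define CUDNN_MAJOR 8".
    if awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
            $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
            $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
            END {
              if (major == "") exit 1          # macros not in this header
              print major "." minor "." patch
            }' "$header"; then
      return 0
    fi
  done
  echo "cuDNN version macros not found under $dir" >&2
  return 1
}
```

Usage: cudnn_version, or cudnn_version /opt/cuda/include for a non-default location (the second path is only an example).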

Notes

When you get an error like

F tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not set cudnn filter descriptor: CUDNN_STATUS_BAD_PARAM

with TensorFlow, you might consider using CuDNN v4 instead of v5.

Ubuntu users who installed it via apt: see https://askubuntu.com/a/767270/10425

Dec 22, 2022

HowTo: Access SRA Data


https://github.com/ncbi/sra-tools/wiki/HowTo:-Access-SRA-Data 

 

Use the prefetch tool included in the SRA Toolkit.

 

Example of prefetch usage:

$ prefetch SRR1482462
Maximum file size download limit is 20,971,520KB

2015-02-19T13:20:06 prefetch.2.4.4: 1) Downloading 'SRR1482462'...
2015-02-19T13:20:06 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:32 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:32 prefetch.2.4.4: 1) 'SRR1482462' was downloaded successfully
2015-02-19T13:20:35 prefetch.2.4.4: 'SRR1482462' has 22 dependencies
2015-02-19T13:20:36 prefetch.2.4.4: 2) Downloading 'ncbi-acc:NC_000067.5?vdb-ctx=refseq'...
2015-02-19T13:20:36 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:41 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:41 prefetch.2.4.4: 2) 'ncbi-acc:NC_000067.5?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:41 prefetch.2.4.4: 3) Downloading 'ncbi-acc:NC_000068.6?vdb-ctx=refseq'...
2015-02-19T13:20:41 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:46 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:46 prefetch.2.4.4: 3) 'ncbi-acc:NC_000068.6?vdb-ctx=refseq' was downloaded successfully
2015-02-19T13:20:46 prefetch.2.4.4: 4) Downloading 'ncbi-acc:NC_000069.5?vdb-ctx=refseq'...
2015-02-19T13:20:46 prefetch.2.4.4:  Downloading via fasp...
2015-02-19T13:20:51 prefetch.2.4.4:  fasp download succeed
2015-02-19T13:20:51 prefetch.2.4.4: 4) 'ncbi-acc:NC_000069.5?vdb-ctx=refseq' was downloaded successfully
...

As can be seen from the output above, prefetch performs several steps:

  1. check the size of the file being downloaded
    If the file is very large, prefetch must be given a higher download limit, e.g.:
    $ prefetch --max-size 100000000 SRR1482462

  2. download the requested file
    The file is downloaded using Aspera if available on your system, or HTTPS otherwise.

  3. put the file into its proper place
    The file is downloaded into your designated cache area. This permits VDB name resolution to work as designed.

  4. recursively download missing external reference sequences
    Most SRA files require additional sequence files in order to reconstruct original reads. prefetch ensures that you not only download the main file but all of its dependencies.

  5. access dbGaP encrypted data
    prefetch will make use of download and decryption keys that have been added to SRA Toolkit configuration to obtain authorization for the download in addition to performing all of the steps above. (N.B. In order to access dbGaP data, you will need to change directory or "cd" to the dbGaP project's workspace.)

prefetch will also operate on existing, previously downloaded files to recursively download any missing external reference sequences.
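To fetch several runs, prefetch can simply be called once per accession. The loop below is a hypothetical sketch: the function name prefetch_list, the list file with one accession per line (blank lines and lines starting with # are skipped) and the PREFETCH override, which allows a dry run with PREFETCH=echo, are all assumptions. It only uses the plain "prefetch ACCESSION" form shown above and reports failures instead of stopping at the first one.

```shell
# prefetch_list LIST_FILE
# Hypothetical helper: run prefetch for every accession listed in LIST_FILE.
# Returns non-zero if any download failed.
prefetch_list() {
  list="$1"
  failed=0
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r acc || [ -n "$acc" ]; do
    case "$acc" in ''|'#'*) continue ;; esac
    # PREFETCH lets you swap in another binary (or echo for a dry run).
    if ! ${PREFETCH:-prefetch} "$acc"; then
      echo "prefetch failed for $acc" >&2
      failed=$((failed + 1))
    fi
  done < "$list"
  [ "$failed" -eq 0 ]
}
```

After prefetch_list accessions.txt finishes, the downloaded runs can be converted to FASTQ with another SRA Toolkit program such as fasterq-dump.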

 

Nov 28, 2022

Awk If Statement Examples


 
if 
$ awk '{
if ($3 == "" || $4 == "" || $5 == "")
	print "score of the student",$1,"is missing";
}' student-record
 
if else
$ awk '{
if ($3 >=80 && $4 >= 80 && $5 >= 80)
	print $0,"=>","Pass";
else
	print $0,"=>","Fail";
}' student-record

else if
$ cat calc_grade.awk
{
total=$3+$4+$5;
mean=total/3;
if ( mean >= 90 ) grade="A";
else if ( mean >= 80) grade ="B";
else if (mean >= 70) grade ="C";
else grade="D";

print $0,"=>",grade;
}
$ awk -f calc_grade.awk student-record
AAA 2111 70 80 75 => C
BBB 2123 60 55 40 => D
CCC 2212 40 42 => D
DDD 2313 88 98 91 => A
EEE 2411 30 45 => D 
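The pass/fail example can also be written with awk's conditional expression (condition ? value1 : value2), which keeps it on one line. This is a sketch using the same student-record layout (name, ID, three scores); the function name pass_fail is hypothetical.

```shell
# pass_fail [FILE...]
# Same rule as the if-else example: "Pass" only if all three scores ($3, $4, $5)
# are at least 80. A missing score never satisfies ">= 80", so it counts as a fail.
pass_fail() {
  awk '{ print $0, "=>", (($3 >= 80 && $4 >= 80 && $5 >= 80) ? "Pass" : "Fail") }' "$@"
}
```

Usage: pass_fail student-record prints each line followed by "=> Pass" or "=> Fail", exactly like the if-else version.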

Jul 1, 2022

UMI : Unique Molecular Identifier, What and Why?

What are UMIs and why are they used in high-throughput sequencing?

https://dnatech.genomecenter.ucdavis.edu/faqs/what-are-umis-and-why-are-they-used-in-high-throughput-sequencing/


Software:
UMI-Tools: https://github.com/CGATOxford/UMI-tools
zUMIs: https://github.com/sdparekh/zUMIs
fastp: https://github.com/OpenGene/fastp  (transfer of UMIs into read IDs)
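Transferring a UMI into the read ID means cutting the random bases off the read and appending them to the read name, so the UMI survives alignment and can be used for deduplication later. The awk function below is a hypothetical sketch for single-end FASTQ in which the UMI is the first LEN bases of each read; it names reads @READ_UMI, a convention similar in spirit to UMI-tools, but it is not the exact behavior or output format of fastp or UMI-tools.

```shell
# extract_umi LEN [FASTQ...]
# Hypothetical helper: for each 4-line FASTQ record, move the first LEN bases
# (and their quality scores) from the read into the first word of the header.
extract_umi() {
  len="$1"; shift
  awk -v len="$len" '
    NR % 4 == 1 {                                  # header line
      split($0, word, " ")
      name = word[1]                               # e.g. "@r1"
      rest = substr($0, length(name) + 1)          # e.g. " 1:N:0:1" (may be empty)
      next
    }
    NR % 4 == 2 { umi = substr($0, 1, len); seq = substr($0, len + 1); next }
    NR % 4 == 3 { plus = $0; next }
    {                                              # quality line: emit the record
      print name "_" umi rest
      print seq
      print plus
      print substr($0, len + 1)                    # drop the UMI qualities too
    }' "$@"
}
```

Usage: extract_umi 8 reads.fastq > reads.umi.fastq (the UMI length and file names are only examples).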

 

Fu, Y., Wu, PH., Beane, T. et al. Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers. BMC Genomics 19, 531 (2018).

 https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4933-1

A higher number of unique combinations can be achieved simply by increasing the number of random-nucleotide positions: each position can be one of four bases, so n random positions give 4^n combinations. The number of UMI combinations must be large enough that the chance of two cDNA molecules with identical sequences in the starting pool being tagged with the same UMI is vanishingly small.
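This can be made concrete with a little arithmetic. If m molecules start at the same position and each receives a UMI drawn uniformly at random from 4^n possibilities, the expected number of pairs that share a UMI by chance is m(m-1)/2 divided by 4^n. The helper names below are hypothetical, and uniformly random UMIs are an idealizing assumption (real UMI pools have some base-composition bias).

```shell
# umi_space N: number of distinct UMIs with N random nucleotides (4^N).
umi_space() {
  awk -v n="$1" 'BEGIN { printf "%.0f\n", 4 ^ n }'
}

# expected_collisions N M: expected number of molecule pairs that receive the
# same UMI by chance when M molecules share a start position, assuming each
# UMI is drawn uniformly at random: C(M, 2) / 4^N.
expected_collisions() {
  awk -v n="$1" -v m="$2" 'BEGIN { printf "%.4f\n", m * (m - 1) / 2 / 4 ^ n }'
}
```

For example, umi_space 10 prints 1048576, and expected_collisions 10 1000 prints 0.4764 while expected_collisions 12 1000 prints 0.0298: two extra random positions cut the expected collisions 16-fold.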