
Apr 29, 2021

SAMBAMBA-high performance parallel tool for working with SAM and BAM files

 https://github.com/biod/sambamba

 

Sambamba is a high-performance, highly parallel, robust, and fast tool for working with SAM and BAM files.

 

Functions: view, index, sort, markdup, and depth.

Speed compared with samtools:

flagstat: ~1.4x faster.

index: similar.

markdup: ~6x faster.

view: ~4x faster.

sort: sambamba has been beaten overall, though it is still up to 2x faster than samtools on large-RAM machines (120 GB+).

Mar 31, 2021

DEG analysis without biological Replication


https://www.researchgate.net/post/DEG_analysis_without_biological_Replication

 

->
Without replicates, you cannot estimate which genes are differentially expressed using edgeR or DESeq2. You can only calculate fold changes based on normalized read counts (preferably CPM normalized by the TMM method, which is included in any edgeR analysis) and apply a stringent fold-change cut-off to determine which genes are more or less expressed depending on the condition.
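A minimal sketch of the fold-change approach described above, assuming one control and one treated sample. It uses plain CPM (library-size scaling only); the TMM scaling factors that edgeR would add are not reproduced here. The gene names, counts, pseudocount, and the cut-off of 2 on the log2 scale are illustrative assumptions, and the function names are hypothetical.

```python
import math

def cpm(counts):
    """Counts per million for one sample (dict: gene -> raw read count)."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM values; the pseudocount avoids division by zero."""
    cpm_c, cpm_t = cpm(control), cpm(treated)
    return {
        gene: math.log2((cpm_t[gene] + pseudocount) / (cpm_c[gene] + pseudocount))
        for gene in cpm_c
    }

def apply_cutoff(lfc, cutoff=2.0):
    """Keep genes whose absolute log2 fold change reaches the cut-off."""
    return {gene: v for gene, v in lfc.items() if abs(v) >= cutoff}

# Toy data (made up for illustration only).
control = {"geneA": 100, "geneB": 100, "geneC": 800}
treated = {"geneA": 200, "geneB": 1600, "geneC": 200}

lfc = log2_fold_changes(control, treated)
candidates = apply_cutoff(lfc, cutoff=2.0)
print(sorted(candidates))
```

Genes passing the cut-off are candidates only; without replicates there is no variance estimate, so no p-value can be attached to them.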

->
It may be possible to treat certain samples as biological replicates. I recommend checking how your samples cluster using a PCA plot (explained in the DESeq2 manual/workflow); this is a good way of exploring your data. For example, if you have 4 control samples and 4 treatment samples, all the control samples may form one group that separates from the treatment samples. If this is the case, one could argue that, for the purpose of analysing DEGs between treatment and control, all the control samples can be considered replicates, and likewise the treatment samples. However, the relevance of this approach will depend on the nature of the experimental setup.
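To make the clustering check concrete, here is a rough pure-Python sketch that computes first-principal-component scores for a small samples x genes count matrix. It is not the DESeq2 workflow: it uses a simple log2(CPM + 1) transform instead of DESeq2's variance-stabilizing transforms, and power iteration instead of a full PCA. The toy counts and function names are assumptions made for illustration.

```python
import math

def log_cpm(counts):
    """log2(CPM + 1) for a samples x genes matrix of raw counts."""
    out = []
    for sample in counts:
        total = sum(sample)
        out.append([math.log2(c / total * 1e6 + 1.0) for c in sample])
    return out

def pc1_scores(matrix, n_iter=200):
    """Scores of each row (sample) on the first principal component."""
    n, p = len(matrix), len(matrix[0])
    # Centre each gene (column) across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample-by-sample Gram matrix X X^T; its top eigenvector gives PC1 scores.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)] for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigval = 0.0
    for _ in range(n_iter):  # power iteration
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            return [0.0] * n
        v = [t / norm for t in w]
        eigval = norm
    return [math.sqrt(eigval) * t for t in v]

# Toy data: rows 0-3 are controls, rows 4-7 are treated (made-up counts).
counts = [
    [100, 200, 50, 300, 400],
    [110, 190, 55, 310, 390],
    [95, 210, 45, 290, 410],
    [105, 205, 52, 305, 395],
    [400, 200, 10, 300, 400],
    [420, 195, 12, 310, 390],
    [390, 205, 9, 295, 405],
    [410, 198, 11, 302, 398],
]

scores = pc1_scores(log_cpm(counts))
for label, s in zip(["ctrl"] * 4 + ["trt"] * 4, scores):
    print(label, round(s, 2))
```

If the two groups fall on opposite sides of PC1, as in this toy example, that supports treating the samples within each group as replicates. The sign of a principal component is arbitrary, so only the separation matters.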

Mar 10, 2021

Docker for bioinformatics - biocontainers

https://hub.docker.com/u/biocontainers/

 

Docker overview 

 
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.
Docker Architecture Diagram
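As a rough illustration of running a packaged tool from the biocontainers collection, the sketch below only assembles a `docker run` command line in Python and prints it; it does not execute anything. The image tag `TAG`, the host directory, the input file name, and the helper name `docker_run_command` are placeholders and not values taken from this page. Check Docker Hub for a tag that actually exists.

```python
import shlex

def docker_run_command(image, tool_args, host_dir, container_dir="/data"):
    """Build an argument list for `docker run` (hypothetical helper)."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # mount host data into the container
        "-w", container_dir,                  # use the mounted directory as working dir
        image,
        *tool_args,
    ]

# Placeholder image reference: replace TAG with a real tag from Docker Hub.
IMAGE = "biocontainers/samtools:TAG"

cmd = docker_run_command(IMAGE, ["samtools", "flagstat", "sample.bam"], "/path/to/bams")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would launch the container, provided Docker is installed and the image reference is valid.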

Ct, Cq, Rn value in qRT-PCR

What Is a Cq (Ct) Value?

 https://bitesizebio.com/24581/what-is-a-ct-value/

Ct – threshold cycle
Cp – crossing point
TOP – take-off point
Cq – quantification cycle 


Image of a qPCR graph showing how the Cq value is obtained

Good summary for qRT-PCR

https://www.thermofisher.com/us/en/home/life-science/pcr/real-time-pcr/real-time-pcr-learning-center.html 

 

https://www.thermofisher.com/content/dam/LifeTech/Documents/PDFs/PG1503-PJ9169-CO019879-Re-brand-Real-Time-PCR-Understanding-Ct-Value-Americas-FHR.pdf

 

Rn value

https://www.qiagen.com/us/resources/faq?id=ee18399a-b88b-43ef-9929-27d79ef9ed09&lang=en

 The Rn value, or normalized reporter value, is the fluorescent signal from SYBR Green normalized to (divided by) the signal of the passive reference dye for a given reaction. The delta Rn value is the Rn value of an experimental reaction minus the Rn value of the baseline signal generated by the instrument. This parameter reliably calculates the magnitude of the specific signal generated from a given set of PCR conditions. For more information, please refer to your cycler's user manual.
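The two definitions above can be sketched in a few lines. This assumes the baseline is taken as the mean Rn over a set of early cycles; instruments differ in how they determine the baseline, so that choice, the fluorescence readings, and the function names are all illustrative assumptions.

```python
from statistics import fmean

def rn(reporter, passive_reference):
    """Normalized reporter: reporter signal divided by passive reference signal."""
    return reporter / passive_reference

def delta_rn(rn_values, baseline_cycles):
    """Rn minus baseline Rn, with the baseline taken as the mean over the given cycle indices."""
    baseline = fmean(rn_values[i] for i in baseline_cycles)
    return [value - baseline for value in rn_values]

# Made-up fluorescence readings for five cycles.
reporter_signal = [100.0, 100.0, 100.0, 200.0, 400.0]   # e.g. SYBR Green
reference_signal = [50.0, 50.0, 50.0, 50.0, 50.0]       # passive reference dye

rn_values = [rn(r, p) for r, p in zip(reporter_signal, reference_signal)]
drn = delta_rn(rn_values, baseline_cycles=range(3))      # first three cycles as baseline
print(rn_values)
print(drn)
```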

 

RSEM : RNA-Seq transcript quantification

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

 BMC Bioinformatics volume 12, Article number: 323 (2011)

 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323

RNA-Seq gene expression estimation with read mapping uncertainty

Bioinformatics, Volume 26, Issue 4, 15 February 2010, Pages 493–500

 https://doi.org/10.1093/bioinformatics/btp692

 

RSEM tutorial

https://github.com/bli25broad/RSEM_tutorial

Oct 26, 2020

ChIP-seq sequencing depth guideline

Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data
PLoS Comput Biol. 2013 Nov; 9(11): e1003326.
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326 


Impact of sequencing depth in ChIP-seq experiments
Nucleic Acids Res. 2014 May 1; 42(9): e74.
https://academic.oup.com/nar/article/42/9/e74/1248114


What is the minimum million reads that I need to have a good result of chip-seq in human?
https://www.researchgate.net/post/What_is_the_minimum_million_reads_that_I_need_to_have_a_good_result_of_chip-seq_in_human


Recommended Coverage and Read Depth for NGS Applications
https://genohub.com/recommended-sequencing-coverage-by-application/


Table 1: Coverage and Read Recommendations by Application


| Category | Detection or Application | Recommended Coverage (x) or Reads (millions) | References |
|---|---|---|---|
| Whole genome sequencing | Homozygous SNVs | 15x | Bentley et al., 2008 |
| | Heterozygous SNVs | 33x | Bentley et al., 2008 |
| | INDELs | 60x | Feng et al., 2014 |
| | Genotype calls | 35x | Ajay et al., 2011 |
| | CNV | 1-8x | Xie et al., 2009; Medvedev et al., 2010 |
| Whole exome sequencing | Homozygous SNVs | 100x (3x local depth) | Clark et al., 2011; Meynert et al., 2013 |
| | Heterozygous SNVs | 100x (13x local depth) | Clark et al., 2011; Meynert et al., 2013 |
| | INDELs | not recommended | Feng et al., 2014 |
| Transcriptome Sequencing | Differential expression profiling | 10-25M | Liu Y. et al., 2014; ENCODE 2011 RNA-Seq |
| | Alternative splicing | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| | Allele specific expression | 50-100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| | De novo assembly | >100M | Liu Y. et al., 2013; ENCODE 2011 RNA-Seq |
| DNA Target-Based Sequencing | ChIP-Seq | 10-14M (sharp peaks); 20-40M (broad marks) | Rozowsky et al., 2009; ENCODE 2011 Genome; Landt et al., 2012 |
| | Hi-C | 100M | Belton, J.M. et al., 2012 |
| | 4C (Circularized Chromosome Conformation Capture) | 1-5M | van de Werken, H.J.G. et al., 2012 |
| | 5C (Chromosome Conformation Capture Carbon Copy) | 15-25M | Sanyal A. et al., 2012 |
| | ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing) | 15-20M | Zhang, J. et al., 2012 |
| | FAIRE-Seq | 25-55M | ENCODE 2011 Genome; Landt et al., 2012 |
| | DNase I-Seq | 25-55M | Landt et al., 2012 |
| DNA Methylation Sequencing | CAP-Seq | >20M | Long, H.K. et al., 2013 |
| | MeDIP-Seq | 60M | Taiwo, O. et al., 2012 |
| | RRBS (Reduced Representation Bisulfite Sequencing) | 10x | ENCODE 2011 Genome |
| | Bisulfite-Seq | 5-15x; 30x | Ziller, M.J. et al., 2015; Epigenomics Road Map |
| RNA-Target-Based Sequencing | CLIP-Seq | 10-40M | Cho J. et al., 2012; Eom T. et al., 2013; Sugimoto Y. et al., 2012 |
| | iCLIP | 5-15M | Sugimoto Y. et al., 2012; Rogelj B. et al., 2012 |
| | PAR-CLIP | 5-15M | Rogelj B. et al., 2012 |
| | RIP-Seq | 5-20M | Lu Z. et al., 2014 |
| Small RNA (microRNA) Sequencing | Differential Expression | ~1-2M | Metpally RPR et al., 2013; Campbell et al., 2015 |
| | Discovery | ~5-8M | Metpally RPR et al., 2013; Campbell et al., 2015 |
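The table mixes two units: depth of coverage (x) and read counts (millions). A minimal sketch of converting between them, using the standard relation coverage = number of reads x read length / genome size. The 3.1 Gb human genome size and the 150 bp read length are illustrative assumptions, and the function names are hypothetical.

```python
import math

def mean_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size

def reads_for_coverage(target_coverage, read_length, genome_size):
    """Number of reads needed to reach a target average coverage (rounded up)."""
    return math.ceil(target_coverage * genome_size / read_length)

HUMAN_GENOME_BP = 3_100_000_000   # approximate size, assumed for illustration
READ_LENGTH_BP = 150              # assumed read length

reads_30x = reads_for_coverage(30, READ_LENGTH_BP, HUMAN_GENOME_BP)
print(reads_30x)                                                    # reads for 30x
print(mean_coverage(reads_30x, READ_LENGTH_BP, HUMAN_GENOME_BP))    # back to coverage
```

Here each read is counted individually, so a paired-end run contributes two reads per fragment. This is an average over the whole genome; local depth at any given position can be much lower.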

References:

  • Ajay, S.S et al. Accurate and comprehensive sequencing of personal genomes. Genome Research 21, 1498 (2011).
  • Belton, J.M. et al., Hi-C: a comprehensive technique to capture the conformation of genomes. Methods, 58, 221-230 (2012).
  • Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
  • Campbell J.D. et al., Assessment of microRNA differential expression and detection in multiplexed small RNA sequencing data. RNA 21, 164-171 (2015).
  • Cho J. et al., LIN28A Is a Suppressor of ER-Associated Translation in Embryonic Stem Cells. Cell 151, 765-777 (2012).
  • Clark, M. J. et al. Performance comparison of exome DNA sequencing technologies. Nature Biotech. 29, 908–914 (2011).
  • ENCODE 2011 Genome Guidelines
  • ENCODE 2011 RNA-Seq Guidelines
  • Eom T. et al., NOVA-dependent regulation of cryptic NMD exons controls synaptic protein levels after seizure. eLife 2, e00178 (2013).
  • Epigenomics Road Map Guidelines
  • Feng, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Medicine 6, 89 (2014).
  • Landt, S.G. et al., ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research, 22, 1813-1831 (2012).
  • Liu Y., et al., RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301-304 (2014).
  • Liu Y., et al., Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS ONE 8(6):e66883 (2013).
  • Long, H.K. et al., Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. eLife 2, e00348 (2013).
  • Lu Z. et al., RIP-seq analysis of eukaryotic Sm proteins identifies three major categories of Sm-containing ribonucleoproteins. Genome Biology 15:R7 (2014).
  • Meynert et al., Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics 14, 195 (2013).
  • Medvedev, P. et al. Detecting copy number variation with mated short reads. Genome Research 20, 1613 (2010).
  • Metpally RPR et al., Comparison of Analysis Tools for miRNA High Throughput Sequencing Using Nerve Crush as a Model. Frontiers in Genetics 4:20 (2013).
  • Rogelj et al., Widespread binding of FUS along nascent RNA regulates alternative splicing in the brain. Scientific Reports 2, 603 (2012).
  • Rozowsky, J. et al., PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 65-75 (2009).
  • Sanyal, A. et al., The long-range interaction landscape of gene promoters. Nature, 489, 109-113 (2012).
  • Sugimoto Y et al., Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions. Genome Biology 13:R67 (2012).
  • Taiwo, O. et al., Methylome analysis using MeDIP-seq with low DNA concentrations. Nature Protocols 7, 617-636 (2012).
  • van de Werken, H.J.G. et al., Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nature Methods 9, 969-972 (2012).
  • Xie, C. & Tammi, M. T. CNV–seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 10, 80 (2009).
  • Zhang, J. et al., ChIA-PET analysis of transcriptional chromatin interactions. Methods 58, 289-299 (2012).
  • Ziller, M.J et al., Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods 12, 230-232 (2015).

MultiQC : Aggregate bioinformatics results across many samples into a single report

 GitHub

https://github.com/ewels/MultiQC

 

 Example

https://multiqc.info/examples/rna-seq/multiqc_report.html