NGS scrap: 2020

Practical Guidelines for the Comprehensive Analysis of ChIP-seq Data
PLoS Comput Biol. 2013 Nov; 9(11): e1003326.
https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003326

Impact of sequencing depth in ChIP-seq experiments
Nucleic Acids Res. 2014 May 1; 42(9): e74.
https://academic.oup.com/nar/article/42/9/e74/1248114

What is the minimum million reads that I need to have a good result of chip-seq in human?
https://www.researchgate.net/post/What_is_the_minimum_million_reads_that_I_need_to_have_a_good_result_of_chip-seq_in_human

Recommended Coverage and Read Depth for NGS Applications
https://genohub.com/recommended-sequencing-coverage-by-application/

Table 1: Coverage and Read Recommendations by Application

Category	Detection or Application	Recommended Coverage (x) or Reads (millions)	References
Whole genome sequencing	Homozygous SNVs	15x	Bentley et al., 2008
	Heterozygous SNVs	33x	Bentley et al., 2008
	INDELs	60x	Feng et al., 2014
	Genotype calls	35x	Ajay et al., 2011
	CNV	1-8x	Xie et al., 2009; Medvedev et al., 2010
Whole exome sequencing	Homozygous SNVs	100x (3x local depth)	Clark et al., 2011; Meynert et al., 2013
	Heterozygous SNVs	100x (13x local depth)	Clark et al., 2011; Meynert et al., 2013
	INDELs	not recommended	Feng et al., 2014
Transcriptome Sequencing	Differential expression profiling	10-25M	Liu Y. et al., 2014; ENCODE 2011 RNA-Seq
	Alternative splicing	50-100M	Liu Y. et al., 2013; ENCODE 2011 RNA-Seq
	Allele specific expression	50-100M	Liu Y. et al., 2013; ENCODE 2011 RNA-Seq
	De novo assembly	>100M	Liu Y. et al., 2013; ENCODE 2011 RNA-Seq
DNA Target-Based Sequencing	ChIP-Seq	10-14M (sharp peaks); 20-40M (broad marks)	Rozowsky et al., 2009; ENCODE 2011 Genome; Landt et al., 2012
	Hi-C	100M	Belton, J.M. et al., 2012
	4C (Circularized Chromosome Conformation Capture)	1-5M	van de Weken, H.J.G. et al., 2012
	5C (Chromosome Conformation Capture Carbon Copy)	15-25M	Sanyal A. et al., 2012
	ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag Sequencing)	15-20M	Zhang, J. et al., 2012
	FAIRE-Seq	25-55M	ENCODE 2011 Genome; Landt et al., 2012
	DNase I-Seq	25-55M	Landt et al., 2012
DNA Methylation Sequencing	CAP-Seq	>20M	Long, H.K. et al., 2013
	MeDIP-Seq	60M	Taiwo, O. et al., 2012
	RRBS (Reduced Representation Bisulfite Sequencing)	10x	ENCODE 2011 Genome
	Bisulfite-Seq	5-15x; 30x	Ziller, M.J. et al., 2015; Epigenomics Road Map
RNA-Target-Based Sequencing	CLIP-Seq	10-40M	Cho J. et al., 2012; Eom T. et al., 2013; Sugimoto Y. et al., 2012
	iCLIP	5-15M	Sugimoto Y. et al., 2012; Rogelj B. et al., 2012
	PAR-CLIP	5-15M	Rogelj B. et al., 2012
	RIP-Seq	5-20M	Lu Z. et al., 2014
Small RNA (microRNA) Sequencing	Differential Expression	~1-2M	Metpally RPR et al., 2013; Campbell et al., 2015
	Discovery	~5-8M	Metpally RPR et al., 2013; Campbell et al., 2015
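The table mixes two units: mean coverage (x) and read counts (millions). A back-of-the-envelope conversion between them uses the Lander-Waterman relation C = N·L/G (coverage = reads × read length / genome or target size). The sketch below is illustrative only: the function names are hypothetical, the 3.1-Gb human genome size and the read lengths are assumptions, and duplicates, unmapped reads and uneven coverage are ignored, so real runs need more reads than it reports. For paired-end runs, count each mate as one read.

```shell
#!/usr/bin/env bash
# Sketch: convert between mean coverage (x) and read counts using the
# Lander-Waterman relation C = N * L / G, where
#   C = mean coverage, N = number of reads, L = read length (bp),
#   G = genome (or target region) size (bp).
# Assumption: every read maps uniquely and contributes its full length;
# duplicates, unmapped reads and uneven coverage are ignored.

# reads_needed COVERAGE GENOME_BP READ_LEN_BP   (hypothetical helper)
# Prints N = C * G / L in millions of reads, one decimal place.
reads_needed() {
  local coverage=$1 genome_bp=$2 read_len=$3
  awk -v c="$coverage" -v g="$genome_bp" -v l="$read_len" \
    'BEGIN { printf "%.1f\n", c * g / l / 1e6 }'
}

# coverage_from_reads READS GENOME_BP READ_LEN_BP   (hypothetical helper)
# Prints C = N * L / G, two decimal places.
coverage_from_reads() {
  local reads=$1 genome_bp=$2 read_len=$3
  awk -v n="$reads" -v g="$genome_bp" -v l="$read_len" \
    'BEGIN { printf "%.2f\n", n * l / g }'
}

# 15x WGS (homozygous SNVs row) over an assumed 3.1-Gb genome, 150-bp reads.
reads_needed 15 3100000000 150                 # prints 310.0
# 20M single-end 50-bp ChIP-seq reads over the same genome.
coverage_from_reads 20000000 3100000000 50     # prints 0.32
```

For exome rows, pass the size of the captured target instead of the whole genome as GENOME_BP.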

References:

Ajay, S.S. et al. Accurate and comprehensive sequencing of personal genomes. Genome Research 21, 1498 (2011).
Belton, J.M. et al., Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 221-230 (2012).
Bentley, D. R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
Campbell J.D. et al., Assessment of microRNA differential expression and detection in multiplexed small RNA sequencing data. RNA 21, 164-171 (2015).
Cho J. et al., LIN28A Is a Suppressor of ER-Associated Translation in Embryonic Stem Cells. Cell 151, 765-777 (2012).
Clark, M. J. et al. Performance comparison of exome DNA sequencing technologies. Nature Biotech. 29, 908–914 (2011).
ENCODE 2011 Genome Guidelines
ENCODE 2011 RNA-Seq Guidelines
Eom T. et al., NOVA-dependent regulation of cryptic NMD exons controls synaptic protein levels after seizure. eLife 2, e00178 (2013).
Epigenomics Road Map Guidelines
Feng, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Medicine 6, 89 (2014).
Landt, S.G. et al., ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research 22, 1813-1831 (2012).
Liu Y., et al., RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30(3):301-304 (2014).
Liu Y., et al., Evaluating the impact of sequencing depth on transcriptome profiling in human adipose. PLoS One 8(6):e66883 (2013).
Long, H.K. et al., Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates. eLife 2, e00348 (2013).
Lu Z. et al., RIP-seq analysis of eukaryotic Sm proteins identifies three major categories of Sm-containing ribonucleoproteins. Genome Biology 15:R7 (2014).
Meynert et al., Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics 14, 195 (2013).
Medvedev, P. et al., Detecting copy number variation with mated short reads. Genome Research 20, 1613 (2010).
Metpally RPR et al., Comparison of Analysis Tools for miRNA High Throughput Sequencing Using Nerve Crush as a Model. Frontiers in Genetics 4:20 (2013).
Rogelj B. et al., Widespread binding of FUS along nascent RNA regulates alternative splicing in the brain. Scientific Reports 2, 603 (2012).
Rozowsky, J. et al., PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nature Biotech. 27, 65-75 (2009).
Sanyal, A. et al., The long-range interaction landscape of gene promoters. Nature 489, 109-113 (2012).
Sugimoto Y et al., Analysis of CLIP and iCLIP methods for nucleotide-resolution studies of protein-RNA interactions. Genome Biology 13:R67 (2012).
Taiwo, O. et al., Methylome analysis using MeDIP-seq with low DNA concentrations. Nature Protocols 7, 617-636 (2012).
van de Weken, H.J.G. et al., Robust 4C-seq data analysis to screen for regulatory DNA interactions. Nature Methods 9, 969-972 (2012).
Xie, C. & Tammi, M. T. CNV–seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics 10, 80 (2009).
Zhang, J. et al., ChIA-PET analysis of transcriptional chromatin interactions. Methods 58, 289-299 (2012).
Ziller, M.J. et al., Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nature Methods 12, 230-232 (2015).

https://github.com/samtools/samtools/issues/359

rmdup with no option (neither -s nor -S) expects alignment pairs with alternating +/- TLEN values. It is a special-purpose tool for removing PCR duplicates which conform to that pattern.

The warning is issued for consecutive positive TLEN values. A simple contrived example:

cat /tmp/rmdupExample.sam
@SQ	SN:c2	LN:9
s1	67	c2	1	0	9M	=	1	30	CTAATAATC	XXXXXXXXX	RG:Z:1st
s1	67	c2	1	0	9M	=	1	-30	CTAATAATC	XXXXXXXXX	RG:Z:1st
s1	259	c2	1	0	9M	=	1	30	CTAATAATC	YXXXXXXXX	RG:Z:2nd
s1	131	c2	1	0	9M	=	1	30	CTAATAATC	XXXXXXXXX	RG:Z:3rd
s1	131	c2	1	0	9M	=	1	-30	CTAATAATC	XXXXXXXXX	RG:Z:3rd

./samtools view -b -o /tmp/rmdupExample.bam /tmp/rmdupExample.sam
../samtools-0.1.19/samtools rmdup /tmp/rmdupExample.bam /tmp/test_input_1_rmdup.bam
[bam_rmdup_core] processing reference c2...
[bam_rmdup_core] inconsistent BAM file for pair 's1'. Continue anyway.
[bam_rmdup_core] 2 / 3 = 0.6667 in library '    '
./samtools view /tmp/test_input_1_rmdup.bam
s1	67	c2	1	0	9M	=	1	-30	CTAATAATC	XXXXXXXXX	RG:Z:1st
s1	259	c2	1	0	9M	=	1	30	CTAATAATC	YXXXXXXXX	RG:Z:2nd
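The TLEN pattern described above can be spotted before running `rmdup`. This is a minimal sketch, assuming plain SAM text with tab-separated fields (TLEN is column 9); the function name `check_tlen_alternation` is hypothetical. It flags a record whose TLEN is positive when the previous record with the same QNAME was also positive. It approximates the pattern the issue describes and is not a reimplementation of samtools' internal check, so the two need not flag exactly the same records.

```shell
#!/usr/bin/env bash
# Sketch (not samtools' own algorithm): report records whose TLEN is
# positive when the previous record with the same QNAME also had a
# positive TLEN, i.e. the "consecutive positive TLEN" pattern.
# Assumptions: input is SAM text; records with TLEN == 0 are skipped;
# secondary/supplementary records are NOT filtered out.

# check_tlen_alternation [SAM_FILE]   (reads stdin when no file is given)
# Prints "QNAME<TAB>line N" for every offending record.
check_tlen_alternation() {
  awk -F '\t' '
    /^@/ { next }                       # skip header lines
    {
      qname = $1; tlen = $9 + 0         # column 9 is TLEN
      if (tlen == 0) next               # no template length recorded
      if (tlen > 0 && last[qname] > 0)  # two positives in a row
        printf "%s\tline %d\n", qname, NR
      last[qname] = tlen                # remember the latest TLEN
    }' "$@"
}
```

On the example file above (saved with tab-separated fields, as SAM requires), `check_tlen_alternation /tmp/rmdupExample.sam` prints `s1	line 5`: the `RG:Z:3rd` record with TLEN 30 that follows the positive TLEN of the `RG:Z:2nd` record. BAM input can be decoded first with `samtools view /tmp/rmdupExample.bam | check_tlen_alternation`; without `-h` there is no header, so line numbers then count alignment records only.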

Oct 26, 2020

ChIP-seq sequencing depth guideline

MultiQC: Aggregate bioinformatics results across many samples into a single report

Sep 24, 2020

[bam_rmdup_core] inconsistent BAM file for pair

Jul 21, 2020

SIMPLE: Pipeline for Mapping Point Mutations