page

Apr 8, 2019

samtools rmdup vs Picard MarkDuplicates

samtools rmdup PE
http://seqanswers.com/forums/showthread.php?t=5959

If you have one pair of reads where read 1 starts at position 100, and the other end starts at position 200, and a second pair of reads where read 1 starts at position 100, and read 2 starts at position 250, those came from different fragments of DNA. You can tell because the read 2 start is different, even though the read 1 start is the same.

When treating the reads as paired end, none of those reads should be deleted as PCR duplicates.

However, if you ran rmdup -S, the software will not check to see if read 2 has a different start coordinate, so one of those read 1 reads will be treated as a duplicate, and deleted.


Question: Samtools Dedup Documentation
 https://www.biostars.org/p/55111/

rmdup for PE reads is pretty straightforward. It looks for identical external coordinates, meaning it only looks at the 5' start coordinates of the FR orientation pair-reads. Then it takes the pair with the highest mapping quality.
For SE reads, I've read that samtools also only looks for identical 5' start coordinates, not both start and end coordinates. I think the idea is that sequencers usually fall in quality towards the 3'. After mapping, duplicate reads have higher chance of mapping differentially towards the 3' end. So it only looks at the adapter trimmed 5' start for duplicates.


Question: Picard MarkDuplicates and SamTools rmdup algorithm documentation
https://www.biostars.org/p/105291/
SamTools rmdup 'only' compare two reads on chrom and pos (which could be wrong if two reads come from two different libraries) and **removes** reads from the BAM: information is lost.

picard set the sam flag 1024 but do not delete the reads. two pairs of reads are compared , as far as I know, using the chrom, the pos, the group-id (sample...) + (flowcell , lane, X,Y for optical dups) (,and the cigar string ?).



Picard MarkDuplicates vs samtools rmdup for variant calling with GATK
https://gatkforums.broadinstitute.org/gatk/discussion/6793/picard-markduplicates-vs-samtools-rmdup-for-variant-calling-with-gatk 
 - this post explains Picard's duplicate marking tools





No comments:

Post a Comment