Jul 22, 2021

How to compute TPMs with TMM factors (edgeR)?

 https://www.biostars.org/p/388584/

TMM is a method for normalizing the library sizes rather than a method for normalizing read counts. As the edgeR User's Guide says (page 15):

"normalization in edgeR is model-based, and the original read counts are not themselves transformed."

Which way around is your question? Do you have TPMs and want to compute TMM factors or do you have TMM factors and want to compute TPMs?

If you are asking the first question, then no, TMM factors can only be computed from the raw counts, not from quantities such as TPMs or CPMs from which the library sizes have already been divided out. If you already have TPMs from some software package, then normalization has almost certainly already been applied, so I would be very wary about trying to re-normalize them unless you really know what you're doing.
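As a minimal sketch of the first case, the snippet below computes TMM factors from a raw count matrix with edgeR. The counts are simulated purely for illustration and the object names are arbitrary; DGEList() and calcNormFactors() are the usual edgeR functions, and the edgeR package is assumed to be installed.

```r
library(edgeR)

# Simulated raw counts (made-up data): 1000 genes x 4 samples.
set.seed(1)
counts <- matrix(rnbinom(4000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 paste0("sample", 1:4)))

# TMM factors are estimated from the raw counts, not from CPMs or TPMs.
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")

# The factors are stored next to the library sizes; the counts are unchanged.
dge$samples

# Effective library size = raw library size * TMM factor.
eff_lib <- dge$samples$lib.size * dge$samples$norm.factors
```

The counts in dge$counts are left untouched; only the norm.factors column of dge$samples changes.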

If you are asking the second question then, yes, TMM factors can in principle be used to compute TPMs. In edgeR, any downstream quantity that is computed from the library sizes will incorporate the TMM factors automatically, because the factors are considered part of the effective library sizes. TMM normalization factors will be applied automatically when you use

CPM <- cpm(dge)

or

RPKM <- rpkm(dge)

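in edgeR to compute CPMs or RPKMs from a DGEList object. (rpkm() also needs gene lengths, supplied either through its gene.length argument or as a gene-length column in dge$genes.) I don't necessarily recommend TPM values myself, but if you go on to compute TPMs by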

TPM <- t( t(RPKM) / colSums(RPKM) ) * 1e6

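then the TMM factors will have entered the computation through the RPKMs. Note, however, that dividing each column by its own sum cancels any per-sample scaling, including the effective library size, so the resulting TPMs are numerically the same as those computed without the TMM factors.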
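To make the arithmetic concrete, here is a small base-R sketch of the same chain of calculations done by hand, without edgeR. All numbers are made up, and norm_factors is a hypothetical stand-in for the TMM factors that calcNormFactors() would estimate.

```r
# Toy inputs; all values are made up for illustration.
counts <- matrix(c(10, 200, 30,
                   20, 180, 60,
                    5, 400, 15),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2", "s3")))
gene_length  <- c(geneA = 1000, geneB = 2500, geneC = 500)  # in bases
norm_factors <- c(s1 = 0.9, s2 = 1.1, s3 = 1.0)             # hypothetical

# Effective library size = raw library size * normalization factor.
eff_lib <- colSums(counts) * norm_factors

# Counts per million, using the effective library sizes.
CPM <- t(t(counts) / eff_lib) * 1e6

# Reads per kilobase per million: divide each gene (row) by its length in kb.
RPKM <- CPM / (gene_length / 1000)

# TPM: rescale each sample (column) of RPKM to sum to one million.
TPM <- t(t(RPKM) / colSums(RPKM)) * 1e6
colSums(TPM)  # each column sums to one million by construction
```

The t(t(x) / v) idiom divides each column of x by the matching element of v, which is why it appears in both the CPM and the TPM steps.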
