Apr 18, 2018
voom: precision weights unlock linear model analysis tools for RNA-seq read counts
https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29
linear modeling for RNA-seq count data
Abstract
New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
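A rough sketch of the log-count transformation that voom starts from, written in Python for illustration only (the real implementation is the voom function in the R package limma). The paper defines the log-counts per million as log2((r + 0.5) / (R + 1) * 10^6), where r is a read count and R is the library size. The function name log_cpm and the sample counts are made up for this note, and the mean-variance trend fit and the precision weights are not reproduced here.

```python
import math


def log_cpm(counts, lib_size):
    """Return log2 counts per million for one sample.

    The offsets 0.5 and 1 avoid taking the log of zero, following the
    transformation described in the voom paper.
    """
    return [math.log((r + 0.5) / (lib_size + 1.0) * 1e6, 2) for r in counts]


# Hypothetical read counts for three genes in one sample.
counts = [0, 10, 250]
# Library size taken here simply as the sum of the counts (an assumption).
lib_size = sum(counts)
print(log_cpm(counts, lib_size))
```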
How do you read from stdin in Python?
https://stackoverflow.com/questions/1450393/how-do-you-read-from-stdin-in-python
Here's an example from Learning Python:
import sys
data = sys.stdin.readlines()  # read all of stdin into a list of lines
print("Counted %d lines." % len(data))  # same output on Python 2 and 3
On Unix, you could test it by doing something like:
% cat countlines.py | python countlines.py
Counted 3 lines.
On Windows or DOS, you'd do:
C:\> type countlines.py | python countlines.py
Counted 3 lines.
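A variant that does not hold the whole input in memory, as a minimal sketch: iterating over the stream yields one line at a time, so only the count is kept. The names count_lines and main are made up for this note. The demonstration line feeds an in-memory stream so the snippet runs on its own; in a script that sits at the end of a pipe, call main() with no argument so it falls back to sys.stdin.

```python
import io
import sys


def count_lines(stream):
    """Count the lines of a text stream by iterating over it lazily."""
    return sum(1 for _ in stream)


def main(stream=None):
    # Fall back to standard input when no stream is supplied.
    if stream is None:
        stream = sys.stdin
    print("Counted %d lines." % count_lines(stream))


# Demonstration with an in-memory stream instead of a real pipe.
main(io.StringIO(u"first\nsecond\nthird\n"))
```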
BED file handling software: bedtools, BEDOPS
bedtools: a powerful toolset for genome arithmetic
http://bedtools.readthedocs.io/en/latest/#
bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
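To make "intersect two interval files" concrete, here is a toy sketch in plain Python. It is not how bedtools is implemented (this version compares every pair of intervals, which would be far too slow for real data) and the function name intersect and the sample intervals are made up for this note. Coordinates are treated as 0-based and half-open, as in BED, so intervals that only touch at an endpoint do not overlap.

```python
def intersect(a, b):
    """Return the overlapping parts of two lists of (chrom, start, end)."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            # Half-open intervals overlap only if the shared span is non-empty.
            if start < end:
                overlaps.append((chrom_a, start, end))
    return overlaps


# Hypothetical intervals standing in for two BED files.
genes = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks = [("chr1", 150, 250), ("chr2", 150, 250), ("chr1", 200, 300)]
print(intersect(genes, peaks))  # [('chr1', 150, 200)]
```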
BEDOPS: the fast, highly scalable and easily-parallelizable genome analysis toolkit
https://bedops.readthedocs.io/en/latest/index.html
Data conversion
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)
http://markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/
https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte
https://stackoverflow.com/questions/24358361/removing-u2018-and-u2019-character
>>> for row in query.rows():
... output.write(str(row["primaryIdentifier"])+'\t'+str(row["symbol"])+'\t'+str(row["briefDescription"])+'\t'+str(row["isObsolete"])+'\t'+str(row["description"])+'\t'+str(row["curatorSummary"])+'\n')
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)
Solution
1) Change the default encoding of the whole script to UTF-8 (Python 2 only: sys.setdefaultencoding does not exist in Python 3):
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
2) Replace the curly quotes with their ASCII equivalents:
>>> print u"\u2018Hi\u2019"
‘Hi’
>>> print u"\u2018Hi\u2019".replace(u"\u2018", "'").replace(u"\u2019", "'")
'Hi'
Alternatively with regex:
>>> import re
>>> s = u"\u2018Hi\u2019"
>>> print re.sub(u"[\u2018\u2019]", "'", s)
'Hi'
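Another option that keeps the curly quotes instead of replacing them: build each output line as Unicode text and let the file object do the encoding. This is a minimal sketch, assuming each row can be indexed by field name as in the loop above. The helper name write_rows and the sample row are made up for this note, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def write_rows(path, rows, fields):
    """Write the given fields of each row as tab-separated UTF-8 text."""
    with io.open(path, "w", encoding="utf-8") as output:
        for row in rows:
            # Format values as text; the file object encodes them on write.
            cells = [u"%s" % (row[name],) for name in fields]
            output.write(u"\t".join(cells) + u"\n")


# Hypothetical row standing in for one result of query.rows().
sample = [{"symbol": u"abc", "description": u"the gene\u2019s product"}]
path = os.path.join(tempfile.mkdtemp(), "rows.tsv")
write_rows(path, sample, ["symbol", "description"])

with io.open(path, encoding="utf-8") as handle:
    print(repr(handle.read()))
```

The difference from the original loop is that str() is never called on the values, so nothing forces them through the ASCII codec.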