https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte
https://stackoverflow.com/questions/24358361/removing-u2018-and-u2019-character
>>> for row in query.rows():
...     output.write(str(row["primaryIdentifier"])+'\t'+str(row["symbol"])+'\t'+str(row["briefDescription"])+'\t'+str(row["isObsolete"])+'\t'+str(row["description"])+'\t'+str(row["curatorSummary"])+'\n')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)
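In Python 2, calling str() on a unicode value implicitly encodes it with the ASCII codec, which is what fails above. A minimal sketch that reproduces the same error with an explicit encode; the string `text` is a made-up example, not data from the query:

```python
# -*- coding: utf-8 -*-
# U+2019 (right single quotation mark) has no ASCII representation.
text = u"it\u2019s"  # hypothetical sample value

try:
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    reason = exc.reason    # message reported by the codec
    position = exc.start   # index of the offending character

# Encoding the same text as UTF-8 succeeds.
encoded = text.encode("utf-8")
```

The position in the traceback (157 above) is the index of the first character the ASCII codec could not encode.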
Solution
1) Change the default encoding of the whole script to UTF-8 (Python 2 only; sys.setdefaultencoding does not exist in Python 3):
# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
2) Replace the curly quotes (U+2018 and U+2019) with their ASCII equivalent:
>>> print u"\u2018Hi\u2019"
‘Hi’
>>> print u"\u2018Hi\u2019".replace(u"\u2018", "'").replace(u"\u2019", "'")
'Hi'
>>> import re
>>> s = u"\u2018Hi\u2019"
>>> print re.sub(u"(\u2018|\u2019)", "'", s)
'Hi'
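Both fixes above work around the implicit ASCII conversion done by str(). Another option, not taken from the linked answers, is to keep the values as unicode text and let the file object encode them as UTF-8. A rough sketch, assuming the output is a file the script opens itself; `rows`, `fields`, and `out_path` are hypothetical stand-ins for the real query results and output file:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for query.rows(); real rows would come from the query.
rows = [
    {"primaryIdentifier": u"ID0001", "symbol": u"abc",
     "description": u"the gene\u2019s product"},
]
fields = ["primaryIdentifier", "symbol", "description"]

text_type = type(u"")  # unicode on Python 2, str on Python 3

# Hypothetical output location, placed in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.tsv")

# io.open encodes text as UTF-8 on write, so no implicit ASCII step occurs.
with io.open(out_path, "w", encoding="utf-8") as output:
    for row in rows:
        line = u"\t".join(text_type(row[name]) for name in fields)
        output.write(line + u"\n")

# Read the file back to confirm the curly quote survived the round trip.
with io.open(out_path, "r", encoding="utf-8") as check:
    written = check.read()
```

This keeps the original characters in the output instead of replacing them, and it leaves the interpreter's default encoding untouched.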