NGS scrap: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)

Apr 18, 2018

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)

http://markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/

https://stackoverflow.com/questions/21129020/how-to-fix-unicodedecodeerror-ascii-codec-cant-decode-byte

https://stackoverflow.com/questions/24358361/removing-u2018-and-u2019-character

>>> for row in query.rows():
...     output.write(str(row["primaryIdentifier"])+'\t'+str(row["symbol"])+'\t'+str(row["briefDescription"])+'\t'+str(row["isObsolete"])+'\t'+str(row["description"])+'\t'+str(row["curatorSummary"])+'\n')
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 157: ordinal not in range(128)
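
The failure can be reproduced in isolation. In Python 2, calling str() on a unicode value implicitly encodes it with the ASCII codec, which has no mapping for U+2019 (the right single quotation mark). The sketch below makes that encoding step explicit so it behaves the same on Python 2 and Python 3. The sample text and the try_ascii helper are made up for illustration and are not part of the original script.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample value standing in for one of the row fields.
text = u"the gene\u2019s product"


def try_ascii(value):
    """Return the error message raised when encoding value as ASCII, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


message = try_ascii(text)
print(message)
```

Any field that contains a curly quote triggers the same error, which is why only some rows fail.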

Solutions

1) Change the default encoding of the whole script to UTF-8. This is a Python 2-only workaround; sys.setdefaultencoding does not exist in Python 3.

# encoding=utf8
import sys
reload(sys)
sys.setdefaultencoding('utf8')
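
A less invasive alternative, sketched here under assumptions: keep the values as unicode text and let the output file do the UTF-8 encoding, instead of changing the interpreter-wide default. The rows list, the write_rows helper, and the genes.tsv filename are hypothetical stand-ins for query.rows() and output in the session above. io.open with an encoding argument is available in Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text_type = type(u"")  # unicode on Python 2, str on Python 3

FIELDS = ["primaryIdentifier", "symbol", "briefDescription",
          "isObsolete", "description", "curatorSummary"]


def write_rows(rows, path):
    """Write one tab-separated line per row, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as output:
        for row in rows:
            # text_type() keeps non-ASCII characters intact,
            # unlike str() on Python 2, which encodes to ASCII.
            line = u"\t".join(text_type(row[name]) for name in FIELDS)
            output.write(line + u"\n")


# Hypothetical data standing in for query.rows().
rows = [{
    "primaryIdentifier": u"ID0001",
    "symbol": u"abc1",
    "briefDescription": u"example gene",
    "isObsolete": False,
    "description": u"the gene\u2019s product",
    "curatorSummary": None,
}]

path = os.path.join(tempfile.mkdtemp(), "genes.tsv")
write_rows(rows, path)
```

Because the encoding is set on the file, the curly quote is written as UTF-8 bytes and no replacement of characters is needed.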

2) Replace the curly quotes (u'\u2018' and u'\u2019') with their ASCII equivalent, the straight apostrophe:

>>> print u"\u2018Hi\u2019"
‘Hi’
>>> print u"\u2018Hi\u2019".replace(u"\u2018", "'").replace(u"\u2019", "'")
'Hi'

Alternatively, with a regex:

>>> import re
>>> s = u"\u2018Hi\u2019"
>>> print re.sub(u"(\u2018|\u2019)", "'", s)
'Hi'
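
When several characters need replacing, a translation table avoids chaining replace() calls. The following is a minimal sketch that runs on both Python 2 and Python 3. The ascii_quotes helper and the QUOTE_TABLE name are hypothetical, and the double-quote entries (U+201C, U+201D) are an added assumption; only U+2018 and U+2019 appear in the examples above.

```python
# -*- coding: utf-8 -*-
# Map code points to their ASCII replacements.
# Only U+2018/U+2019 come from the post; the double quotes are an assumption.
QUOTE_TABLE = {
    0x2018: u"'",   # left single quotation mark
    0x2019: u"'",   # right single quotation mark
    0x201C: u'"',   # left double quotation mark
    0x201D: u'"',   # right double quotation mark
}


def ascii_quotes(text):
    """Replace curly quotes in a unicode string with straight ASCII quotes."""
    return text.translate(QUOTE_TABLE)


cleaned = ascii_quotes(u"\u2018Hi\u2019")
print(cleaned)  # prints 'Hi'
```

The input must be a unicode string (u"..." on Python 2); other non-ASCII characters are left untouched and would still need UTF-8 encoding on output.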
