libere-tes-chaine-de-mots/import_data/utils/convert_encoding_meta.py


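import re


def convert_encoding_meta(text):
    """Repair UTF-8 text that was mis-decoded as Latin-1 (e.g. 'Ã©' -> 'é')."""
    # Match a UTF-8 lead byte (0xC2-0xF4) followed by continuation bytes (0x80-0xBF).
    conv_text = re.sub(r'[\xc2-\xf4][\x80-\xbf]+',
                       lambda m: m.group(0).encode('latin1').decode('utf8'), text)
    return conv_text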
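
The function above raises `UnicodeDecodeError` when a matched run is not actually valid UTF-8, for example genuine Latin-1 text such as `é»`. The sketch below shows one possible way to tolerate that case by leaving such runs unchanged. The name `convert_encoding_meta_safe` and the module-level `_MOJIBAKE` pattern are hypothetical and not part of the original file; whether silently keeping the input is the right behavior depends on the import data.

```python
import re

# Hypothetical names for illustration; the original file defines neither.
_MOJIBAKE = re.compile(r'[\xc2-\xf4][\x80-\xbf]+')


def convert_encoding_meta_safe(text):
    """Same repair as convert_encoding_meta, but keep runs that are not valid UTF-8."""
    def fix(match):
        chunk = match.group(0)
        try:
            # Recover the original bytes, then decode them as UTF-8.
            return chunk.encode('latin1').decode('utf8')
        except UnicodeDecodeError:
            # Assumption: a run that fails to decode was real Latin-1 text.
            return chunk
    return _MOJIBAKE.sub(fix, text)


if __name__ == '__main__':
    print(convert_encoding_meta_safe('libÃ¨re tes chaÃ®nes'))
```

Text that contains no mis-decoded sequences passes through untouched, so the function can be applied to already clean strings without harm.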