vendredi 1 juillet 2016

Replace some accented letters from word in python


I'm trying to replace some accented letters from Portuguese words in Python using re.

accentedLetters = ['à', 'á', 'â', 'ã', 'é', 'ê', 'í', 'ó', 'ô', 'õ', 'ú', 'ü']
letters         = ['a', 'a', 'a', 'a', 'e', 'e', 'i', 'o', 'o', 'o', 'u', 'u']

So the accentedLetters will be replaced by the letter in the letters array.

In this way, my expectated result are for example:

ação    => açao
frações => fraçoes 

For doing this, I created the following function:

def removeAccents(word):
    accentedLetters = ['à', 'á', 'â', 'ã', 'é', 'ê', 'í', 'ó', 'ô', 'õ', 'ú', 'ü']
    letters =         ['a', 'a', 'a', 'a', 'e', 'e', 'i', 'o', 'o', 'o', 'u', 'u']

    for i, al in enumerate(accentedLetters):
        rule = re.compile(str(al), re.UNICODE)
        word = rule.sub(letters[i], word)
        print word

    return word

What I'm doing wrong? Is there a better way of doing this?


Aucun commentaire:

Enregistrer un commentaire