Saturday, 18 June 2016

Python Special Characters Encoding


I have a Python script that reads a CSV file and writes an XML file. I have been hitting a wall trying to work out how to read special characters such as ç, á, é, í, etc. The script runs perfectly well when the input contains no special characters. This is the script header:

# coding=utf-8

'''
@modified by: Julierme Pinheiro
'''
import os
import sys
import unittest
from unittest import skip
import csv
import uuid
import xml
import xml.dom.minidom as minidom
import owslib
from owslib.iso import *
import pyproj
from decimal import *
import logging

This is how I retrieve information from the CSV file:

# add the title
                title = data[1]
                titleElement = identificationInfo[0].getElementsByTagName('gmd:title')[0]
                titleNode = record.createTextNode(title)
                titleElement.childNodes[1].appendChild(titleNode)
                print "Title:" + title

Note: If data[1], the second column of the CSV file, contains a special character, as in "Navegação", the script fails (it does not write anything to the XML file).
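One possible cause, assuming the script runs under Python 2 and the CSV file is saved as UTF-8: csv.reader returns byte strings, and passing non-ASCII bytes to createTextNode can fail later, when the document is serialized with an encoding. The sketch below decodes the value to text first. The one-element document and the hard-coded title are stand-ins for the real template and for data[1]; they are not taken from the script.

```python
# -*- coding: utf-8 -*-
import xml.dom.minidom as minidom

# Stand-in for data[1]: the raw UTF-8 bytes that csv.reader would return
# under Python 2 (assumption: the CSV file is encoded as UTF-8).
raw_title = u"Navegação".encode("utf-8")

# Decode the bytes to text before handing the value to minidom.
title = raw_title.decode("utf-8")

# Hypothetical one-element document standing in for the Gemini template.
record = minidom.parseString(b"<title/>")
record.documentElement.appendChild(record.createTextNode(title))

# With an encoding given, toprettyxml returns bytes, so the output file
# should be opened in binary mode ('wb') when writing the result.
xml_bytes = record.toprettyxml(newl="", encoding="utf-8")
```

If the file turns out to use another encoding (for example Latin-1, which Excel exports often use), the decode call would need that encoding name instead.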

This is how a new XML file is created from a blank template XML:

 # write out the gemini record
                filename = '../output/%s.xml' % fileId
                with open(filename,'w') as test_xml:
                    test_xml.write(record.toprettyxml(newl="", encoding="utf-8"))
            except:
                e = sys.exc_info()[1]
                logging.debug("Import failed for entry %s" % data[0])
                logging.debug("Specific error: %s" % e)

    @skip('')
    def testOWSMetadataImport(self):
        raw_data = []
        with open('../input/metadata_cartapapel.csv') as csvfile:
            reader = csv.reader(csvfile, dialect='excel')
            for columns in reader:
                raw_data.append(columns)   

        md = MD_Metadata(etree.parse('gemini-template.xml'))
        md.identification.topiccategory = ['farming','environment']
        print md.identification.topiccategory
        outfile = open('mdtest.xml','w')
        # crap, can't update the model and write back out - this is badly needed!!
        outfile.write(md.xml) 


if __name__ == "__main__":
    unittest.main()
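The except block above reports errors with logging.debug, which the default logging configuration (level WARNING) does not display, so a failure can pass silently. A minimal sketch of making the error visible follows; the helper import_entry and the logger name are hypothetical and only illustrate the pattern.

```python
import logging

# Show DEBUG and higher; without this, only WARNING and above are printed.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("metadata_import")  # hypothetical logger name


def import_entry(entry_id, action):
    """Run action(); on failure, log the full traceback and return False."""
    try:
        action()
        return True
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Import failed for entry %s", entry_id)
        return False
```

With the traceback in the log, an encoding problem would show up as a UnicodeDecodeError or UnicodeEncodeError pointing at the line that raised it.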

Could someone please help me solve this issue?

Thank you in advance for your time.

