I want to remove all elements with a certain tag and then write out the .xml file WITHOUT any tags for those deleted elements. Is my only option to create a new tree?
There are two options to remove/delete an element:
clear()
Resets an element. This function removes all subelements, clears all
attributes, and sets the text and tail attributes to None.
At first I used this and it works for the purpose of removing the data from the element but I'm still left with an empty element:
# Remove all elements from the tree that are NOT "job" or "make" or "build" elements
# (tree is the parsed xml.etree.ElementTree.ElementTree, root = tree.getroot())
with open("debug.log", "w") as log:
    for el in root.iter():  # iter() with no argument visits every element
        if el.tag not in ("job", "make", "build"):
            print("removed = ", el.tag, el.attrib, file=log)
            el.clear()
        else:
            print("NOT", el.tag, el.attrib, file=log)
tree.write("make_and_job_tree.xml", short_empty_elements=False)
The problem is that xml.etree.ElementTree.ElementTree.write() still writes out empty tags no matter what:
...The keyword-only short_empty_elements parameter controls the
formatting of elements that contain no content. If True (the default),
they are emitted as a single self-closed tag, otherwise they are
emitted as a pair of start/end tags.
Why isn't there an option to just not print out those empty tags! Whatever.
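To make that concrete, here is a minimal sketch using only the standard library (the one-country snippet is cut down from the example data further below). It suggests that clear() empties an element but leaves it attached to its parent, so short_empty_elements can only change how the empty tag is spelled, not whether it is written:

```python
import xml.etree.ElementTree as ET

# Cut-down version of the example data: one country, one neighbor.
root = ET.fromstring(
    '<country name="Panama"><rank>68</rank>'
    '<neighbor name="Colombia" direction="E"/></country>'
)

neighbor = root.find("neighbor")
neighbor.clear()  # drops attributes, text, tail and children; the element itself stays

# The same tree serialized both ways: the empty element is still there.
short_form = ET.tostring(root, encoding="unicode")
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)

print(short_form)
print(long_form)
```

Either way the neighbor element is still a child of country, which is why it keeps showing up in the file.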
So then I thought I might try
remove(subelement)
Removes subelement from the element. Unlike the find* methods this
method compares elements based on the instance identity, not on tag
value or contents.
But this only operates on the child elements.
So I'd have to do something like:
for el in root.iter():
    for subel in el:
        if subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
            el.remove(subel)
But there's a big problem here: I'm invalidating the iterator by removing elements, right?
Is it enough to simply check whether the element is empty by adding a test on subel, like this?
if subel and subel.tag != "make" and subel.tag != "job" and subel.tag != "build":
Or do I have to get a new iterator to the tree elements every time I invalidate it?
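One possible way around the iterator problem, as a sketch rather than a definitive answer: take a snapshot of the elements with list() before removing anything, so the loops never walk a sequence that is being modified. The helper name prune and the tiny test tree are made up for illustration. Note two assumptions: a kept tag nested inside a removed element disappears together with its parent, and the root itself is never removed. Also, as far as I know, the truth value of an Element only reflects whether it has children, so "if subel" would not tell you whether subel has been removed.

```python
import xml.etree.ElementTree as ET

KEEP = {"make", "job", "build"}  # the tags from the question


def prune(root, keep=KEEP):
    """Remove every child element whose tag is not in keep (hypothetical helper)."""
    # list(...) snapshots the elements first, so remove() never mutates
    # the sequence that a loop is currently walking.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in keep:
                parent.remove(child)


# Made-up tree for illustration only.
root = ET.fromstring(
    "<root><job><x/><make/></job><other><job/></other><build/></root>"
)
prune(root)
print(ET.tostring(root, encoding="unicode"))
```

Elements that were already detached still sit in the snapshot, but visiting them again is harmless because they are no longer part of the tree that gets written.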
Remember: I just wanted to write out the xml file with no tags for the empty elements.
Here's an example.
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
Let's say I want to remove any mention of neighbor.
Ideally, I'd want this output after the removal:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
    </country>
</data>
The problem is that when I run the code using clear() (see the first code block above) and write it to a file, I get this:
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor></neighbor></country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor></neighbor><neighbor></neighbor></country>
</data>
Notice neighbor still appears.
I know I could easily run a regex over the output, but there's gotta be a way (or another Python API) that does this on the fly instead of requiring me to touch my .xml file again.
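For the neighbor example specifically, a minimal sketch of the remove() route, assuming the example data above is held in a string (the name SAMPLE is made up for illustration). Each parent is asked for its own neighbor children with findall(), which returns a list, so removing from the parent inside the loop should be safe:

```python
import xml.etree.ElementTree as ET

# The example data from the question, held in a string for illustration.
SAMPLE = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""

root = ET.fromstring(SAMPLE)

# Snapshot every element, then detach each parent's direct neighbor children.
for parent in list(root.iter()):
    for child in parent.findall("neighbor"):  # findall() returns a list
        parent.remove(child)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The leftover whitespace before each closing country tag comes from the tail of the preceding element; on Python 3.9 or later, ET.indent(root) before serializing should tidy that up. To get a file rather than a string, wrapping the root in ET.ElementTree(root) and calling write() with xml_declaration=True is one option.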