Saturday, 18 June 2016

Python BeautifulSoup html5lib mix seems to be deleting every other item in for loop


I'm new to Python but am really enjoying the language so far.

I've been creating a bunch of complicated HTML5 elements using BeautifulSoup and the html5lib module.

When I loop through the elements in a paragraph I can print them out fine, but when I try to use bs4's insert method I only get every other element in the output, and I don't know why!

My Python:

    i = 0
    for gallery_elem in gallery_header_next_sibling:
        if ( gallery_elem.name.lower() == 'img' ):
            if ( i == 0 ):
                new_gallery = soup.new_tag( "div" )
                new_gallery[ "class" ] = "gallery"

            new_gallery_elem = soup.new_tag( "figure" )

            if ( gallery_elem.has_attr( "alt" ) ):
                new_gallery_cap = soup.new_tag( "figcaption" )
                new_gallery_cap.string = gallery_elem[ "alt" ]
                new_gallery_elem.insert( 2, new_gallery_cap )

            if ( gallery_elem.has_attr( "title" ) ):
                new_gallery_attribution = soup.new_tag( "dl" )
                new_gallery_attribution_dt = soup.new_tag( "dt" )
                new_gallery_attribution_dt.string = "Image owner:"
                new_gallery_attribution_dd = soup.new_tag( "dd" )
                new_gallery_attribution_dd.string = gallery_elem[ "title" ]
                new_gallery_attribution.insert( 0, new_gallery_attribution_dt )
                new_gallery_attribution.insert( 1, new_gallery_attribution_dd )

        new_gallery_elem.insert( 1, new_gallery_attribution )
        new_gallery_elem.insert( 1, gallery_elem )
        i = i + 1

    new_gallery_elem.insert( 1, gallery_elem )

The HTML:

<img alt="Caption One." src="img/orange.jpg" title="Attribution One."/>
<img alt="Caption Two." src="img/red.jpg" title="Attribution Two."/>
<img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/>
<img alt="Caption Four." src="img/brolly.jpg" title="Attribution Four."/>
<img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/>

The output:

<figure><figcaption>Caption One.</figcaption><img alt="Caption One." src="img/orange.jpg" title="Attribution One."/><dl><dt>Image owner:</dt><dd>Attribution One.</dd></dl></figure>
<figure><figcaption>Caption Three.</figcaption><img alt="Caption Three." src="img/urban.jpg" title="Attribution Three."/><dl><dt>Image owner:</dt><dd>Attribution Three.</dd></dl></figure>
<figure><figcaption>Caption Five.</figcaption><img alt="Caption Five." src="img/tomy.jpg" title="Attribution Five."/><dl><dt>Image owner:</dt><dd>Attribution Five.</dd></dl></figure>

If I yank out the following line I get all five elements. Does anyone have any sort of inkling as to what I'm doing wrong?

new_gallery_elem.insert( 1, gallery_elem )
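
A likely explanation, sketched under assumptions rather than confirmed against the full script: inserting `gallery_elem` into the new `figure` detaches it from its original parent, and the `for` loop is iterating over that same parent's live list of children. Removing items from a list while looping over it makes the iterator skip every other item. The snippet below models this with a plain Python list instead of bs4 tags; the function names and the `images` list are made up for illustration.

```python
# Plain-list model of the suspected cause; no bs4 required.
# children.remove(child) stands in for insert() detaching a tag from its old parent.

def move_while_iterating(children):
    """Move every item out of `children` while looping over it directly."""
    moved = []
    for child in children:
        children.remove(child)  # shrinks the list the loop is walking
        moved.append(child)
    return moved


def move_from_snapshot(children):
    """Loop over a copy, so removing items cannot disturb the iteration."""
    moved = []
    for child in list(children):  # list(...) takes a snapshot first
        children.remove(child)
        moved.append(child)
    return moved


images = ["orange", "red", "urban", "brolly", "tomy"]
print(move_while_iterating(list(images)))  # ['orange', 'urban', 'tomy']
print(move_from_snapshot(list(images)))    # ['orange', 'red', 'urban', 'brolly', 'tomy']
```

If this is indeed what is happening, looping over a snapshot such as `list(gallery_header_next_sibling.contents)` instead of the tag itself should visit all five images, even with the `insert` call left in place.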
