Friday, 24 June 2016

Scrapy: Requesting a different HTML page inside parse()


Is it possible to request a new HTML page while inside parse()?

My code currently reads links from a CSV file and puts all of them in the start_urls list.

What I want is this: when the spider takes a link from start_urls, it parses it and loops over all of its pages until a condition inside the loop is satisfied. It then breaks out of the entire loop and continues with the next item in the start_urls list.

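import csv

start_urls = []
i = 0  # row counter
with open('.scrappy_demo.csv', 'rb') as csvfile:
    # Open the CSV file here; each row holds a single link
    linkreader = csv.reader(csvfile, dialect=csv.excel)
    for row in linkreader:
        # Start every site on its first results page
        start_urls.append(row[0] + "/search?page=1")
        i += 1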

class demo(scrapy.Spider):
    ...
    def parse1(self, response):
        return response

    def parse(self, response):
        i = 0
        j = 0
        ENDLOOP = False
        ...
        while next_page != current_page and not ENDLOOP:
            entry_list = response.css('.entry__row-inner-wrap').extract()

            while i < len(entry_list) and not ENDLOOP:
                [Doing some css,xpath filtering here]
                if [Some Condition here]:
                    [Doing some file write here]
                    ENDLOOP = True
                i += 1

            j += 1
            nextPage = url_redir[:-1] + str(j + 1)
            # Attempt to fetch the next page (these two lines are the problem)
            body = Request(nextPage, callback=self.parse1)
            response2 = HtmlResponse(nextPage, body)

In the last two lines, I'm trying to request a new HTML page with the page number incremented by 1. But when the code runs, it doesn't return the HTML of that request. What am I missing here?

Note: I tried printing body and response2 to check their values, but it looks like body.body is empty and the callback never gets executed.

Note 2: This is my first time using Scrapy.

Note 3: I know the code fails on two-digit page numbers, but never mind that for now.
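
On the two-digit page number issue: slicing off the last character of the URL only works while the page number is a single digit. A minimal sketch of one way around that, assuming the page number always lives in a `page` query parameter as in the URLs built above, is to parse the query string and increment the value. The helper name `next_page_url` and the example.com URLs are hypothetical, and the sketch uses Python 3's `urllib.parse` (in Python 2 the same functions are split between `urlparse` and `urllib`).

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="page"):
    """Return `url` with the integer value of `param` incremented by one."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    # Assumption: a URL without the parameter is treated as page 1.
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # Rebuild the URL, leaving every other component untouched.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    print(next_page_url("http://example.com/search?page=9"))
    print(next_page_url("http://example.com/search?page=10"))
```

Because the number is read from the query string rather than from a fixed character position, the same call works for page 9, page 10, or page 123.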

