Is it possible to request a new HTML page while inside parse?
My code currently reads HTML links from a CSV file and puts all of them into the start_urls list.
What I want to happen is: when the spider takes a link from start_urls, it parses it and loops over all of its pages until a condition inside the loop is satisfied, then it breaks out of the entire loop and continues with the next item in the start_urls list.
Here is the relevant part of my code:
import csv

start_urls = []
i = 0
# Open the CSV file and build the list of start URLs
with open('.scrappy_demo.csv', 'rb') as csvfile:
    linkreader = csv.reader(csvfile, dialect=csv.excel)
    for row in linkreader:
        # str(row)[2:-2] strips the surrounding ["..."] from the row's string form
        start_urls.append(str(row)[2:-2] + "/search?page=1")
        i += 1
import scrapy
from scrapy.http import Request, HtmlResponse

class demo(scrapy.Spider):
    ...

    def parse1(self, response):
        return response

    def parse(self, response):
        i = 0
        j = 0
        ENDLOOP = False
        ...
        while next_page != current_page and not ENDLOOP:
            entry_list = response.css('.entry__row-inner-wrap').extract()
            while i < len(entry_list) and not ENDLOOP:
                # [Doing some css/xpath filtering here]
                if [Some Condition here]:
                    # [Doing some file write here]
                    ENDLOOP = True
                i += 1
            j += 1
            nextPage = url_redir[:-1] + str(j + 1)
            body = Request(nextPage, callback=self.parse1)
            response2 = HtmlResponse(nextPage, body)
On the last two lines, I'm trying to request a new HTML page with the page number incremented by 1. But when the code runs, it doesn't return the HTML of that request. What am I missing here?
Note: I tried checking the values of body and response2 by printing them, but it looks like body.body is empty and the callback never gets executed.
Note 2: This is my first time using Scrapy.
Note 3: I know the code fails on two-digit page numbers, but never mind that for now.
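From what I can tell from the Scrapy docs, my guess is that I'm supposed to yield the Request and let Scrapy download the page and call the callback, instead of building an HtmlResponse myself. Something like the sketch below is what I have in mind (the selector '.some-flag', the condition, and the '?page=' URL pattern are placeholders for my real ones), but I'm not sure how to carry the loop/condition state from one page to the next. Is this the right direction?

import scrapy

class DemoSpider(scrapy.Spider):
    name = "demo"

    def parse(self, response):
        entries = response.css('.entry__row-inner-wrap')
        for entry in entries:
            if entry.css('.some-flag'):  # placeholder for my real condition
                # [do the file write here]
                return  # stop paging for this start URL

        # Condition not met: ask Scrapy to download the next page and
        # come back to this same callback when it arrives.
        next_page = response.meta.get('page', 1) + 1
        next_url = response.url.split('?page=')[0] + '?page=%d' % next_page
        yield scrapy.Request(next_url, callback=self.parse,
                             meta={'page': next_page})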