I am building a web crawler with Scrapy that simply collects all the Reddit links on the front page. When I try to export the results to a JSON file, all I get is '['.
Here is my spider.
```python
from scrapy import Spider
from scrapy.selector import Selector
from redditScrape.items import RedditscrapeItem

class RedditSpider(Spider):
    name = "redditScrape"
    allowed_domains = ["reddit.com"]
    start_urls = [
        "https://www.reddit.com/r/all"
    ]

    def parse(self, response):
        titles = Selector(response).xpath('//div[@class="entry unvoted lcTagged"]/p[@class="title"]')
        for title in titles:
            item = RedditscrapeItem()
            item['title'] = title.xpath('/a[@class="title may-blank loggedin srTagged imgScanned"]/text()').extract()
            yield item
```
Whenever I run the XPath query in the Google Chrome console, I get the result I'm looking for.
Any idea why my scraper won't output correctly?
This is the command I use to run it:

```
scrapy crawl redditScrape -o items.json -t json
```