I am building a web crawler with Scrapy that collects all the Reddit links on the front page. When I try to export the results to a JSON file, all I get is '['.
Here is my spider:
from scrapy import Spider
from scrapy.selector import Selector
from redditScrape.items import RedditscrapeItem
class RedditSpider(Spider):
    name = "redditScrape"
    allowed_domains = ["reddit.com"]
    start_urls = [
        "https://www.reddit.com/r/all"
    ]
    def parse(self, response):
        titles = Selector(response).xpath('//div[@class="entry unvoted lcTagged"]/p[@class="title"]')
        for title in titles:
            item = RedditscrapeItem()
            item['title'] = title.xpath('/a[@class="title may-blank loggedin  srTagged imgScanned"]/text()').extract()
            yield item
When I run the XPath query in the Google Chrome console, I get the results I'm looking for.
Any idea why my scraper won't output correctly?
This is the command I use to run it:
scrapy crawl redditScrape -o items.json -t json
 