Sunday, July 3, 2016

Scrapy only outputs '['


I am building a web crawler with Scrapy that collects all the Reddit links on the front page. When I try to write the results to a JSON file, all I get is '['.

Here is my spider.

from scrapy import Spider
from scrapy.selector import Selector
from redditScrape.items import RedditscrapeItem


class RedditSpider(Spider):
    name = "redditScrape"
    allowed_domains = ["reddit.com"]
    start_urls = [
        "https://www.reddit.com/r/all"
    ]

    def parse(self, response):
        titles = Selector(response).xpath('//div[@class="entry unvoted lcTagged"]/p[@class="title"]')

        for title in titles:
            item = RedditscrapeItem()
            item['title'] = title.xpath('/a[@class="title may-blank loggedin  srTagged imgScanned"]/text()').extract()
            yield item

Whenever I run the XPath query in the Google Chrome console, I get the result I'm looking for.

Any idea why my scraper won't output correctly?

This is the command I am using to execute:

scrapy crawl redditScrape -o items.json -t json
