Sunday, 19 June 2016

HTTP Error 400: Bad Request (urllib)


I'm writing a script to get information about buildings in NYC. The code works and returns what I'd like when I enter the addresses manually. Now I'm trying to have it read addresses from a text file and query the website with that information, and I'm getting this error:

urllib.error.HTTPError: HTTP Error 400: Bad Request

I believe it has something to do with the website not liking many requests from something that isn't a browser. I've heard about User-Agent headers but don't know how to use them. Here is my code:

from bs4 import BeautifulSoup
import urllib.request

f = open("FILE PATH GOES HERE")

def getBuilding(link):
    r = urllib.request.urlopen(link).read()
    soup = BeautifulSoup(r, "html.parser")
    print(soup.find("b",text="KEYWORDS IM SEARCHING FOR GO HERE:").find_next("td").text)


def main():
    for line in f:
        num, name = line.split(" ", 1)
        newName = name.replace(" ", "+")
        link = "LINK GOES HERE (constructed from num and newName variables)"
        getBuilding(link)
    f.close()

if __name__ == "__main__":
    main()
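
Two things may be worth checking here, offered as a sketch rather than a confirmed diagnosis. First, each line read from a file still ends with a newline, so `name.replace(" ", "+")` leaves a `\n` inside the URL, which manual entry never did; a malformed URL like that is a plausible source of a 400. Second, a User-Agent can be set by passing a `headers` dict to `urllib.request.Request`. In the code below, `BASE_URL`, the query parameter names `houseno` and `street`, and the helper names are all hypothetical, since the real link format is not shown in the question.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder; the real URL pattern is not given in the question.
BASE_URL = "https://example.com/lookup"


def build_link(line, base_url=BASE_URL):
    """Turn one 'NUMBER STREET NAME' line into a query URL."""
    # strip() removes the trailing newline that file iteration leaves on each line
    num, name = line.strip().split(" ", 1)
    # urlencode turns spaces into '+' and percent-encodes other unsafe characters.
    # The parameter names 'houseno' and 'street' are assumptions for illustration.
    query = urllib.parse.urlencode({"houseno": num, "street": name})
    return base_url + "?" + query


def make_request(link, user_agent="Mozilla/5.0"):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(link, headers={"User-Agent": user_agent})


def fetch(link):
    """Download the page body as bytes."""
    with urllib.request.urlopen(make_request(link)) as response:
        return response.read()
```

With these helpers, `getBuilding` could call `fetch(link)` in place of `urllib.request.urlopen(link).read()`, and `main` could call `build_link(line)` for each line of the file. If the 400 persists after the newline is stripped, the User-Agent header is the next thing to try.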
