I am using Tornado to asynchronously scrape data from many thousands of URLs. Each is 5-50MB, so the downloads take a while. I keep getting "Exception: HTTP 599: Connection closed http:…" errors, despite the fact that I am setting both connect_timeout and request_timeout to a very large number.
Why, despite the large timeout settings, am I still timing out on some requests after only a few minutes of running the script? Is there a way to instruct httpclient.AsyncHTTPClient to NEVER time out? Or is there a better way to prevent these timeouts?
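(For reference, the Tornado docs note that a timeout value of 0 means "no timeout" for both connect_timeout and request_timeout, so a minimal sketch of a fetch with timeouts disabled would look like this; the URL handling is omitted and this is not something I have confirmed fixes the 599s:)

from tornado import gen, httpclient

@gen.coroutine
def fetch_without_timeouts(url):
    # Per the Tornado docs, 0 disables the timeout entirely
    # for both the connection and the overall request.
    response = yield httpclient.AsyncHTTPClient().fetch(
        url,
        connect_timeout=0,
        request_timeout=0,
    )
    raise gen.Return(response)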
This is how I'm calling the fetch (each worker calls the request_and_save_url() coroutine below from its Worker() coroutine):
from functools import partial
from tornado import gen, httpclient

@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        # handle_request(response, q_all=...) is defined elsewhere and saves the response
        response = yield httpclient.AsyncHTTPClient().fetch(url, partial(handle_request, q_all=q_all), connect_timeout=60*24*3*999999999, request_timeout=60*24*3*999999999)
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])
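(Note that this call mixes the callback style and the coroutine style: it passes partial(handle_request, q_all=q_all) as a callback while also yielding the fetch. A pure coroutine-style variant of the same call, with the callback dropped and the response handled inline, is sketched below; handle_request and q_all are from my code above, and whether this change alone avoids the 599s is untested:)

from tornado import gen, httpclient

@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        # Coroutine style: no callback argument; the yielded
        # response is passed to handle_request() directly.
        response = yield httpclient.AsyncHTTPClient().fetch(
            url,
            connect_timeout=0,   # 0 disables the connect timeout
            request_timeout=0,   # 0 disables the request timeout
        )
        handle_request(response, q_all=q_all)
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])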