Wednesday, July 6, 2016

Python Tornado connections timing out early -- Any way to prevent timeouts (HTTP 599 errors)?


I am using Tornado to asynchronously scrape data from many thousands of URLs. Each response is 5-50 MB, so the downloads take a while. I keep getting "Exception: HTTP 599: Connection closed http:…" errors, even though I set both connect_timeout and request_timeout to a very large number. Why, despite the large timeout settings, am I still timing out on some requests after only a few minutes of running the script? Is there a way to instruct httpclient.AsyncHTTPClient to NEVER time out? Or is there a better way to prevent these timeouts?

This is how I call fetch (each worker invokes this request_and_save_url() sub-coroutine from its Worker() coroutine):

from functools import partial
from tornado import gen, httpclient

@gen.coroutine
def request_and_save_url(url, q_all):
    try:
        # handle_request is invoked as a callback with the response;
        # both timeouts are set to effectively unlimited values.
        response = yield httpclient.AsyncHTTPClient().fetch(
            url,
            partial(handle_request, q_all=q_all),
            connect_timeout=60 * 24 * 3 * 999999999,
            request_timeout=60 * 24 * 3 * 999999999)
    except Exception as e:
        print('Exception: {0} {1}'.format(e, url))
        raise gen.Return([])
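For context, here is a minimal retry sketch I am considering, assuming the 599s are transient. As I understand it, Tornado raises code 599 both for client-side timeouts and for connections the server closes early, so a huge timeout alone may not help if the remote end is dropping the connection. The fetch_with_retries name, the retry count, and the concrete timeout values are placeholders of mine, not code from my scraper:

from tornado import gen, httpclient

@gen.coroutine
def fetch_with_retries(url, max_retries=3):
    # Retry on 599, which Tornado uses for timeouts and closed connections.
    client = httpclient.AsyncHTTPClient()
    for attempt in range(max_retries):
        try:
            response = yield client.fetch(url,
                                          connect_timeout=300,
                                          request_timeout=3600)
            raise gen.Return(response)
        except httpclient.HTTPError as e:
            if e.code != 599 or attempt == max_retries - 1:
                raise  # non-599 errors and the final failure propagate
            yield gen.sleep(2 ** attempt)  # back off before retrying

Separately, since the default SimpleAsyncHTTPClient only runs max_clients (default 10) requests at once and queues the rest, raising that limit with AsyncHTTPClient.configure(None, max_clients=50) might reduce queuing, though I am not sure whether time spent in the queue counts against request_timeout.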
