Looking to learn how to add retries the right way to your Python requests scraping script? Then you are on the right page, as the article below is a step-by-step guide on how to get that done.
Whenever you design the logic of your Python requests-based web scraper, you need to keep in mind that things will not always go your way. One of the problems that may occur is that your requests will fail. This could be the result of a connection error or even your target blocking you. If this happens, your code will break and throw an exception.
While throwing an exception is not a bad thing, we want code that is robust and keeps retrying an action until it is sure it can’t proceed before calling on you. Fortunately for us, Python’s requests module can retry a request a set number of times once you configure it to do so. In this guide, I will show you how to make use of this to retry failed requests in your web scrapers.
Python Requests Retry — An Overview
Requests has been designed to be robust and to make a lot of your work easier. However, there is one aspect you have to handle from your own end: by default, Requests does not retry a request that fails.
If you need your code to retry failed requests, you will need to set that up yourself. For the most part, you will either use the Retry object from urllib3 (the library Requests is built on) or simply code the retry logic yourself. Fortunately for us, doing this is not difficult, as it takes just a few lines of code.
How to Retry Requests Using Custom Retry Logic
The easiest way to retry requests with Python requests is to write your own custom retry logic. All you have to do is keep a variable that holds the number of tries you want and loop over that range as you send the web request. Once a request succeeds, you break out of the for loop. If it does not, you keep retrying until you reach the maximum number of retries, and then give up. Below is an example of how to get that done.
```python
import requests

NUMBER_OF_RETRIES = 5
response = None  # stays None if every attempt raises ConnectionError

for i in range(NUMBER_OF_RETRIES):
    try:
        response = requests.get("YOUR_TARGET_URL")
        # 200 (success) and 404 (not found) will not change on a retry
        if response.status_code in [200, 404]:
            break
    except requests.exceptions.ConnectionError:
        pass

if response is not None and response.status_code == 200:
    print(response.content)
```
As you can see above, I check the status code to see whether it is 200 (successful) or 404 (page not found). This is because, in either of these cases, there is no need to retry the request. The last if statement is to check if the request was successful and then carry out the task you want. In my own case above, all I did was to print the content to the screen.
Alternatively, you can create a reusable retry function that wraps Python’s requests. Below is how to get that done.
```python
import requests

def retry_requests(url, num_of_retries=5, **kwargs):
    for i in range(num_of_retries):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in [200, 404]:
                return response
        except requests.exceptions.ConnectionError:
            pass
    return None  # every attempt failed

x = retry_requests("https://google.com/search", num_of_retries=3)
if x is not None:
    print(x.content)
```
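Both versions above retry immediately, which can hammer a server that is already struggling. One common refinement is to pause a little longer after each failed attempt. The block below is a minimal sketch of that idea, not a definitive implementation: the name `retry_with_backoff` and the `fetch`, `base_delay`, `sleep`, and `retry_on` parameters are hypothetical, chosen so the function can be exercised without a network connection.

```python
import time

def retry_with_backoff(fetch, num_of_retries=5, base_delay=1.0,
                       sleep=time.sleep, retry_on=(OSError,)):
    """Call fetch() until it returns a response with status 200 or 404.

    fetch    -- zero-argument callable that sends the request, for example
                lambda: requests.get(url, timeout=10)
    retry_on -- exception types that trigger a retry. The exceptions raised
                by requests derive from OSError, so the default covers them;
                pass a narrower tuple if that is too broad for your script.
    """
    for attempt in range(num_of_retries):
        try:
            response = fetch()
            if response.status_code in (200, 404):
                return response
        except retry_on:
            pass
        # Wait base_delay, then 2x, 4x, ... but not after the last attempt.
        if attempt < num_of_retries - 1:
            sleep(base_delay * 2 ** attempt)
    return None  # every attempt failed
```

With requests, a call could look like `retry_with_backoff(lambda: requests.get("YOUR_TARGET_URL", timeout=10))`. As in the function above, a return value of `None` means every attempt failed.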
How to Retry Requests Using Sessions & HTTPAdapter
Another way to integrate the retry logic into your code is to use sessions and HTTPAdapter. This is a little more complex than creating the logic yourself, as done above. However, if you use the requests.Session object often, there is no better option for retrying failed requests than this. Let's first take a look at the code, then I will explain.
```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

TARGET_URL = "YOUR_TARGET_URL"  # placeholder, replace with your own URL

s = requests.Session()
retries = Retry(total=5, status_forcelist=[429, 500, 502, 503, 504])
# Mount on both schemes; otherwise https:// URLs are sent without retries
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))
x = s.get(TARGET_URL)
```
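As you can see above, the code depends on the HTTPAdapter and Session classes in the requests module as well as the Retry object. The Retry object describes the policy: total is the maximum number of retries, and status_forcelist is the list of response status codes that should trigger a retry. Passing it to HTTPAdapter and mounting that adapter on a URL prefix applies the policy to every request the session sends to a matching URL.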
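If you build sessions in more than one place, it may help to wrap the setup in a small factory. The sketch below is one way to do it under stated assumptions: the function name `build_retry_session` and its default values are hypothetical, while `backoff_factor` is a real parameter of urllib3's Retry that adds a growing pause between attempts.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_retry_session(total=5, backoff_factor=0.5,
                        status_forcelist=(429, 500, 502, 503, 504)):
    """Return a requests.Session that retries on the given status codes."""
    retries = Retry(
        total=total,                        # maximum number of retries
        backoff_factor=backoff_factor,      # scales the pause between retries
        status_forcelist=status_forcelist,  # status codes that trigger a retry
    )
    adapter = HTTPAdapter(max_retries=retries)
    session = requests.Session()
    # Mount the same adapter for both schemes so all URLs are covered.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

You would then call `build_retry_session().get("YOUR_TARGET_URL", timeout=10)` as with any other session. Keep in mind that once the retries are used up the session raises an exception rather than returning a response, so it is still worth wrapping the call in a try block.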
FAQs About Python Requests Retry
Q. Does Python Requests Support Retries Out of the Box?
Python’s requests does come with support for retries, but it is switched off by default and not obvious to most beginners. If you check the last code above, you will see that the retry support is reached through the HTTPAdapter in the requests module together with the Retry object. However, its usage is quite advanced. If you find it difficult to comprehend, you can still write your own retry function so that you understand the logic of your script better.
Q. Why Do Python Requests Fail?
Requests failing has nothing to do with Python requests itself. Once you try to access a remote resource, many things can go wrong and cause your request to fail. The URL you are accessing might take so long to respond that the request times out, or your connection might not go through at all. In other cases, the server rejects the request as a bad request because of the way it was sent.
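Each of these failure modes surfaces as a different exception class in requests, which helps when deciding whether a retry is worthwhile. The following is a rough sketch, assuming a hypothetical helper named `classify_failure`; the optional `session` argument is also an assumption, added so the function can be tried with a stand-in object instead of a live connection.

```python
import requests

def classify_failure(url, timeout=10, session=None):
    """Send one GET request and describe how it ended."""
    http = session or requests
    try:
        response = http.get(url, timeout=timeout)
        # Turn 4xx and 5xx responses into an HTTPError exception.
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return "timeout"           # server too slow; a retry may help
    except requests.exceptions.ConnectionError:
        return "connection error"  # network or DNS problem; a retry may help
    except requests.exceptions.HTTPError as err:
        # For a 4xx status, retrying the same request rarely helps.
        return f"bad status: {err.response.status_code}"
    return "ok"
```

Timeout is checked before ConnectionError because a connect timeout in requests belongs to both families.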
Q. Is Python Requests Good at Handling Retries?
Yes, Python requests is good at handling retries, provided you instruct it to do so. As stated earlier, it does have support for that, and that is what was discussed in the second method of handling retries. With the first method, you are not depending on the support in requests but coding your own logic, so that a failed request is retried a number of times before your script throws an exception.
Having your code retry requests makes it robust and protects it from occasional failures that are not a real reason for the code to break. I do web scraping a lot, and because I am in a region with bad Internet, a script can break simply because of poor connectivity. With the help of retries, I do not have to intervene until a real issue that needs my attention occurs.