I am trying to write a crawler that visits a web page and downloads all the images on that page. My code looks like this:
import random
import urllib.request

import requests
from bs4 import BeautifulSoup


def get_images(url):
    # Fetch the page and parse its HTML.
    code = requests.get(url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    # Pass the src of every <img> element to the downloader.
    for img in soup.find_all('img'):
        src = img.get('src')
        download_image(src)


def download_image(url):
    # Save the image under a random numeric file name.
    name = random.randrange(1, 100)
    image_name = str(name) + ".jpg"
    urllib.request.urlretrieve(url, image_name)


get_images("http://www.any_url.com/")
However, many images do not contain their full URL in the src attribute; the value is often a relative path. How can I get the full URL of each image so that I can download it?
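For what it's worth, one possible approach is to resolve each src value against the URL of the page it was found on, using urljoin from the standard library's urllib.parse. The snippet below is only a minimal sketch: the helper name to_absolute and the sample paths are made up for illustration and are not part of my crawler.

```python
from urllib.parse import urljoin


def to_absolute(page_url, src):
    """Resolve a possibly relative src value against the URL of its page.

    to_absolute is a hypothetical helper name used only for this sketch.
    """
    return urljoin(page_url, src)


# Illustrative inputs only; these paths are assumptions, not real resources.
page_url = "http://www.any_url.com/gallery/index.html"
samples = [
    "images/cat.jpg",                 # relative to the page's directory
    "/static/dog.png",                # relative to the site root
    "//cdn.any_url.com/bird.gif",     # protocol-relative
    "http://other.example/fish.jpg",  # already absolute, left unchanged
]
for src in samples:
    print(to_absolute(page_url, src))
```

If this is right, the loop in get_images could presumably call download_image(urljoin(url, src)) instead of download_image(src), after skipping any img element whose src is missing.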