
I love sport and used to do a lot of it: running, swimming, weightlifting, etc. With the March 2020 lockdown, I wanted to buy a set of dumbbells in order to keep in shape by working out at home. But guess what? I was not the only one who had this idea! Between stock shortages and price rises, it was impossible to get a pair of dumbbells at a reasonable price.
The project
The aim of this project is to build a tool that checks the price asked by UK Amazon resellers for a specific product (a pair of 20kg dumbbells), converts that price using the current GBP to EUR exchange rate, checks whether the converted price is below a given threshold, and, if so, sends a WhatsApp notification with the amount and the Amazon URL.
This project is the first of 3 articles in which I will present different ways to scrape the web with the Beautiful Soup library. This is the simplest one: getting the product title, price, URL and the current GBP to EUR exchange rate. In the next one I will present a full scraping loop over a real estate website to find the best house in your favourite city.
Step 1: The libraries
import requests: Requests is a Python library that lets you send HTTP requests in an easy way. I put the user guide below if you are interested:
from bs4 import BeautifulSoup: BeautifulSoup is a Python library for pulling data out of HTML and XML files. I will explain below how it works, but if you are interested in a full user guide, I put the link here as well:
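In code, these are the only two imports we need until the Twilio step:
import requests                 # to send HTTP requests
from bs4 import BeautifulSoup   # to parse the HTML we get back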
Step 2: The exchange rate
We will get the exchange rate from this website:
You can navigate around it and pick the currencies you need. Then let's store the response in a variable "page", using the requests library to make an HTTP request to the site.
page = requests.get('https://www.x-rates.com/calculator/?from=GBP&to=EUR&amount=1')
You can easily check the result of the request by printing what is in "page":
print(page)
If this line of code shows an answer starting with 2 (like 200), it means you are allowed to access the website. If it starts with 3, it means you were redirected to another page. And if it starts with 4 (like the famous error 404), it means there is an error and you cannot access it.
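If you prefer to see the numeric code on its own, the response object exposes it directly:
print(page.status_code)   # e.g. 200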
Then we need a second variable, "soup", that turns the variable page into our BeautifulSoup object, on which we will read the HTML code:
soup = BeautifulSoup(page.text, 'html.parser')
Now let's talk about HTML. HyperText Markup Language is a markup language that, used together with technologies such as Cascading Style Sheets (CSS) and programming languages like JavaScript, allows web developers to create all the nice websites you navigate every day.
HTML works with tags, which means that every single element of a website (titles, paragraphs, images, etc.) is stored inside a tag, and when we need an element of the page, we just call that specific tag. "Ok… thank you… but what if I have no clue about HTML?" No problem! Most web browsers have an easy way to check the HTML code of a website.
Anywhere on the page, right-click and then click on "Inspect".
You will see the full code on the right (in Google Chrome, but it can be at the bottom in another browser). And by hovering over the code you will see the different parts of the page get highlighted.
From here it's just a matter of playing "hide and seek". We just have to find the part of the page we are interested in (the 1.103384 exchange rate for us), then click on the arrow on the left of the code to expand it, and repeat until we reach the single line related to this information.
Here it is. We have the line, and the tag is "span". And to distinguish this tag from any other "span" in the page, we will also use the "class" attribute of this line.
price_box = soup.find('span',{'class':'ccOutputRslt'})
So we store in the variable "price_box" the result of the request made with "soup" on "page": "please find the line with the tag "span" and the class "ccOutputRslt"". Regardless of whether the specific FX rate is 0.8 or 1.3, since we are storing the path to this information, we will always get the latest exchange rate.
If we print "price_box" now, we will get "1.103384 EUR", because we stored the EUR equivalent of 1 GBP. To get the rate, we just need to remove the "EUR":
rate = price_box.text.replace("EUR", "").strip()
Here is the recap:
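Putting the lines above together (the printed rate will of course differ depending on when you run it):
import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.x-rates.com/calculator/?from=GBP&to=EUR&amount=1')
soup = BeautifulSoup(page.text, 'html.parser')

price_box = soup.find('span', {'class': 'ccOutputRslt'})   # e.g. "1.103384 EUR"
rate = price_box.text.replace("EUR", "").strip()            # keep only the number

print(rate)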
Step 3: The price in Amazon
If we do exactly the same with an Amazon URL, we would get an error right from the first step, probably a response code 503 "Service Unavailable". Why? Because accessing the website with a bot does not comply with Amazon's general conditions of use. It's not illegal, but not really moral either. So let me take this opportunity to clarify one point: the purpose of this tutorial is to teach you a fair use of web scraping that can be useful for your data science/data analysis projects or other personal projects, but I don't recommend using it to spam any website with automatic requests, and always respect the conditions of use (even if I know you don't read them!).
So in the request we will add an argument that simulates a request made by a browser:
res = requests.get(url,headers=HEADERS)
And the variable HEADERS is:
HEADERS = ({'User-Agent':
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36',
'Accept-Language': 'en-US, en;q=0.5'})
Then we do the same as for the exchange rate; the only difference is that we use the 'lxml' parser to navigate the HTML code:
res = requests.get(url,headers=HEADERS)
soup_bis = BeautifulSoup(res.content, features='lxml')
Then we will have to extract two pieces of information: the title and the price. To do it we proceed in the same way, by inspecting the code, and store the results into two variables.
For the title we locate the element by its "id" attribute, and we use the method "get_text" to get the human-readable text and "strip" to remove the line breaks that would otherwise be present.
title = soup_bis.find(id="productTitle").get_text().strip()
For the amount we also locate the element by its "id" and use the same methods; in addition we remove the currency sign, and in my case I have to replace the comma separator with a dot.
amount = float(soup_bis.find(id='priceblock_ourprice').get_text().replace("£","").replace(",",".").strip())
Then we just have to multiply the amount in GBP by the rate calculated in the previous step:
amount_eur=amount*float(rate)
And write an "if" statement to check whether the price in EUR is lower than a threshold we will set up:
if amount_eur <= TrackingPrice:   # TrackingPrice is the threshold
    offer.append("You got an offer on the {0} for {1}. Check out the product {2}".format(title, amount_eur, url))
except:
    offer.append("Couldn't get details about product")
The "except" clause here closes a "try" block that wraps the whole scraping part inside the function, as you will see in the full function below.
Now we can put everything into a function named tracker(url, TrackingPrice) with 2 arguments: url, the Amazon URL to get the price from, and TrackingPrice, the threshold.
Here is the full function
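A minimal sketch of what it can look like, assuming HEADERS and the variable rate from Step 2 are already defined, and with a list named offer collecting the messages:
offer = []   # the messages we will send later

def tracker(url, TrackingPrice):
    try:
        res = requests.get(url, headers=HEADERS)
        soup_bis = BeautifulSoup(res.content, features='lxml')
        title = soup_bis.find(id="productTitle").get_text().strip()
        amount = float(soup_bis.find(id='priceblock_ourprice').get_text().replace("£", "").replace(",", ".").strip())
        amount_eur = amount * float(rate)   # "rate" comes from Step 2
        if amount_eur <= TrackingPrice:
            offer.append("You got an offer on the {0} for {1}. Check out the product {2}".format(title, amount_eur, url))
    except:
        offer.append("Couldn't get details about product")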
Now you just have to set both variables with your URL and your threshold:
and print the result:
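For example (the product URL and threshold below are placeholders to replace with your own):
url = 'https://www.amazon.co.uk/dp/XXXXXXXXXX'   # placeholder: your product URL
TrackingPrice = 40                               # placeholder: your threshold in EUR

tracker(url, TrackingPrice)
print(offer)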
Step 4: Additional tip, whatsapp notification
We could easily store the result in a CSV or Excel file. But when I worked on this project I really wanted to get notified on my phone. I hesitated between different notification tools and went for the Twilio WhatsApp API.
Twilio's messaging tool basically provides you with a WhatsApp number linked to a specific ID and API key that lets you set up the message to be sent. The process is really simple, so I will put below the official Twilio documentation and then my code, where you will find my variable "offer" as the message body.
Before sending the message don’t forget to “pip install twilio”.
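A minimal sketch of the sending part, following Twilio's WhatsApp quickstart (the account SID, auth token and "to" number are placeholders you get from your own Twilio console; the "from" number is Twilio's WhatsApp sandbox number from their documentation):
from twilio.rest import Client

account_sid = 'ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'   # placeholder: from your Twilio console
auth_token = 'your_auth_token'                        # placeholder: from your Twilio console
client = Client(account_sid, auth_token)

message = client.messages.create(
    body='\n'.join(offer),            # the "offer" messages built by tracker()
    from_='whatsapp:+14155238886',    # Twilio WhatsApp sandbox number
    to='whatsapp:+33XXXXXXXXX')       # placeholder: your own WhatsApp number

print(message.sid)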
Full code:
Conclusion:
I hope you enjoyed this tutorial. If you used one of the methods shared in this article, I would be pleased to see your project as well! Do not hesitate to tag me or share it with me on Twitter, Instagram or LinkedIn. And if you decided to use another way to get a notification, let me know!