TDM 40100: Project 12 — 2023
Motivation: In general, scraping data from websites has always been a popular topic in The Data Mine. In addition, it was one of the requested topics. We will continue to use books.toscrape.com to practice scraping skills, visualize the scraped data, and use sqlite3 to save the scraped data to a database.
Context: This is the third project focusing on web scraping combined with sqlite3.
Scope: Python, web scraping, selenium, BeautifulSoup, sqlite3
Questions
Question 1 (2 pts)
In the previous project, you scraped data from books.toscrape.com. Now let's visualize your scraped data.
- Please visualize the prices of the books in the Music category with a bar plot. Split the prices into three ranges: below 20, 20-30, and above 30.
books is the book list returned by the previous project's function "get_all_books", like this: books = get_all_books("Music_14")
You may need to convert each price string to a float.
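For instance, a minimal sketch of the conversion, assuming each scraped price is a string such as '£51.77':

raw_price = '£51.77'                  # example value scraped from the site
price = float(raw_price.lstrip('£'))  # drop the currency symbol, then convert
print(price)                          # 51.77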
You may use sum to count the books in each price range.
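One way to count the books in each range with sum, assuming each entry of books is a dict whose 'price' value is the scraped price string (the key name follows project 11 and is an assumption here):

# convert every scraped price to a float
prices = [float(book['price'].lstrip('£')) for book in books]

# summing booleans counts how many prices fall in each range
below_20 = sum(p < 20 for p in prices)
from_20_to_30 = sum(20 <= p <= 30 for p in prices)
above_30 = sum(p > 30 for p in prices)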
You may use a bar chart to display the three counts.
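Continuing from the sketch above, one way to draw the bar chart with matplotlib:

import matplotlib.pyplot as plt

labels = ['below 20', '20-30', 'above 30']
counts = [below_20, from_20_to_30, above_30]   # counts from the previous sketch

plt.bar(labels, counts)                        # one bar per price range
plt.xlabel('price range')
plt.ylabel('number of books')
plt.title('Prices of books in the Music category')
plt.show()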
Question 2 (2 pts)
- Write CREATE TABLE statements to create 2 tables, namely, a categories table and a books table.
Check on the website for category information. The categories table may contain the following fields:
- 'id': a unique identifier for each category, auto increment
- 'category': like 'poetry_23'
Check on the website for book information. The books table may contain the following fields:
- 'id': a unique identifier for each book, auto increment
- 'title': like 'A Light in the Attic'
- 'category': like 'poetry_23'
- 'price': like 51.77
- 'availability': like 'In stock (22 available)'
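For example, a sketch of the two CREATE TABLE statements built from the fields listed in the hints above; the database file name books.db is an assumption:

import sqlite3

conn = sqlite3.connect('books.db')   # assumed database file name
cur = conn.cursor()

# categories table: auto-incrementing id plus the category slug
cur.execute('''
CREATE TABLE IF NOT EXISTS categories (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT
);
''')

# books table: auto-incrementing id plus the scraped book fields
cur.execute('''
CREATE TABLE IF NOT EXISTS books (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT,
    category TEXT,
    price REAL,
    availability TEXT
);
''')

conn.commit()
conn.close()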
- Use sqlite3 to run queries that confirm and show your table schemas.
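One way to confirm and show the table schemas is to query sqlite_master, for example (books.db is the assumed database file from the sketch above):

import sqlite3

conn = sqlite3.connect('books.db')
cur = conn.cursor()

# list the tables that exist in the database
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table';")
print(cur.fetchall())

# show the CREATE TABLE statement stored for each of the two tables
cur.execute("SELECT sql FROM sqlite_master WHERE name IN ('categories', 'books');")
for (schema,) in cur.fetchall():
    print(schema)

conn.close()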
Question 3 (2 pts)
- Update the function "get_category" from project 11. After you get the information about categories from the website, populate the "categories" table with that data.
- Run a couple of queries that demonstrate that the data was successfully inserted into the database.
Here is partial code to assist.
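As a minimal sketch only: the selenium setup, CSS selector, and books.db file name are assumptions, and the category slug (e.g. 'poetry_23') is taken from the second-to-last path segment of each sidebar link.

import sqlite3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

def get_category():
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get('https://books.toscrape.com')

    # each sidebar link ends in .../category/books/<slug>/index.html
    links = driver.find_elements(By.CSS_SELECTOR, 'div.side_categories ul li ul li a')
    categories = [link.get_attribute('href').rstrip('/').split('/')[-2] for link in links]
    driver.quit()

    # populate the categories table with the scraped slugs
    conn = sqlite3.connect('books.db')
    cur = conn.cursor()
    cur.executemany('INSERT INTO categories (category) VALUES (?);',
                    [(c,) for c in categories])
    conn.commit()
    conn.close()
    return categories

categories = get_category()

# queries demonstrating that the data was inserted
conn = sqlite3.connect('books.db')
print(conn.execute('SELECT COUNT(*) FROM categories;').fetchone())
print(conn.execute('SELECT * FROM categories LIMIT 5;').fetchall())
conn.close()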
Question 4 (2 pts)
- Update the function "get_all_books" from project 11. After you get the information about books from the website, populate the "books" table with that data. You may need to scrape new data for a new "category" field indicating which category each book belongs to.
- Run a couple of queries that demonstrate that the data was successfully inserted into the database.
In project 11, we used an associative array (a Python dict) to hold the book_info. Since we need to insert the book information into the "books" table, we may need to switch to a different data structure, such as a tuple.
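For illustration, the two shapes might look like this (the exact dict keys from project 11 are assumptions; the example values come from the 'A Light in the Attic' listing):

# project 11 style: an associative array (Python dict) per book
book_info = {
    'title': 'A Light in the Attic',
    'price': 51.77,
    'availability': 'In stock (22 available)',
}

# tuple ordered to match the books table columns
# (title, category, price, availability), ready to pass to an INSERT
book_info = ('A Light in the Attic', 'poetry_23', 51.77, 'In stock (22 available)')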
Here is partial code to assist.
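Again a minimal sketch only, assuming the project 11 version of get_all_books used selenium and took the category slug as its argument; the CSS selectors and the books.db file name are assumptions, and pagination is omitted.

import sqlite3
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

def get_all_books(category):
    options = Options()
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(f'https://books.toscrape.com/catalogue/category/books/{category}/index.html')

    books = []
    for pod in driver.find_elements(By.CSS_SELECTOR, 'article.product_pod'):
        title = pod.find_element(By.CSS_SELECTOR, 'h3 a').get_attribute('title')
        price = float(pod.find_element(By.CSS_SELECTOR, 'p.price_color').text.lstrip('£'))
        availability = pod.find_element(By.CSS_SELECTOR, 'p.instock.availability').text.strip()
        books.append((title, category, price, availability))   # tuple matches the table columns
    driver.quit()

    # populate the books table with the scraped tuples
    conn = sqlite3.connect('books.db')
    cur = conn.cursor()
    cur.executemany(
        'INSERT INTO books (title, category, price, availability) VALUES (?, ?, ?, ?);',
        books)
    conn.commit()
    conn.close()
    return books

books = get_all_books('music_14')   # category slug as it appears in the category URL

# queries demonstrating that the data was inserted
conn = sqlite3.connect('books.db')
print(conn.execute('SELECT COUNT(*) FROM books;').fetchone())
print(conn.execute("SELECT title, price FROM books WHERE category = 'music_14' LIMIT 5;").fetchall())
conn.close()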
Project 12 Assignment Checklist
- Jupyter Lab notebook with your code, comments and output for the assignment
  - firstname-lastname-project12.ipynb
- Submit files through Gradescope
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project.