The objective of this project is to introduce you to several functions and features in Python for data gathering and analysis. In this assignment you are to retrieve one or more pages from the web (web scraping), clean the data (data cleansing), and extract data that can be used for visualization. Web scraping is used to extract information from one or more websites and process it into simple structures such as spreadsheets, databases, or CSV files, where it can be analyzed and visualized in different views such as charts, graphs, and pictorials. In data cleansing, extra pieces of information such as “$” or “,” or other extraneous content are removed to get the intended piece of data.
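As a small illustration of data cleansing, the sketch below turns a scraped price string into a number. It assumes US-style formatting (“,” as the thousands separator and “.” as the decimal point), and `clean_price` is a hypothetical helper rather than a library function. Because it pulls out the first number instead of deleting every non-alphanumeric character, it keeps the decimal point, so “$12.99” becomes 12.99 rather than 1299.

```python
import re


def clean_price(raw):
    """Convert a scraped price string such as "$1,299.99" to a float.

    Finds the first number (digits with optional "," separators and an
    optional decimal part), removes the "," separators, and converts it.
    Returns None when the string contains no number.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(clean_price("$1,299.99"))     # 1299.99
print(clean_price("Rs. 349"))       # 349.0
print(clean_price("Out of stock"))  # None
```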

In this project you are to extract data from a live web page, retrieving several pieces of data from each entry on the page.

For example, a page listing books on Python could be scraped for three pieces of data per book: title, price, and rating.

The page is retrieved by passing its URL to the requests module and is then parsed with the BeautifulSoup module (see the sample scraping code below).

Cleanse the data by writing a function that removes extraneous characters (see the sample code below).

Use the re module to extract a specific item from the string, with a function that separates the item from its surrounding markup (see the sample code below).

Evaluate each string to identify the required piece of data, in this case the price (see the sample code below).

Re-evaluate the string to identify and extract the other pieces of data.

These steps are not comprehensive; they are meant to point you in the right direction. Explore the various sites and tutorials available on the web.
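The steps above can be tried offline before pointing the program at a real site. The sketch below assumes a made-up page layout (`div.item` entries containing `a.title`, `span.price`, and `span.rating`) with invented titles and values; on a real site you would substitute the tag and class names you find by inspecting the page, and pass `requests.get(url).text` instead of the inline string. The helper names `text_or_none` and `parse_items` are hypothetical.

```python
from bs4 import BeautifulSoup

# Invented markup standing in for a downloaded page; the class names are made up.
SAMPLE_HTML = """
<div class="item"><a class="title">Book A</a>
  <span class="price">$39.99</span><span class="rating">4.5</span></div>
<div class="item"><a class="title">Book B</a>
  <span class="price">$45.00</span></div>
"""


def text_or_none(parent, tag, css_class):
    """Return the stripped text of the first matching child tag, or None if it is missing."""
    found = parent.find(tag, class_=css_class)
    return found.get_text(strip=True) if found is not None else None


def parse_items(html):
    """Parse the page and return one dictionary per item with title, price, and rating."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for div in soup.find_all("div", class_="item"):
        price_text = text_or_none(div, "span", "price")
        rating_text = text_or_none(div, "span", "rating")
        items.append({
            "title": text_or_none(div, "a", "title"),
            # Cleanse: drop "$" and "," before converting to a number.
            "price": float(price_text.replace("$", "").replace(",", "")) if price_text else None,
            "rating": float(rating_text) if rating_text else None,
        })
    return items


for item in parse_items(SAMPLE_HTML):
    print(item)
```

Returning None for a missing tag, rather than raising an error, leaves the decision about how to fill the gap to a later cleaning step.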

The results can be displayed in several ways (see the sample output below):

From the least expensive to the most expensive, along with the title and rating.

From the highest rating to the lowest, along with the title and price.

A bar chart can be used to display the relationship between price and rating.
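Both orderings need only Python's built-in `sorted` with a key function. The records below are made up for illustration, `sort_by_price` and `sort_by_rating` are hypothetical helpers, and the sketch assumes missing prices and ratings have already been filled in (comparing None with a number raises a TypeError).

```python
# Hypothetical records with invented values; real ones come from the scraping step.
books = [
    {"title": "Book A", "price": 39.99, "rating": 4.5},
    {"title": "Book B", "price": 12.50, "rating": 3.9},
    {"title": "Book C", "price": 25.00, "rating": 4.8},
]


def sort_by_price(items):
    """Least expensive first."""
    return sorted(items, key=lambda item: item["price"])


def sort_by_rating(items):
    """Highest rating first."""
    return sorted(items, key=lambda item: item["rating"], reverse=True)


print("From least to most expensive:")
for book in sort_by_price(books):
    print(f"  ${book['price']:.2f}  {book['title']}  (rating {book['rating']})")

print("From highest to lowest rating:")
for book in sort_by_rating(books):
    print(f"  {book['rating']}  {book['title']}  (${book['price']:.2f})")
```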

You may choose which site to scrape and which pieces of data to collect.

Your code should be modular and well documented. For data visualization, you can use Matplotlib to produce two different types of graph.

LIBRARIES:

The following libraries must be installed before you develop the code:

BeautifulSoup4 (package name: beautifulsoup4)

Matplotlib

Requests

Pandas

re (regular expressions; optional, and part of the Python standard library, so it needs no installation)
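Because an import name does not always match its package name (beautifulsoup4 is imported as `bs4`), a quick check such as the following sketch can tell the user which libraries are still missing before the scraper runs. `missing_libraries` is a hypothetical helper; `importlib.util.find_spec` is the standard-library call that locates a top-level module without importing it.

```python
import importlib.util

# Import names of the libraries listed above; the beautifulsoup4 package
# is imported as bs4.
REQUIRED = ("bs4", "matplotlib", "requests", "pandas", "re")


def missing_libraries(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries()
if missing:
    print("Please install the missing libraries first:", ", ".join(missing))
else:
    print("All required libraries are available.")
```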

Your code should explain its purpose to the user and interact with the user in a meaningful way.

Ideally, you would have several URLs for pages covering different categories of items, such as kids’ toys, computers, televisions, cars, restaurants, and shoes.

Note: Using your browser, visit and inspect these pages beforehand to determine the kind of data you can gather from them. In particular, look for <div> tags and their class and id attributes to identify the sections and content you need.
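Once inspection has revealed the relevant id and class attributes, BeautifulSoup can select on them directly. The sketch below uses invented markup (the `results` and `footer` ids, the `product`, `name`, and `cost` classes, and the item names are all made up) to show the usual pattern: an id locates a single container, since id values should be unique within a page, while a class selects every repeated entry inside it.

```python
from bs4 import BeautifulSoup

# Invented markup; the id and class names stand in for what the browser's
# "Inspect" tool would show on a real page.
html = """
<div id="results">
  <div class="product"><span class="name">Toy Car</span><span class="cost">$9.99</span></div>
  <div class="product"><span class="name">Puzzle</span><span class="cost">$14.50</span></div>
</div>
<div id="footer">Not a product</div>
"""

soup = BeautifulSoup(html, "html.parser")

# An id picks out the one container that holds the listings.
results = soup.find(id="results")

# A class picks out every product entry inside that container.
for product in results.find_all("div", class_="product"):
    name = product.find("span", class_="name").get_text(strip=True)
    cost = product.find("span", class_="cost").get_text(strip=True)
    print(name, cost)

# The same selection written as a CSS selector.
print(len(soup.select("#results div.product")))
```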

Provide a menu from which the user can choose one of your predetermined categories.
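One way to build the menu, while also telling the user what the program does, is a dictionary that maps menu keys to category names and URLs. The categories and the example.com URLs below are placeholders, and `show_menu`, `choose_category`, and `main` are hypothetical names; the fetching, cleaning, and plotting steps are left as a comment.

```python
# Placeholder categories and URLs; replace them with the pages you inspected.
CATEGORIES = {
    "1": ("Python books", "https://example.com/search?q=python+books"),
    "2": ("Televisions", "https://example.com/search?q=televisions"),
    "3": ("Shoes", "https://example.com/search?q=shoes"),
}


def show_menu():
    """Explain the program's purpose and list the available categories."""
    print("This program scrapes product listings and summarizes their price and rating data.")
    for key, (name, _url) in CATEGORIES.items():
        print(f"  {key}. {name}")
    print("  q. Quit")


def choose_category(choice):
    """Return (name, url) for a menu choice, or None if the choice is not valid."""
    return CATEGORIES.get(choice.strip())


def main():
    """Interactive loop: show the menu until the user quits."""
    while True:
        show_menu()
        choice = input("Choose a category: ")
        if choice.strip().lower() == "q":
            print("Goodbye.")
            break
        selected = choose_category(choice)
        if selected is None:
            print("Invalid choice, please try again.")
            continue
        name, url = selected
        print(f"Retrieving {name} from {url} ...")
        # Fetch, parse, clean, and plot the data for this category here.
```

Calling `main()` starts the interactive loop; keeping the lookup in `choose_category`, separate from `input()`, makes it easy to test without typing.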

Retrieve the page and, for each item, extract several pieces of data that can be used for presentation (rating, price, release date, features, availability, color, shipping cost…).

Note: Your code should check for unavailable or missing values and handle them appropriately. For example, a missing piece of data for an item could be given a value of 0 or the average value of that field across the other items.
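With pandas, both strategies mentioned in the note come down to one call to `Series.fillna`. The data below is invented for illustration, and which strategy suits which column is a judgment call; here a missing shipping cost is assumed to mean free shipping, while missing ratings and prices take the average of the other items.

```python
import pandas as pd

# Invented scraped data with gaps (None marks a value not found on the page).
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "price": [39.99, None, 25.00, 18.00],
    "rating": [4.5, 3.9, None, 4.1],
    "shipping": [None, 4.99, 0.0, 2.50],
})

# Strategy 1: replace a missing value with the average of the other items.
df["rating"] = df["rating"].fillna(df["rating"].mean())
df["price"] = df["price"].fillna(df["price"].mean())

# Strategy 2: replace a missing value with 0 (assumed here to mean free shipping).
df["shipping"] = df["shipping"].fillna(0)

print(df)
```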

Using Pandas and Matplotlib, organize and present the data along different features of the items; for instance, describe the data by showing the mean, maximum, and minimum values for the items. You may also sort and present the items by availability and shipping cost, for example.
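A minimal sketch of that summary in pandas, using invented data: `DataFrame.agg` with a list of function names produces one row per statistic, and `sort_values` with two columns orders the items by availability and then by shipping cost. The column names (`in_stock`, `shipping`) are assumptions; `DataFrame.describe()` gives a broader summary (count, mean, standard deviation, quartiles) if you want more than mean, max, and min.

```python
import pandas as pd

# Invented example data; in practice this DataFrame is built from the scraped items.
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "price": [39.99, 12.50, 25.00, 18.00],
    "rating": [4.5, 3.9, 4.8, 4.1],
    "in_stock": [True, False, True, True],
    "shipping": [0.0, 4.99, 1.00, 2.50],
})

# Mean, max, and min of each numeric column (one row per statistic).
summary = df[["price", "rating", "shipping"]].agg(["mean", "max", "min"])
print(summary)

# Available items first, then the cheapest shipping first within each group.
by_availability = df.sort_values(["in_stock", "shipping"], ascending=[False, True])
print(by_availability[["title", "in_stock", "shipping"]])
```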

For visualization, you can use any type of graphical representation, such as a bar chart, line chart, pie chart, scatter chart, or high-low chart, to present different views of the data. The goal is to provide meaningful information that can be grasped quickly by looking at the chart, for example the relationship between rating and price, or the shipping and handling cost.
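Since two kinds of graph are asked for, one possible pairing is a bar chart of prices next to a scatter plot of rating against price; the scatter plot shows the price-rating relationship directly. The sketch assumes the data has already been cleaned (no missing values), uses invented values, and writes the figure to a PNG file through the non-interactive Agg backend; `plot_price_and_rating` and the file name are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files; remove this line to display windows with plt.show()
import matplotlib.pyplot as plt

# Invented example data standing in for the scraped items.
titles = ["Book A", "Book B", "Book C", "Book D"]
prices = [39.99, 12.50, 25.00, 18.00]
ratings = [4.5, 3.9, 4.8, 4.1]


def plot_price_and_rating(titles, prices, ratings, filename="price_rating.png"):
    """Draw a bar chart of prices and a scatter plot of rating against price."""
    fig, (bar_ax, scatter_ax) = plt.subplots(1, 2, figsize=(10, 4))

    # Variant 1: bar chart of the price of each item.
    bar_ax.bar(titles, prices, color="steelblue")
    bar_ax.set_title("Price per item")
    bar_ax.set_ylabel("Price ($)")

    # Variant 2: scatter plot of rating against price, each point labeled with its title.
    scatter_ax.scatter(prices, ratings, color="darkorange")
    for title, x, y in zip(titles, prices, ratings):
        scatter_ax.annotate(title, (x, y), textcoords="offset points", xytext=(4, 4))
    scatter_ax.set_title("Rating vs. price")
    scatter_ax.set_xlabel("Price ($)")
    scatter_ax.set_ylabel("Rating")

    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
    return filename


plot_price_and_rating(titles, prices, ratings)
```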

Your code must be well documented. At the beginning of your code, clearly indicate the names and section numbers of the two individuals working on the project.

Save your file using the last names of the two group members, for example Pankatala-Malilia-prog8.py, and submit it on or before the last day of classes, December 8, at 11:55 pm. Please note that ONLY one submission is required, made by one of the group members.

Sample Code, courtesy of Prasanth Budigi:

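# Required imports (requests and bs4 must be installed; re is in the standard library)
import re

import requests
from bs4 import BeautifulSoup

# Using requests to get the page.
url = "https://www.flipkart.com/search?q=python%20books&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off"
result = requests.get(url, timeout=10)
result.raise_for_status()  # stop with an error if the request failed
doc = BeautifulSoup(result.text, 'html.parser')


# Function to remove the special characters (keeps only letters and digits).
# Note that a decimal point is removed too, so "12.99" becomes "1299".
def rm_special_characters(string):
    new_string = re.sub(r"[^a-zA-Z0-9]", "", string)
    return new_string


# Function to search for the data from the tags: returns the text between
# a ">" and the next "<img", or None when the pattern is not present.
def searching_string(string):
    pattern = ">(.*?)<img"
    match = re.search(pattern, string)
    return match.group(1) if match else None


# Extracting specific items from one product container (mydiv).
# get_item is a hypothetical wrapper added so that "return" is valid; the
# class name "_30jeq3" comes from the page's markup and may change.
def get_item(mydiv, types):
    if types == "price":
        priceTag = mydiv.find("div", {"class": "_30jeq3"})
        if priceTag is None:
            price = 0
        else:
            # Skip the first character (the currency symbol), then strip separators such as ","
            price = int(float(rm_special_characters(priceTag.text[1:])))
        return price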

Sample output:

Here is my Analysis:

Book with the Highest Price is: Python Data Structures and Algorithms
Book with the Highest Rating is: Python Data Structures and Algorithms
Book with Lowest Price is: Cbse All in One Computer Science with Python Class 12 f...
Book with lowest Rating is: Beginning Data Science with Python and Jupyter
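Summary lines like these can be produced with the built-in `max` and `min` functions and a key. The sketch below uses made-up records, copies the wording of the sample output, and assumes missing values have already been filled in; `analysis_lines` is a hypothetical helper. When several items tie, `max` and `min` return the first one encountered.

```python
# Invented example records (not the books from the sample output).
books = [
    {"title": "Book A", "price": 39.99, "rating": 4.5},
    {"title": "Book B", "price": 12.50, "rating": 3.9},
    {"title": "Book C", "price": 25.00, "rating": 4.8},
]


def analysis_lines(items):
    """Return the four summary lines in the style of the sample output."""
    return [
        "Book with the Highest Price is: " + max(items, key=lambda b: b["price"])["title"],
        "Book with the Highest Rating is: " + max(items, key=lambda b: b["rating"])["title"],
        "Book with Lowest Price is: " + min(items, key=lambda b: b["price"])["title"],
        "Book with lowest Rating is: " + min(items, key=lambda b: b["rating"])["title"],
    ]


print("Here is my Analysis:")
for line in analysis_lines(books):
    print(line)
```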
