
Yelp Reviews

Summarising Online Reviews via Sentiment Mining




Yelp is an online business that accepts and publishes reviews by anyone and everyone about local businesses, ranging from dentists to hairstylists, hotels and shopping.


Restaurants, cafes and other eating establishments are a particularly popular type of business to review on Yelp. In this question, we are going to analyse a number of Yelp reviews for restaurants, cafes, etc., and provide our statistical insights.

Data and Background Information

Yelpers have written hundreds of millions of reviews to date. Due to its astounding popularity as a go-to site for recommendations (or warnings!), Yelp has become a very important site, particularly for small businesses, which can succeed or close down based on their online reviews.

The data for this question is in yelp_reviews.csv, provided on the assignment page. The variables are:

• user_id: anonymised Yelp reviewer id.

• business_id: anonymised Yelp business id.

• date: date of review (dd/mm/yyyy format).

• stars: star rating. 1 star is the lowest rating, 5 stars is the best rating.

• review_length: the review length in characters.

• votes_funny: the number of votes received indicating the review was funny.

• votes_useful: the number of votes received indicating the review was useful.

• votes_cool: the number of votes received indicating the review was cool.

• votes_total: the total number of votes.

• pos_words: the total number of ‘positive’ words used in the review.

• neg_words: the total number of ‘negative’ words used in the review.

• net_sentiment: the overall sentiment of the review – the difference between pos_words and neg_words. A positive number implies a positive review (more positive words than negative) and a negative number indicates the opposite.
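The variable definitions above can be checked mechanically before any analysis starts. The assignment itself asks for an R script, so the following Python sketch is only an illustration of the logic: it parses the dd/mm/yyyy dates and confirms that net_sentiment equals pos_words minus neg_words. The two sample rows are made up for the example and are not taken from yelp_reviews.csv, the function names are hypothetical, and the sketch assumes the file's column headers match the variable names listed above.

```python
import csv
import io
from datetime import datetime

# Made-up rows for illustration only; they are NOT taken from yelp_reviews.csv.
# Assumption: the real file uses the variable names above as column headers.
SAMPLE_CSV = """user_id,business_id,date,stars,review_length,votes_funny,votes_useful,votes_cool,votes_total,pos_words,neg_words,net_sentiment
u1,b1,25/12/2012,5,120,0,2,1,3,6,1,5
u2,b1,03/01/2013,2,340,1,0,0,1,2,7,-5
"""

INT_COLUMNS = [
    "stars", "review_length", "votes_funny", "votes_useful", "votes_cool",
    "votes_total", "pos_words", "neg_words", "net_sentiment",
]


def parse_review(row):
    """Convert one CSV row (all strings) into typed values."""
    review = dict(row)
    for name in INT_COLUMNS:
        review[name] = int(row[name])
    # Dates are day-first (dd/mm/yyyy), so 03/01/2013 is 3 January 2013.
    review["date"] = datetime.strptime(row["date"], "%d/%m/%Y").date()
    return review


def load_reviews(handle):
    """Read every row from an open CSV file (or file-like object)."""
    return [parse_review(row) for row in csv.DictReader(handle)]


def sentiment_is_consistent(review):
    """Check the stated definition: net_sentiment = pos_words - neg_words."""
    return review["net_sentiment"] == review["pos_words"] - review["neg_words"]


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
inconsistent = [r for r in reviews if not sentiment_is_consistent(r)]
print(len(reviews), "rows read,", len(inconsistent), "inconsistent")
```

With the real data, the same functions could be applied to an opened yelp_reviews.csv instead of the in-memory sample. Getting the day-first date format right matters for the per-day analysis, because a month-first parser would silently misread dates such as 03/01/2013.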

For this assignment, you need to produce a report summarising a collection of requested statistical analyses and visualisations of the data. The details are given below.

You will need to submit a proper written report and an R script file. As a guideline, excluding tables/figures, 2-3 pages of writing will be sufficient for the report. I won’t strictly count words, so if you go over/under by a bit that’s fine, but this is a good ballpark to aim for.

The report should contain:

1. An introduction outlining the analysis to follow and any background information. The introduction can be up to 2 paragraphs. For the purposes of this assignment, a paragraph is 6-8 sentences. (5 marks)

2. A statistical summary of stars, review_length, pos_words, neg_words and net_sentiment. Discuss general impressions from this statistical summary. (15 marks)

3. Create tables of the counts of positive words and the counts of negative words. From these tables, produce a plot to display the first 20 entries. You may either produce one plot per table, or one plot overall – this is entirely your choice. Discuss the trends you see in this data. (12 marks)

4. Now repeat step 3 for the data in net_sentiment and share any insights you have about the output. (12 marks)

5. Present and discuss the average review length per star category. Produce an appropriate visualisation (of your choice) of the average. Explain your choice of average and discuss the general behaviour of the data you have presented. (12 marks)

6. Analyse reviews voted as useful (variable votes_useful): is there any relationship between the usefulness of a review and its star rating and/or its length? (12 marks)

7. Study the number of reviews per day and how it changes over time. (12 marks)

8. Select the best business and the best user in the dataset. You must create and explain your own criteria for being “the best”. (12 marks)

9. Conclusions. (8 marks)
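Items 3, 5 and 7 in the list above each reduce to a grouping step followed by a count or an average. The submission must be an R script, so the Python sketch below only illustrates that logic under stated assumptions. The records are made up for the example, the function names are hypothetical, and "the first 20 entries" is read here as the 20 smallest values in the count table, which is one possible interpretation rather than the required one.

```python
from collections import Counter, defaultdict
from statistics import mean, median

# Made-up records for illustration only; not taken from yelp_reviews.csv.
reviews = [
    {"date": "25/12/2012", "stars": 5, "review_length": 120, "pos_words": 6},
    {"date": "25/12/2012", "stars": 5, "review_length": 200, "pos_words": 3},
    {"date": "03/01/2013", "stars": 2, "review_length": 340, "pos_words": 3},
    {"date": "03/01/2013", "stars": 2, "review_length": 900, "pos_words": 0},
    {"date": "04/01/2013", "stars": 2, "review_length": 260, "pos_words": 1},
]


def count_table(records, column, first_n=20):
    """Item 3/4: how many reviews take each value of `column`.

    Returns (value, count) pairs sorted by value, limited to the
    first_n smallest values (an assumed reading of "first 20 entries").
    """
    counts = Counter(r[column] for r in records)
    return sorted(counts.items())[:first_n]


def length_by_stars(records):
    """Item 5: mean and median review length for each star rating."""
    groups = defaultdict(list)
    for r in records:
        groups[r["stars"]].append(r["review_length"])
    return {
        stars: {"mean": mean(lengths), "median": median(lengths)}
        for stars, lengths in sorted(groups.items())
    }


def reviews_per_day(records):
    """Item 7: number of reviews posted on each date."""
    return Counter(r["date"] for r in records)


print(count_table(reviews, "pos_words"))
print(length_by_stars(reviews))
print(reviews_per_day(reviews))
```

In the made-up 2-star group, a single 900-character review pulls the mean well above the median. Review lengths may be skewed in a similar way in the real data, which is the kind of consideration item 5 asks for when explaining the choice of average.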

The general rule is that every question in your report should have graph/s, numbers, and discussion. There might be situations when it is impossible or impractical to have all three elements of the reporting. For example, conclusions don’t have data visualisations, and some questions might not have a meaningful numerical summary. However, you should always aim to present all three elements.

Don’t include programming code in your report. Don’t include discussions like “I use function xyz() to calculate total value”. These are examples of poor presentation. Your readers are managers who need the results of your analysis to get a better understanding of the business and to support their decision-making process. They are not interested in the programming code or functions.
