chrisdaly

datascience{python} :|: dataviz{d3}

Yelp Sentiment Analysis

Sentiment analysis is the use of natural language processing, text analysis and computational linguistics to identify subjective information. It is often used by companies to quantify general social media opinion (for example, using tweets about several brands to compare customer satisfaction). One of the simplest and most common sentiment analysis methods is to classify words as "positive" or "negative", then average the values of the words to categorize the entire document. This analysis measures the sentiment of a word through the ratio of its occurrences in 5-star reviews to its occurrences in 1-star reviews on Yelp.
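The ratio described above could be computed roughly as follows. This is only a minimal sketch, not the notebook's actual code: the `tokenize` and `sentiment_ratios` helpers, the regex tokenizer, the add-one `smoothing` term (used here to avoid dividing by zero) and the three toy reviews are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def sentiment_ratios(reviews, smoothing=1.0):
    """Map each token to (count in 5-star reviews) / (count in 1-star reviews).

    `reviews` is an iterable of (text, stars) pairs. The smoothing constant
    is an assumption added so that tokens absent from 1-star reviews do not
    cause a division by zero.
    """
    five_star = Counter()
    one_star = Counter()
    for text, stars in reviews:
        if stars == 5:
            five_star.update(tokenize(text))
        elif stars == 1:
            one_star.update(tokenize(text))
        # Reviews with 2 to 4 stars are ignored by this ratio.
    vocab = set(five_star) | set(one_star)
    return {
        token: (five_star[token] + smoothing) / (one_star[token] + smoothing)
        for token in vocab
    }


# Toy data, invented for illustration only.
reviews = [
    ("Delicious food, delicious dessert", 5),
    ("Rude waiter and slow service", 1),
    ("Delicious but slow", 3),
]
ratios = sentiment_ratios(reviews)
```

With this definition, a ratio above 1 marks a token that appears more often in 5-star reviews, and a ratio below 1 marks one that appears more often in 1-star reviews.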


The most apparent pattern in this visualization is that words with a positive rating generally describe the food, whereas negative words generally describe the service. Any manager looking to improve their restaurant efficiently should therefore look at customer service first, before thinking about amending the menu or altering the ambiance.

The data used in this visualization is taken from Kaggle's Yelp Business Rating Prediction competition, a collection of millions of restaurant reviews, each accompanied by a 1-5 star rating. The data was processed and the sentiment ratio for each token was calculated in an IPython notebook. A naive Bayes model was also used to predict the rating of new reviews, achieving an accuracy of 92%.
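For readers unfamiliar with naive Bayes, the idea can be sketched in plain Python as a multinomial model over token counts. This is an illustrative sketch under stated assumptions, not the model from the notebook: the `NaiveBayes` class, the `alpha` smoothing parameter, the tokenizer and the four training sentences are all hypothetical, and the toy data says nothing about the reported accuracy.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # reviews per rating
        self.token_counts = defaultdict(Counter)   # token counts per rating
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_reviews = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / n_reviews)
            for t in tokens:
                score += math.log((self.token_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "amazing food and delicious dessert",
    "delicious pizza, friendly staff",
    "rude waiter and terrible service",
    "terrible, slow and rude",
]
train_stars = [5, 5, 1, 1]

model = NaiveBayes().fit(train_texts, train_stars)
positive_guess = model.predict("delicious dessert")
negative_guess = model.predict("rude and slow service")
```

The same class works unchanged with all five star ratings as labels; only two are used here to keep the example small.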