Abstract: Sentiment analysis, or opinion mining, is the computational study of people’s opinions, sentiments, attitudes, and emotions expressed in written language. It has been one of the most active research areas in recent years, for two main reasons. First, it has a wide range of applications, because opinions are central to almost all human activities and behaviors: whenever we need to make a decision, we want to hear others’ opinions. Second, it presents many challenging research problems. Many approaches have been introduced over the past couple of decades, but accuracy and precision have remained a challenge. This paper describes how that challenge can be addressed by a different approach, aspect-level sentiment analysis, in which the analysis is carried out at the phrase level rather than the word level, so that the sentiment is represented more accurately than with earlier methods. Finally, we discuss how the accuracy of this approach compares with that of the previous methods.

Index terms: sentiment analysis, social media, aspect-based.

1. Introduction
Sentiment analysis is the process of analysing the opinions, feelings and attitudes of an author about a particular product, topic, task, organization, etc. It is therefore also known as opinion mining. Social media has encouraged people to voluntarily express their emotions, feelings and suggestions as comments, but it is difficult to determine the overall opinion from so many comments. Sentiment analysis has been a popular research area in the past few years, and many approaches and algorithms have been introduced for it. Every method has its own advantages and disadvantages, and in most of them lack of accuracy has been a drawback. That is why we focus on aspect-level sentiment analysis. Aspect-level sentiment analysis is the process of analysing sentiment based on phrases and not just words [2].
This means that the analysis considers the emotion or feeling the author expresses through the entire statement. Earlier word-level methods were less accurate because they only counted positive and negative words. In aspect-level sentiment analysis, the emotions are analysed at the phrase level, which is more meaningful and accurate.

2. Three Different Classes of Sentiment Analysis
Sentiments can be classified into three classes: positive, negative and neutral.
a. Positive Sentiments: These are favourable words about the object or subject concerned. If it attracts many positive sentiments, it is regarded as good.
b. Negative Sentiments: These are unfavourable words about the object or subject under consideration. If it attracts many negative sentiments, it is dropped from the preference list.
c. Neutral Sentiments: These are words that are neither good nor bad about the product, so it is neither preferred nor ignored.

3. Three Levels of Sentiment Classification
There are three levels of sentiment classification: word level, phrase level and document level [4].
a. Word Level Classification: This classification is based on the words that indicate the sentiment about the target. It follows a lexicon-based approach and classifies only the words that express sentiment, which may be nouns, adjectives or adverbs. Word level classification does not give very accurate results.
b. Phrase Level Classification: This classification assigns phrases to the positive or negative category. The phrase signifying the attitude is extracted from the sentence and then classified. However, it sometimes gives incorrect results when a negation word precedes the phrase. Aspect level sentiment analysis can be a better way to achieve accurate phrase level classification.
A phrase is a group of words that forms a meaningful unit within a sentence.
c. Document Level Classification: In this type of classification, a single document of opinionated text is considered as a whole, and a single evaluation of a single subject is derived from it. The document may contain sentences that do not express an opinion, so document level classification is not always effective for determining the overall opinion.

4. Challenges in Sentiment Analysis
A tweet contains an entity, a feature/aspect and a sentiment. The sentiment words are the words the user chooses to express their feeling. The feature/aspect is the perspective from which the tweet is classified as positive, negative or neutral. For example, in “The food was good but the restaurant was not great”, the sentiment is positive with respect to the food aspect but negative with respect to the restaurant aspect.
· Tweets are often highly unstructured and ungrammatical, which makes them difficult to retrieve by keyword and difficult to pre-process.
· Out-of-vocabulary words make it difficult to classify the polarity or emotion of a tweet.
· Sarcastic sentences are difficult to classify.
· The extensive use of acronyms (asap, omg, lol, rofl, idk, btw) is also a challenge during classification.

5. Approaches
· Unsupervised learning
· Supervised learning

5.1. Unsupervised learning
Unsupervised learning is sometimes described as the real artificial intelligence: the idea that a computer can learn to identify complex processes and patterns without human guidance. Although unsupervised learning is overly complex for some simpler use cases, it helps to solve problems that humans cannot normally tackle. Examples of unsupervised machine learning algorithms include k-means clustering and association rules [6].
Unsupervised learning problems can be grouped into clustering and association problems.
· Clustering: A clustering problem involves discovering the inherent groupings in the data, such as grouping employees by their performance.
· Association: An association rule learning problem involves discovering rules that describe large portions of the data, such as people who buy A also tend to buy B.
Some well-known examples of unsupervised learning algorithms are:
· k-means for clustering problems.
· Apriori algorithm for association rule learning problems.

5.2. Supervised learning
Supervised learning is the more commonly used of the two. It includes algorithms such as linear and logistic regression, multi-class classification, and support vector machines. It is called supervised learning because data scientists act as a guide, teaching the algorithm what conclusions it should reach. Supervised learning requires that the algorithm’s possible outputs are already known and that the training data is already labeled with the correct answers [6]. For example, a classification algorithm will learn to identify persons after being trained on a dataset of images that are properly labeled with the names of the persons and some identifying characteristics.
Supervised learning problems can be further grouped into regression and classification problems.
· Classification: A classification problem is one where the output variable is a category, such as “red” or “blue”, or “present” or “absent”.
· Regression: A regression problem is one where the output variable is a real value, such as an amount in “pounds” or “rupees”.
Some well-known examples of supervised machine learning algorithms are:
· Linear regression (for regression problems).
· Random forest (for classification and regression problems).
· Support vector machines (for classification problems).
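To make the idea of training on labeled data concrete, the following is a minimal sketch of a supervised sentiment classifier. It uses a Naive Bayes model, which is not one of the algorithms listed above and is chosen here only because it fits in a few lines of standard-library Python. The training tweets, the labels and the function names are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of training texts
    vocabulary = set()
    for text, label in labeled_texts:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior: share of training texts that carry this label.
        score = math.log(label_counts[label] / total_texts)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training data, invented for this sketch.
TRAINING_DATA = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("the food was good", "positive"),
    ("i hate this phone", "negative"),
    ("what a terrible day", "negative"),
    ("the service was bad", "negative"),
]

model = train_naive_bayes(TRAINING_DATA)
print(predict("i love this great food", model))
print(predict("terrible service", model))
```

The labels supplied with the training data play the role of the guide described above: the model only learns which words go with which class because each example already carries the correct answer.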
Choosing between a supervised and an unsupervised machine learning algorithm typically depends on the structure and amount of your data and on the use case at hand [6]. Both types of algorithm can be used to build predictive data models that help us make decisions across various challenges.

6. Accuracy of algorithm
Accuracy is the proportion of all predictions that are classified correctly:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
The accuracy of any method or approach is determined by the correctness of its predictions. If a positive tweet is correctly predicted as positive, it is a True Positive (TP); if it is wrongly predicted as negative, it is a False Negative (FN). Similarly, if a negative tweet is correctly predicted as negative, it is a True Negative (TN); if it is wrongly predicted as positive, it is a False Positive (FP).

7. Twitter
Twitter is a well-known social networking and microblogging service that lets users express their opinions and feelings through posts, commonly known as tweets. Tweets are short messages that were previously limited to 140 characters; on November 7, 2017 the limit was doubled to 280 characters for all languages except Japanese, Korean and Chinese. People also use many acronyms, emoticons and short words in their tweets to express their feelings. The following are some of the terms used in tweets:
Target: Twitter users use the symbol “@” to refer to a target user or microblogger, who is automatically alerted.
Emoticons: Emoticons are pictorial representations of feelings, used to convey the user’s feeling quickly.
Hash tags: Hash tags are usually used to mark important topics. They increase the visibility of tweets.

7.1. Sentiment classification
Word level and document level classification may produce inaccurate results and are insufficient for many applications. Hence, we need to understand the sentiment of a tweet by analysing both the aspect and the sentiment of the opinion. For example, in the tweet “yes…. I love #rainbow”, #rainbow is the entity and love is the sentiment. The opinion on this general aspect is positive.

7.2. Sentiment analysis process
Data extraction: Twitter contains a huge amount of data, so we need to extract the tweets on a particular topic through the Twitter API.
Data pre-processing: This step cleans the data by removing punctuation, stemming words, correcting spelling, etc.
Applying algorithms: Algorithms are applied to categorize the tweets by polarity and emotion.
Visualization: The results of the sentiment classification are presented as graphs.

8. Future enhancements
In future, this work can be extended by classifying tweets with aspect level analysis. Aspect level sentiment analysis can help classify even a single tweet that contains a mixture of emotions. By considering the aspects of a tweet, sentiment analysis can become more precise.

9. Conclusion
This paper shows how research on sentiment analysis of social media data has evolved over the years, from earlier methods where analysis is carried out at the word level to the present method where it is carried out at the phrase level, also known as aspect-level sentiment analysis. It describes how aspect-level analysis helps to achieve accuracy and precision.
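As a closing illustration of the contrast between word-level and aspect-level analysis, the sketch below scores the restaurant example from Section 4 both ways. It is a simplified, rule-based approximation and not the method evaluated in any cited work: the lexicon, the negation list, the aspect list and all function names are hypothetical, and clauses are split only on "but" and punctuation.

```python
import re

# Hypothetical mini-lexicon and aspect list, invented for this sketch.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}
NEGATIONS = {"not", "no", "never"}
ASPECTS = {"food", "restaurant", "service"}


def label(score):
    """Map a numeric score to one of the three sentiment classes."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def word_level_sentiment(text):
    """Sum the lexicon scores of individual words, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    return label(sum(LEXICON.get(word, 0) for word in words))


def clause_score(words):
    """Score one clause, flipping the next sentiment word after a negation."""
    score, negate = 0, False
    for word in words:
        if word in NEGATIONS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score


def aspect_level_sentiment(text):
    """Return {aspect: sentiment} by scoring each clause separately."""
    results = {}
    # Split on "but" and on clause punctuation so each clause is scored on its own.
    for clause in re.split(r"\bbut\b|[,;.!?]", text.lower()):
        words = re.findall(r"[a-z']+", clause)
        for aspect in ASPECTS.intersection(words):
            results[aspect] = label(clause_score(words))
    return results


tweet = "The food was good but the restaurant was not great"
print(word_level_sentiment(tweet))
print(aspect_level_sentiment(tweet))
```

On this tweet the word-level function sees only "good" and "great" and reports a positive sentiment overall, while the aspect-level function reports positive for food and negative for restaurant, which is the mixed opinion the sentence actually expresses.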
