Tuesday, November 19, 2013

Opinion Mining




Opinion mining: 

The goal of opinion mining is to identify the parts of a text that express emotions; in other words, it is sentiment analysis. Opinion mining is applied in decision making: it converts people's voices into a statistical table that is useful for entrepreneurs. Nowadays sentiment analysis is a flourishing area with lots of research activity.

Relevance

According to one survey, 81% of Internet users have researched a product online at least once. Among readers of online reviews of restaurants, hotels, and various services, between 73% and 87% report that the reviews had a significant influence on their purchase. Consumers are ready to pay more for a higher-rated item than for a lower-rated one. 32% of users have provided a rating through an online rating system, and 30% have posted an online comment or review.
Because reviews are spread across many different sources, it is not easy to collect all of them from the Web. Some sources require authentication; in other cases, opinions are hidden inside forum posts and blogs. It is very difficult for a human reader to go through the relevant sources, collect the text, extract the pertinent sentences, and then analyze, summarize, classify, and organize them into a usable form. In this situation an automated opinion mining tool can act as a help desk for the customer.

Early work

How do people feel about...? Researchers are trying to answer this question using opinion mining. Identifying the polarity of opinion words and classifying whole documents as positive or negative were some of the initial work done in this area. However, document-level classification is not enough for feature-based opinion mining. For example, take a review of a phone: a customer might like its screen but dislike its battery, and a single positive or negative label for the whole review cannot capture both opinions. Researchers therefore started mining opinions about individual product features. This task is known as feature-level opinion mining.
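To make the contrast concrete, here is a minimal Python sketch (an illustration, not the method used in this post) that scores the phone review both ways. The feature list and the tiny positive/negative word lists are hypothetical, and splitting the sentence on "but" is a deliberately naive stand-in for real clause detection.

```python
# Hypothetical toy lexicons; a real system would use gazetteers and a
# sentiment lexicon instead of these hand-picked words.
FEATURES = {"screen", "battery", "camera"}
POSITIVE = {"like", "love", "good", "great"}
NEGATIVE = {"dislike", "hate", "bad", "poor"}


def word_polarity(word):
    """+1 for a positive word, -1 for a negative word, 0 otherwise."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def document_polarity(review):
    """Document-level: a single score for the whole review."""
    words = review.lower().replace(".", "").split()
    return sum(word_polarity(w) for w in words)


def feature_polarities(review):
    """Feature-level: naively split on ' but ' and score each feature within its own clause."""
    result = {}
    text = review.lower().replace(".", "")
    for clause in text.split(" but "):
        words = clause.split()
        score = sum(word_polarity(w) for w in words)
        for w in words:
            if w in FEATURES:
                result[w] = score
    return result


review = "I like its screen but dislike its battery."
print(document_polarity(review))   # 0: the two opinions cancel out
print(feature_polarities(review))  # {'screen': 1, 'battery': -1}
```

The document-level score of 0 hides the fact that the review contains two strong, opposite opinions; the feature-level result keeps them apart.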

Challenges

Let’s look at some of the challenges we have faced in our opinion mining process.
  1. Linking each emotion to its proper topic is a genuinely difficult task.
For example: “I'm looking for a good twitter app for my apple ipad”.
Here there are two possible heads (“twitter app” and “apple ipad”) and a single adjective (“good”). The proper link is between “good” and “twitter app”, i.e. “good twitter app”.


  2. Extracting emotions from sarcastic sentences is not easy, and the same is true when a sentence needs external knowledge to determine its emotion.
Sometimes people comment sarcastically, either by adding sarcastic smileys or through the sarcastic meaning of the sentence itself.
For example: “I like their product verrrrrrry much ….. ;) ;)” may be a sarcastic review. To decide, we need some external knowledge about this person, such as his or her previous comments.
  3. In other cases the topic appears in a previous sentence and is referred to with a pronoun such as “it”, “he”, or “they”.
Look at this example:
“I showed it to Tom and Mary. He also liked”
Here “He also liked” is the opinion part, and normally “He” would be taken as the head. But the actual head is Tom: pronouns are not proper heads and must be resolved to the person or thing they refer to.


  4. Nowadays people use shorthand such as “U” instead of “You”, smileys, etc.
For example: “I lve ma ipad”. People widely use shorthand in their comments, and these shorthand words are very difficult to resolve: “lve” = “love”, “ma” = “my”, “2moro” = “tomorrow”, etc.


  5. In some scenarios a combination of words creates an emotion of its own.
For example: “damn beauty”.
Here “damn” conveys a negative emotion and “beauty” a positive one, yet “damn beauty” as a whole is positive. Another example is “deep shit”.


  6. How do we get the data? Using connectors to social media websites such as Twitter, Facebook, YouTube, and Digg, we can pull out only 20-30% of the user reviews on the World Wide Web.
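Challenge 4 can be partly handled by a normalization step that runs before any other processing. The sketch below is a minimal Python illustration under simple assumptions: the shorthand dictionary is hypothetical and tiny, and elongated words such as “verrrrrrry” (from challenge 2) are repaired by collapsing any run of three or more identical characters into one.

```python
import re

# Hypothetical shorthand dictionary; a real one would be much larger and
# would need regular updates as new shorthand appears.
SHORTHAND = {"u": "you", "lve": "love", "ma": "my", "2moro": "tomorrow"}


def squeeze_elongation(word):
    """Collapse runs of 3+ identical characters into one, e.g. 'verrrrrrry' -> 'very'."""
    return re.sub(r"(.)\1{2,}", r"\1", word)


def normalize(text):
    """Lowercase, repair elongated words, then expand known shorthand token by token."""
    out = []
    for tok in text.lower().split():
        tok = squeeze_elongation(tok)
        out.append(SHORTHAND.get(tok, tok))
    return " ".join(out)


print(normalize("I lve ma ipad"))              # i love my ipad
print(normalize("I like it verrrrrrry much"))  # i like it very much
```

Both rules are crude: collapsing to a single character turns “cooool” into “col” rather than “cool”, and “ma” is not always shorthand for “my”. A more careful normalizer would check candidate repairs against a dictionary and use the surrounding context.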


What are the available methods?

Basically there are two methods: supervised and unsupervised. We get more accurate results with the machine learning (supervised) approach, but the challenge is to obtain training data, and its scope is always limited. Language and its usage are very flexible, so even a training set we build now will soon be outdated. There are many tools available for topic extraction and sentiment analysis; some of them are listed below.
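As an illustration of the supervised approach, the sketch below trains a multinomial Naive Bayes classifier, written from scratch in Python, on four invented labeled reviews. Naive Bayes is simply one common choice used here for illustration; the post does not prescribe a particular algorithm. Note that the model can only use words it saw during training, which is exactly why a training set goes out of date as language changes.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (text, label). Returns label counts, per-label word counts and the vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return label_counts, word_counts, vocab


def classify_nb(model, text):
    """Pick the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)
        for w in text.lower().split():
            if w not in vocab:
                continue  # unseen words carry no evidence: the model only knows its training data
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, for illustration only.
training = [
    ("great phone love the screen", "pos"),
    ("good battery and great camera", "pos"),
    ("terrible battery hate it", "neg"),
    ("bad screen and poor camera", "neg"),
]
model = train_nb(training)
print(classify_nb(model, "love the camera"))  # pos
print(classify_nb(model, "poor battery"))     # neg
```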

Tools
There are several tools available for this purpose, such as KEA, MAHOUT, MALLET, MAUI, WEKA, SmILE, SentiWordNet, and RapidMiner.
Almost all of the topic extraction tools are based on machine learning. They are useful for document-level extraction and classification, which is not exactly what feature-level opinion mining needs. SentiWordNet is a good resource for finding the emotion of a word.

Development frameworks

To develop such applications, there are development frameworks such as GATE, UIMA, and NLTK. We can choose any one of them according to our usage and development criteria. All of them are open source and allow different types of plugins that are useful for this type of task.

Mining process (Unsupervised)

Every opinion has at least two parts: a Head (the topic) and a Sentiword (the word that describes the emotion).
To identify these opinion parts (Head and Sentiword) properly, we need an excellent POS tagger and gazetteers (lists of commonly used nouns, phrases, sentiwords, and smileys). A topic can be a person, an object, or a term, and sentiwords are basically categorized as positive or negative. A sentiword is linked to the proper head according to constraints that we define. The opinion text should be understandable and meaningful. Classifying opinions by product feature is an additional task of opinion mining.

Accuracy

How can we measure the accuracy of an opinion mining application? When humans perform opinion mining, the agreement between them is only around 85%, and with some training it can be raised above 90%. Agreement between a human and a system is considerably lower. Accuracy can be measured with precision and recall, and correlation shows how close the predicted values are to the actual ones. Benchmarking tools are available for these kinds of measurements.
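Precision, recall, and simple agreement can be computed as below. This is a plain Python sketch; the eight gold and system labels are invented purely to show the arithmetic. Running the same `agreement` function on the labels of two human annotators gives the human-to-human figure discussed above.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one label, comparing system output with human (gold) labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    predicted_count = sum(1 for p in predicted if p == label)
    gold_count = sum(1 for g in gold if g == label)
    precision = tp / predicted_count if predicted_count else 0.0
    recall = tp / gold_count if gold_count else 0.0
    return precision, recall


def agreement(a, b):
    """Fraction of items on which two annotators (or a human and the system) agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Invented labels, for illustration only.
gold   = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
system = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neg"]
print(precision_recall(gold, system, "pos"))  # (0.75, 0.75)
print(agreement(gold, system))                # 0.75
```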

Uses

Today opinion mining has a wide range of applications, such as Brand Monitoring, Buzz Monitoring, Online Anthropology, and Online Consumer Intelligence; in other words, social media monitoring. Opinion mining helps us in the decision-making process and is useful for individuals as well as organizations. A summary of opinions helps consumers make informed and valid decisions. Opinion mining applications are becoming an essential part of businesses and organizations. For example, how consumers receive its products, and those of its competitors, is always critical information for a product manufacturer. This information is useful not only for marketing but also for product design and product development.


2 comments:

  1. This is definitely a challenging area. For sarcastic inputs, there could be a matrix that leads to a final decision about the emotion: a matrix for smileys, and a matrix for sarcastic words and expressions. Hopefully the sentence can then be processed against each matrix and normalised. A normalised value is then compared against an emotion matrix, and if there is only a very small variation, the case can be passed on for a manual decision. These manual decisions can later be used to map such behaviours to emotions automatically. But the varying input terminology and usage will again form a matrix that needs to be updated every time. Just some thoughts from an initial understanding.

    1. Hi Vineeth, yes, you are right: sarcasm can be redirected to a manual decision. But identifying sarcasm is the bottleneck here. I am not saying it is impossible, but there will always be exceptions, because the sarcastic element is directly linked to the behavior (or thoughts, perceptions, or experience) of the writer and of the target (time is also one dimension). To understand it, we have to track the set of comments posted by the writer and the comments about the target. We need to dig into this a bit more to build the real matrix. Some techniques have already been proposed; for example, the University of Sheffield recommended a method that collects tweets tagged #sarcasm from Twitter and treats them as sarcastic comments. :)
