Our client, a multinational investment management firm, wanted to develop a machine learning model to classify sentiment for tweets.

Our client, a multinational investment management firm, approached Hivemind to work with them on a sentiment analysis project. They wanted to develop a machine learning model to classify sentiment for tweets that mentioned one or more entity names of interest. The client was particularly interested in sentiment directed towards each individual entity name, rather than the tweet as a whole.

To build the model, the client needed a training dataset of sentiment scores, systematically elicited from humans, for 10,000 tweets, and they needed it within a short space of time.

How Hivemind helped

Hivemind’s integrated hook into Amazon Mechanical Turk provided instant access to a large crowd, which completed one person-year’s worth of judgement collection in less than a day. Each tweet was sent to five different people, and Hivemind’s data scientists helped to aggregate the multiple judgements into a single sentiment score for each entity mentioned within each tweet.

Before judgements could be aggregated, they first needed to be checked for spam and then calibrated. Where there was strong evidence (for instance, multiple failed CAPTCHAs) that a contributor was in fact a bot or was simply providing spam responses, Hivemind rejected that contributor’s judgements and re-sent the affected microtasks.
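
As an illustration only, the spam check described above might look something like the following sketch. The record layout, the `split_spam` name and the threshold of two failed CAPTCHAs are all assumptions made for the example; they are not details of Hivemind’s actual pipeline.

```python
# Hypothetical threshold: the text above says only "multiple failed CAPTCHAs".
MAX_FAILED_CAPTCHAS = 2


def split_spam(judgements, failed_captchas, max_failed=MAX_FAILED_CAPTCHAS):
    """Split judgements into (kept, tasks_to_resend).

    judgements: list of dicts with assumed keys "worker", "task" and "label".
    failed_captchas: dict mapping a worker id to its number of failed CAPTCHAs.
    """
    kept, resend = [], []
    for judgement in judgements:
        if failed_captchas.get(judgement["worker"], 0) >= max_failed:
            # Rejected: this microtask needs a fresh judgement from someone else.
            resend.append(judgement["task"])
        else:
            kept.append(judgement)
    return kept, resend
```

Judgements in `kept` would move on to calibration, while the task ids in `resend` would be sent out again to the crowd.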

Calibration is necessary when dealing with sentiment tasks, as sentiment is inherently subjective. Different people have different baselines for judging tweets on a positive-negative scale, with some being more positively inclined than others. To deal with this, it is important to centre and scale judgements. First, categorical positive and negative sentiment labels were converted to numerical positive and negative scores, and each individual’s average judgement was centred around neutral. The spread of each individual’s scores was then scaled to provide consistent judgement variation between Turkers.
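
One simple way to centre and scale is to standardise each contributor’s scores by their own mean and standard deviation. The sketch below takes that approach as an assumption: the label-to-score mapping and the `calibrate` function are hypothetical, and the exact method used on the project may have differed.

```python
from statistics import mean, pstdev

# Assumed mapping from categorical labels to numerical scores.
LABEL_SCORES = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def calibrate(labels_by_worker):
    """Return {worker: [calibrated scores]} given {worker: [labels]}."""
    calibrated = {}
    for worker, labels in labels_by_worker.items():
        scores = [LABEL_SCORES[label] for label in labels]
        centre = mean(scores)    # this worker's personal baseline
        spread = pstdev(scores)  # this worker's personal variation
        if spread == 0:
            # The worker gave the same label every time, so only centring applies.
            calibrated[worker] = [0.0 for _ in scores]
        else:
            calibrated[worker] = [(s - centre) / spread for s in scores]
    return calibrated
```

After this step every contributor’s scores average zero (neutral) and, unless they always gave the same label, have unit spread, so a habitually positive Turker and a habitually negative one become comparable.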

Finally, we calculated median scores for each tweet to produce a training set of aggregated sentiment scores, enabling the client to train a sentiment analysis model and apply it to a large corpus of tweets containing entity names of interest.
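
The median aggregation can be sketched as follows. The `(tweet_id, entity, score)` tuple layout and the `aggregate` function are assumptions made for illustration; scores are grouped per entity within each tweet, in line with the client’s interest in entity-level sentiment.

```python
from collections import defaultdict
from statistics import median


def aggregate(judgements):
    """Return {(tweet_id, entity): median score}.

    judgements: iterable of (tweet_id, entity, calibrated_score) tuples.
    """
    grouped = defaultdict(list)
    for tweet_id, entity, score in judgements:
        grouped[(tweet_id, entity)].append(score)
    return {key: median(scores) for key, scores in grouped.items()}
```

With five judgements per tweet, the median is the middle score, which makes the aggregate robust to a single outlying contributor.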

Contact us to find out how your business can make informed decisions using aggregated opinion or sentiment.