Using Twitter data to measure emotions

Author: Jaap van Sandijk
How measurable are emotions on Twitter, and what kinds of patterns, if any, can be recognised in them? Language technology engineer Florian Kunneman recently completed his PhD thesis on this topic at Radboud University Nijmegen. He studied millions of tweets and concluded that emotions can be recognised automatically, using a limited number of specific hashtags.

Hashtag as a resource

Kunneman split his research into two parts, detecting events and detecting emotions, and then combined them to compare emotions before and after a particular event: 'I classified tweets according to whether they expressed positive anticipation before an event or disappointment after it, and then checked for any correlations between the two.' The hashtag, a pound sign (#) followed by one or more keywords, proved to be a useful resource in this process because it provides context. However, Kunneman found that only a limited number of hashtags are suitable for automated emotion recognition. One of them is #zinin (roughly 'looking forward to it'): 'This hashtag is often used in announcements of entertaining events, closely matches the content of the tweet and turned out to be very suitable. On the other hand, #omg ("Oh my God") can accompany any kind of emotion, so it is not useful.'
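Using a hashtag in this way is a form of what is often called distant supervision: the hashtag serves as a noisy label for the tweet it appears in. The sketch below is a minimal illustration under stated assumptions, not Kunneman's actual pipeline. Only #zinin comes from the article; the label name `anticipation`, the mapping `EMOTION_HASHTAGS` and the function `label_by_hashtag` are hypothetical. It also assumes, as is common in this kind of setup, that the labelling hashtag is removed from the text so that a classifier trained on it cannot simply memorise the hashtag itself.

```python
import re

# Hypothetical hashtag-to-label mapping; only #zinin is taken from the article.
EMOTION_HASHTAGS = {"zinin": "anticipation"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_by_hashtag(tweet: str):
    """Return (text_without_label_hashtag, label), or None if no usable hashtag."""
    for tag in (t.lower() for t in HASHTAG_RE.findall(tweet)):
        if tag in EMOTION_HASHTAGS:
            # Assumption: strip the labelling hashtag so a classifier trained
            # on the text cannot just learn the hashtag itself.
            pattern = re.compile(r"#" + re.escape(tag) + r"\b", re.IGNORECASE)
            text = " ".join(pattern.sub("", tweet).split())
            return text, EMOTION_HASHTAGS[tag]
    # Ambiguous hashtags such as #omg are not in the mapping, so no label.
    return None


print(label_by_hashtag("Morgen naar het concert #zinin"))
print(label_by_hashtag("#omg wat een dag"))
```

Tweets that carry only an ambiguous hashtag such as #omg return `None` and would simply be left out of the training set.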

User-friendly results

Kunneman's research is attracting attention because of the growing interest in using big data for research. A major challenge is how to 'translate' huge amounts of data into reliable, user-friendly results. How did he tackle this challenge? 'I developed a filtering system that targets tweets expressing a particular emotion. It is comparable to the way Google ranks search results by relevance. I ranked the hundreds of thousands of results by assigning each one a probability: this type of message definitely expresses positive expectations, while for that type it is less certain. That is how we arrived at a system that works quite well.'
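The idea of ranking results by probability can be illustrated with a small probabilistic classifier. The sketch below uses multinomial Naive Bayes with add-one smoothing purely as an example: the article does not say which classifier Kunneman used, and the function names (`train_nb`, `prob_emotion`, `rank_tweets`) and toy training tweets are hypothetical. Each tweet receives an estimated probability of expressing the target emotion, and sorting by that probability puts the most confident matches at the top.

```python
import math
from collections import Counter


def train_nb(examples):
    """Train a two-class multinomial Naive Bayes model.

    examples: list of (text, label) pairs, label 1 (emotion present) or 0.
    Both classes must occur at least once.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    totals = {c: sum(word_counts[c].values()) for c in (0, 1)}
    return word_counts, class_counts, vocab, totals


def prob_emotion(model, text):
    """Estimated P(label = 1 | text), with add-one (Laplace) smoothing."""
    word_counts, class_counts, vocab, totals = model
    n_examples = sum(class_counts.values())
    log_score = {}
    for c in (0, 1):
        score = math.log(class_counts[c] / n_examples)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][word] + 1) / (totals[c] + len(vocab)))
        log_score[c] = score
    # Turn the two log scores into a probability (subtract max for stability).
    top = max(log_score.values())
    e0 = math.exp(log_score[0] - top)
    e1 = math.exp(log_score[1] - top)
    return e1 / (e0 + e1)


def rank_tweets(model, tweets):
    """Return (probability, tweet) pairs, most probable first."""
    return sorted(((prob_emotion(model, t), t) for t in tweets), reverse=True)


# Hypothetical toy training data (1 = positive anticipation, 0 = other).
training = [
    ("zin in het concert", 1), ("zin in vakantie", 1),
    ("file op de snelweg", 0), ("regen vandaag", 0),
]
model = train_nb(training)
for prob, tweet in rank_tweets(model, ["file vandaag", "zin in het weekend"]):
    print(f"{prob:.2f}  {tweet}")
```

In a real system the training set would be far larger (for instance, tweets labelled via hashtags), but the ranking step stays the same: score every tweet, then sort.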

Biggest hurdle

The biggest obstacle in the filtering system is fine-tuning it, Kunneman says: 'You can use a narrow filter and get precise results, but too few of them; a very wide filter, on the other hand, lets through all kinds of material you don't need.' The right filter lies somewhere in between, and finding it is painstaking work; there is no blueprint for it. 'However, if you are looking for a blueprint in this research process, there is one I can offer from my own experience: start with a wide filter, so you can see more clearly where the wrong output comes from. You can then make the necessary adjustments. Working the other way around, starting with a narrow and very precise filter, is much harder, simply because you have no way of telling what else should fit your template.'
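The trade-off between a narrow and a wide filter corresponds to the familiar precision/recall trade-off. Assuming the filter keeps every tweet whose estimated probability is at or above a threshold, raising the threshold narrows the filter (higher precision, lower recall) and lowering it widens the filter (higher recall, lower precision). The scores and relevance judgements below are invented for illustration only, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scored, threshold):
    """Precision and recall of a filter that keeps items scoring >= threshold.

    scored: list of (probability, is_relevant) pairs, where is_relevant is a
    (hypothetical) manual judgement of whether the tweet expresses the emotion.
    """
    kept = [relevant for prob, relevant in scored if prob >= threshold]
    total_relevant = sum(1 for _, relevant in scored if relevant)
    true_positives = sum(1 for relevant in kept if relevant)
    # Convention used here: an empty result set gets precision 0.0.
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Invented example data: (probability from the filter, judged relevant?)
scored = [
    (0.95, True), (0.90, True), (0.70, False), (0.60, True),
    (0.40, True), (0.30, False), (0.20, False),
]

for threshold in (0.9, 0.5, 0.3):
    p, r = precision_recall(scored, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.9  precision=1.00  recall=0.50
# threshold=0.5  precision=0.75  recall=0.75
# threshold=0.3  precision=0.67  recall=1.00
```

Following Kunneman's advice, one would start at the low (wide) threshold, inspect the false positives it lets through (here the items scored 0.70 and 0.30) to see where the wrong output comes from, and then tighten the filter.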

CBS social tension indicator

Kunneman thinks the particular value of detecting emotions on Twitter lies in extending standard (statistical) research. 'It is a good supplementary tool that may help reduce your reliance on surveys. Its major advantages are that it can process much larger quantities of data and that research can run continuously over time. This potentially makes it easier to follow trends than with a survey.' Kunneman is following developments around the social tension indicator, an innovative product developed by CBS. The indicator specifically measures tension or unrest in society, which gives it an edge over more general measurements of positive or negative sentiment on social media. For this indicator, a validated list specifically related to (in)security was compiled with the help of qualitative research. 'I am aware of this research project at CBS. As a language technology engineer, I'm glad to see this initiative. The social tension indicator is an example of a new method that reduces CBS's dependence on surveys.'

Vaccinations

Ali Hürriyetoglu is Kunneman's colleague at Radboud University Nijmegen and acted as his assistant (paranymph) at his PhD ceremony. He is also involved in methodology work at CBS's Center for Big Data Statistics. Does he think Radboud University and CBS might collaborate? 'My research focuses on applications, and it is easier to implement for the business community than for a statistical office, which has many more requirements,' is his carefully worded reply. He adds: 'Still, it is a realistic aim. We might achieve reasonable success with a research assistant or postdoctoral researcher on this project.' For the time being, however, Kunneman will be working on his next project as a postdoctoral researcher at Radboud University: detecting emotions about vaccinations on Twitter, a project commissioned by the National Institute for Public Health and the Environment (RIVM).