Machine Learning and Social Media

What other people think has always been an important piece of information for our decision-making process. Nowadays, the Internet and the Web allow us to find answers to this question beyond the circle of our personal acquaintances.

With the rise of Web 2.0, many people use social media to post opinions on almost any subject: events, products, topics. The area of opinion mining and sentiment analysis deals with the computational treatment of opinion, sentiment, and subjectivity in text, and allows us to draw conclusions about people's attitudes towards such subjects. Such insights are essential for product design and advertising, event planning, political campaigns and decision making in general.

Traditional sentiment analysis techniques focus on static data: given a fixed dataset of opinionated documents, a machine learning model is trained once and then used to recognize the sentiment of future, unseen instances. However, as opinions accumulate from social streams over time, changes may occur. Such changes may concern the general sentiment towards a subject (e.g., how has the sentiment towards Angela Merkel evolved over the past years), the specific facets of that subject (e.g., which aspects of Angela Merkel do people discuss over time, and what sentiment is associated with them), or the words used to express sentiment (e.g., the usage of sentiment intensifiers). The subjects discussed in social media also change over time. A static machine learning model cannot cope with such changes. In OSCAR, we develop opinion stream mining methods that deal with change and continuously adapt the sentiment models to the underlying evolving population.
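To illustrate the difference between a static and an adaptive model, the following is a minimal sketch and not the OSCAR implementation. It assumes a multinomial Naive Bayes classifier whose counts are multiplied by a fading factor at every update, so that older opinions gradually lose influence. The class name FadingNaiveBayes, the decay parameter and the method names are hypothetical and chosen for this example only.

```python
import math
from collections import defaultdict


class FadingNaiveBayes:
    """Illustrative multinomial Naive Bayes with exponentially fading counts.

    decay is a hypothetical fading factor in (0, 1]; decay=1.0 never forgets
    and therefore behaves like a model trained on all data seen so far.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.class_counts = defaultdict(float)
        self.word_counts = defaultdict(lambda: defaultdict(float))
        self.vocab = set()

    def learn_one(self, tokens, label):
        # Fade all existing counts so that old evidence loses weight.
        for c in self.class_counts:
            self.class_counts[c] *= self.decay
            for w in self.word_counts[c]:
                self.word_counts[c][w] *= self.decay
        # Add the new document with full weight.
        self.class_counts[label] += 1.0
        for w in tokens:
            self.word_counts[label][w] += 1.0
            self.vocab.add(w)  # the vocabulary grows as new words appear

    def predict_one(self, tokens):
        if not self.class_counts:
            return None  # nothing learned yet
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab) or 1
        best_label, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            wc = self.word_counts[c]
            denom = sum(wc.values()) + vocab_size  # Laplace smoothing
            score = math.log(n / total)
            for w in tokens:
                score += math.log((wc.get(w, 0.0) + 1.0) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

With a decay below 1, a word whose polarity flips in the stream is eventually re-learned with its new polarity, whereas with decay=1.0 the old evidence keeps dominating for much longer. The simple per-update rescaling is chosen for readability; it touches every stored count and would be too slow for a large vocabulary.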

The OSCAR project tackles three challenges:

Evolution in data and vocabulary: Changes in the data-generating process, also known as “concept drift”, are not limited to changes in the priors of the polarity classes. The vocabulary itself may also drift: new words appear, old words fall out of use, and the polarity associated with a word may change over time. To this end, we develop adaptive machine learning models that handle both feature drift and concept drift.
Interplay of document label and word polarity: For many words, the polarity is context-dependent. For example, “serious” may be positive or negative, depending on whether it refers to a relationship or to an illness. Since it is not feasible to capture all possible contexts in which a word might appear, algorithms that learn the polarity of documents must cope with the ambiguity of word polarity. To this end, we employ ensembles that combine multiple learners, each capturing specific aspects of the problem.
Label sparsity: On many platforms, users upload opinions without explicitly specifying the sentiment they associate with their texts. Nevertheless, labelled data are fundamental for training ML models. To this end, we employ semi-supervised learning methods that leverage both labelled and unlabelled data, and active learning approaches that keep a human in the loop by asking them to label only a few carefully selected instances, which suffice for model learning and adaptation.
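The combination of ensembles and active learning described in the last two challenges can be sketched as follows. This is an illustrative example under stated assumptions, not the method used in OSCAR: it uses disagreement-based selection (in the style of query-by-committee), where a document is sent to a human annotator only when the ensemble members disagree about its polarity. The function names, the agreement_threshold and budget parameters, and the toy lexicon-based members are all hypothetical.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Return the majority label and the fraction of members that voted for it."""
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


def select_for_labelling(stream, members, agreement_threshold=0.7, budget=2):
    """Pick documents on which the ensemble disagrees, up to a labelling budget.

    members is a list of callables mapping a token list to a label.
    Both agreement_threshold and budget are illustrative parameters.
    """
    queried = []
    for tokens in stream:
        if len(queried) >= budget:
            break  # labelling budget exhausted
        _, agreement = ensemble_vote([m(tokens) for m in members])
        if agreement < agreement_threshold:
            queried.append(tokens)  # uncertain case: ask a human for the label
    return queried


def lexicon_member(positive_words):
    """Toy ensemble member: 'pos' if any token is in its positive lexicon."""
    def predict(tokens):
        return "pos" if any(t in positive_words for t in tokens) else "neg"
    return predict
```

For instance, if one member treats “serious” as positive and the others do not, a document containing “serious” produces a split vote and is selected for labelling, while documents on which all members agree are left unlabelled. In a full stream setting, the newly obtained labels would then be fed back to update the members.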

The output of OSCAR is a complete framework encompassing active ensemble learning methods that deal with different forms of change and learn with limited expert involvement. Such a framework can also be used in stream classification tasks beyond sentiment analysis, for example predictive maintenance or network monitoring.

Contact

Prof. Dr. Eirini Ntoutsi

ntoutsi@l3s.de

Eirini Ntoutsi is Professor for Open Source Intelligence at the Faculty of Computer Science of the Universität der Bundeswehr München.

MSc Damianos Melidis

melidis@l3s.de

Damianos Melidis is a PhD student at the Faculty of Electrical Engineering and Computer Science, Leibniz University Hannover.

MSc Vasileios Iosifidis

iosifidis@l3s.de

Vasileios Iosifidis is a researcher at the Faculty of Electrical Engineering and Computer Science, Leibniz University Hannover.