Mapping Texts to Multidimensional Emotional Space: Challenges for Dataset Acquisition in Sentiment Analysis
URI (for links/citations):
https://link.springer.com/chapter/10.1007/978-3-030-02846-6_29
https://elib.sfu-kras.ru/handle/2311/129223
Authors:
Kalinin, Alexander
Kolmogorova, Anastasia
Nikolaeva, Galina
Malikova, Alina
Corporate author:
Institute of Philology and Language Communication
Department of Romance Languages and Applied Linguistics
Date:
2018-11
Journal:
International Conference on Digital Transformation and Global Society
Scopus journal quartile:
Q3
Bibliographic description:
Kalinin, Alexander. Mapping Texts to Multidimensional Emotional Space: Challenges for Dataset Acquisition in Sentiment Analysis [Text] / Kalinin Alexander, Kolmogorova Anastasia, Nikolaeva Galina, Malikova Alina // International Conference on Digital Transformation and Global Society. 2018. P. 361-367
Abstract:
The cornerstone of any sentiment analysis research is labeled data and its acquisition. Canonical corpora for this task contain reviews of various kinds (movies, restaurants), where sentiment can be derived from the reviewer’s explicit rating of the reviewed item. Ratings are accompanied by comments: the comments serve as text samples, and the ratings are converted into labels. Emotion labels usually come in binary form, such as “negative/positive”.
This simplistic approach works well for a binary emotional model, but it tends to fail in three situations: when the emotional model is more complex, such as Pleasure-Arousal-Dominance (PAD) or Lövheim’s Cube; when data of different types is collected from various sources (fiction books, social network conversations, blog posts, etc.); or when labeling is delegated to external assessors.
In this article, we describe the methodological problems we faced while collecting a dataset for sentiment analysis based on Lövheim’s Cube, an emotional model that represents an emotion as a point in a three-dimensional space defined by the balance of three monoamines (dopamine, serotonin, and noradrenaline).
These problems include choosing which metadata to collect along with the texts and labels, choosing the labeling tools, and designing the survey.
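
To make the idea of an emotion as a point in Lövheim’s Cube concrete, the following is a minimal Python sketch and not the authors' labeling scheme, which the abstract does not specify. It assumes each monoamine level is normalized to the range 0 to 1, uses the corner-to-emotion assignment from Lövheim's model, and picks the nearest corner by Euclidean distance. The names `LabeledText`, `nearest_emotion`, and the `source` metadata field are hypothetical.

```python
from dataclasses import dataclass
from math import dist

# Corners are (serotonin, dopamine, noradrenaline); 0 = low, 1 = high.
# Emotion names follow the corner assignment in Lövheim's model.
CUBE_CORNERS = {
    (0, 0, 0): "shame/humiliation",
    (0, 0, 1): "distress/anguish",
    (0, 1, 0): "fear/terror",
    (0, 1, 1): "anger/rage",
    (1, 0, 0): "contempt/disgust",
    (1, 0, 1): "surprise/startle",
    (1, 1, 0): "enjoyment/joy",
    (1, 1, 1): "interest/excitement",
}


@dataclass
class LabeledText:
    """One dataset sample: a text, its 3D label, and illustrative metadata."""
    text: str
    serotonin: float       # assumed normalized to [0, 1]
    dopamine: float        # assumed normalized to [0, 1]
    noradrenaline: float   # assumed normalized to [0, 1]
    source: str = "unknown"  # hypothetical metadata field

    def point(self):
        return (self.serotonin, self.dopamine, self.noradrenaline)


def nearest_emotion(sample):
    """Return the emotion at the cube corner closest to the sample's point."""
    corner = min(CUBE_CORNERS, key=lambda c: dist(c, sample.point()))
    return CUBE_CORNERS[corner]


sample = LabeledText("What a wonderful day!", 0.9, 0.8, 0.2, source="blog post")
print(nearest_emotion(sample))
```

Unlike a binary “negative/positive” label, each sample here carries three continuous coordinates, which is why the choice of labeling tool and survey design matters more for this kind of dataset.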