BEET: Behavior Engagement Emotion Trigger – An AI-based model to maximize audience engagement

Badawy, Wael

doi:10.21608/mjeer.2021.193089

Document Type : Original Article

Author

Wael Badawy

School of Engineering and Technology, Badr University at Cairo, Cairo, Egypt

Abstract

Measuring engagement is a key performance indicator for any
content. With the spread of the Internet and its adoption in our
daily life, audiences have many options to explore and have
developed the power of rejection. This power of rejection reduces
the time of exposure and engagement with any content, including
in-person engagement. Access to online tools and portals provides
an escape route from human-to-human interaction. A new tool that
maximizes audience engagement with content that can influence a
decision becomes a key performance factor and an asset for
exposure. It enables better delivery of the content and increased
interaction with it, simplifies the process of creating engaging
content, and maximizes the exposure time. This paper presents a
prototype and a performance evaluation of an automated Content
Behavior Emotion Engagement Model, “BEEM”, that uses Deep
Learning and Big Data analysis to discover the behavior and
emotion trends of the audience. BEETAIR is a new framework that
will transform the media market through a Behavior Emotion
Engagement Trigger Analyzer and Intelligent Recommender. It uses
an Artificial Intelligence-based dialogue generator for maximum
audience engagement.

Highlights

We developed an AI application to measure the impact of BEET
on human engagement. The test was applied to 23,453 test cases in
which a personalized article averaging 2,000 words was customized
to the profile of an audience. Each audience was exposed to a
random article and to a personalized article. We computed the
engagement time and found that a non-customized article is
rejected after an average of 13.6 seconds, whereas a customized
article has an average engagement time of 72.4 seconds, roughly a
5.3-fold increase. A performance evaluation will be presented in
the final paper.

Keywords

Full Text

Measuring the quality of content and its audience engagement is a
key performance indicator in marketing, media and journalism. To
have an impact on its audience, content must raise the audience's
engagement level. Affective Computing (AC) has been a popular
research area for several years; in it, machines detect and
understand human affective states such as emotions, interests and
behavior. It is assumed that systems need to become sensitive to
human emotions in order to become more user-friendly and effective.
Nonverbal information is considered to complement the verbal
message and to provide a better interpretation of it. It is claimed
that 70–90% of communication between humans is nonverbal. The
studies conducted by Albert Mehrabian in 1967 established the
7%–38%–55% rule, also known as the “3V rule”: 7% of communication
is verbal, 38% is vocal and 55% is visual [1].

The first studies in the field of emotion detection date from the
1960s and 1970s. The most prominent example is the mood ring [2].
The principle is simple: the ring contains thermotropic liquid
crystals that react to body temperature. When a person is stressed,
the mood ring takes on a darker color.

The scientific publications of Rosalind Picard (MIT) have driven
great progress in this field since the 1990s [3, 4]. She is one of
the pioneers of affective computing. In her book “Affective
Computing”, Picard proposed that emotion can be modeled using the
nonlinear sigmoid function. Over the last 20 years, advances in
technology have allowed relatively good and efficient systems to be
brought to market, such as ambient intelligence (AMI), virtual
reality (VR) and augmented reality (AR).
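
As a rough illustration of the sigmoid idea mentioned above, the
sketch below maps a stimulus intensity to a response level with a
logistic curve. The function name and the steepness, midpoint and
saturation parameters are assumptions made for this example; they
are not taken from Picard's book or from this paper.

```python
import math


def sigmoid_response(stimulus, steepness=1.0, midpoint=0.0, saturation=1.0):
    """Map a stimulus intensity to a response level with a logistic curve.

    All parameters are illustrative assumptions:
    - steepness: how quickly the response rises around the midpoint
    - midpoint: stimulus value that produces half of the maximum response
    - saturation: upper bound that the response approaches
    """
    return saturation / (1.0 + math.exp(-steepness * (stimulus - midpoint)))


# Weak stimuli give almost no response, strong stimuli saturate,
# and the response changes fastest near the midpoint.
for s in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"stimulus={s:+.1f} -> response={sigmoid_response(s):.3f}")
```

The curve is nonlinear in the sense used in the text: equal steps in
stimulus do not produce equal steps in response.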

Nowadays, in the automotive field for example, an on-board computer
that can detect confusion, interest or fatigue can increase safety.
AutoEmotive (MIT Media Lab) is a prototype equipped with sensors
and a camera placed on the steering wheel [5]. The vehicle measures
the driver's level of stress and fatigue. When the need arises, it
plays background music, changes the temperature and lighting in the
vehicle interior, or proposes a less stressful route.

Multimodal systems are widely adopted, and several multimodal
datasets include sentiment annotations. Zadeh et al. introduced the
first multimodal dataset (MOSI) with opinion-level sentiment
intensity annotations and studied the prototypical interaction
patterns between facial gestures and spoken words when inferring
sentiment intensity. They also proposed a multimodal dictionary
based on a language-gesture study, used in a speaker-independent
model for sentiment intensity prediction [6]. Other examples of
datasets are ICT-MMMO [7] and MOUD [8]. In [9], intra-modality
dynamics are modeled through three Modality Embedding Subnetworks,
for the language, visual and acoustic modalities respectively. An
LSTM-based network that extracts contextual features from video for
multimodal sentiment analysis is presented in [10]. A multimodal
sentiment analysis framework that includes sets of relevant
features for text and visual data, as well as a simple technique
for fusing the features extracted from the different modalities, is
proposed in [11].

Multimodal emotion analysis faces the following challenges: (1)
modeling the interactions between language, visual and acoustic
behaviors that change the perception of the expressed emotion (the
inter-modality dynamics); and (2) efficiently exploring emotion
within each modality (the intra-modality dynamics), which is
difficult because of the highly expressive nature of the
modalities, e.g. spoken language, where proper language structure
is often ignored, and the video and acoustic modalities, which are
expressed through both space and time.

Emotion analysis lacks the ability to measure the engagement
between the user and the content, to interact with the user in
order to influence the user's decision, and to keep the user in
front of the content. This paper presents a new model that measures
the user's behavior emotion trigger and the user's engagement
level. It also demonstrates a technique to personalize the content
and introduces a metric to measure engagement. The rest of the
paper is structured as follows: Section II presents a review of
emotion and sentiment analysis. Section III presents the proposed
model. Section IV presents the experiment and performance
evaluation. Finally, Section V concludes the paper.

II. EMOTION AND SENTIMENT ANALYSIS

“Sentiment analysis is the field of study that analyses people’s
opinions, sentiments, evaluations, appraisals, attitudes, and
emotions toward entities such as products, services, organizations,
and their attributes. It represents a large problem space. There
are also many names and slightly different tasks, e.g., sentiment
analysis, opinion mining, opinion extraction, sentiment mining,
subjectivity analysis, affect analysis, emotion analysis, review
mining, etc.” [12]

Sentiment Analysis (SA) [13] is the computational study of how
opinions, attitudes, emotions and perspectives are expressed in
language. Sentiment detection, or in its simplified form polarity
classification, is a tedious and complex task. Contextual changes
of polarity-indicating words, such as negation and sarcasm, as well
as weak syntactic structures make it difficult for both machines
and humans to reliably determine the polarity of messages.

Sentiment analysis methods involve building a system to collect and
categorize opinions about a product. This consists of examining
natural language conversations about a certain product in order to
track the mood of the public. The analysis is performed on large
collections of texts, including web pages, online news, Internet
discussion groups, online reviews, web blogs, and social media.
Opinion mining aims to determine the polarity and intensity of a
given text, i.e., whether it is positive, negative, or neutral, and
to what extent. To classify the intensity of opinions, we can use
the methods introduced in [14, 15, 16, 17].

Text Mining and Social Network Analysis have become a necessity for
analyzing not only information but also the connections across it.
The main objective is to identify the necessary information as
efficiently as possible, finding the relationships between the
available information by applying algorithmic, statistical, and
data management methods to the knowledge. The automation of
sentiment detection on these social networks has gained attention
for various purposes [18, 19, 20, 21].

The aim of [22] was to report on the associations between
depression severity and the variability (time-unstructured) and
instability (time-structured) in emotion word expression across
status updates on Facebook and Twitter. Several works on depression
based on social networks have emerged, using Twitter [23, 24] and
Facebook [25, 26].

Several authors have been interested in using emoticons to
complement sentiment analysis. The authors in [27] use the Twitter
API to get training data that contain emoticons such as :) and :(.
They use these emoticons as noisy labels: tweets with :) are
treated as positive training data and tweets with :( as negative
training data. In [28], the authors present ESLAM (Emoticon
Smoothed LAnguage Models), which combines fully supervised and
distantly supervised methods. Among the many TSA (Twitter Sentiment
Analysis) methods that have been presented, the authors in [29]
explored the influence of emoticons on TSA.
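
The noisy-label idea described for [27] can be sketched in a few
lines. This is a minimal illustration, not the implementation used
in [27]: the function name, the choice to discard tweets that contain
both emoticons, and the removal of the emoticon from the returned
text are assumptions of this example, and the sample tweets are
invented.

```python
POSITIVE_EMOTICON = ":)"
NEGATIVE_EMOTICON = ":("


def noisy_label(tweet):
    """Return (text, label) using the emoticon as a noisy label, or None.

    Tweets with no emoticon, or with both emoticons, are skipped because
    the emoticon gives no usable signal (an assumption of this sketch).
    """
    has_positive = POSITIVE_EMOTICON in tweet
    has_negative = NEGATIVE_EMOTICON in tweet
    if has_positive == has_negative:
        return None
    label = "positive" if has_positive else "negative"
    # Remove the emoticon so a classifier trained on the text cannot
    # simply memorize the label marker; also normalize whitespace.
    text = tweet.replace(POSITIVE_EMOTICON, "").replace(NEGATIVE_EMOTICON, "")
    return " ".join(text.split()), label


# Invented example tweets, for illustration only.
tweets = ["great day :)", "missed the bus :(", "no emoticon here", "mixed :) :("]
training_data = [pair for pair in map(noisy_label, tweets) if pair is not None]
print(training_data)
```

The resulting (text, label) pairs can then be fed to any supervised
text classifier; the labels are called noisy because an emoticon does
not always match the sentiment of the tweet.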

Automatic emotion recognition based on utterance-level prosodic
features may play an important role in speaker-independent emotion
recognition [30]. The recognition of emotions from the voice has
been studied for decades [31, 32, 33, 34]. The work in [35] focused
on mono-modal systems with speech as the only input channel. One
application is to artificially influence mental and emotional
states in order to improve individual performance in stress-related
occupations and to prevent mental disorders [36]. Recent research
has shown that, under certain circumstances, multimodal emotion
recognition is possible even in real time [37].

Sound signals (including human speech) are one of the main mediums
of communication [38], and they can be processed to recognize the
speaker or even the emotion. Several physical features are used for
indexing speech, such as spectrum irregularity, wide- and
narrow-band spectrograms, filtering and processing of speech
signals, enhancement and manipulation of specific frequency
regions, and segmentation and labeling of words, syllables and
individual phonemes [37]. Moreover, Mel-Frequency Cepstral
Coefficients (MFCC) are widely used in speech classification
experiments [39]. To reduce the spectral leakage effect, a Hamming
window is applied to the signal, which is necessary for improving
the frequency analysis of human speech [38].
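
The effect of the Hamming window on leakage can be illustrated with
a small sketch. It uses the standard symmetric Hamming window,
w[n] = 0.54 - 0.46 cos(2πn/(N-1)), and a DFT computed directly from
its definition. The frame length, the test tone and the inspected
bin are assumptions chosen for this example and are not taken from
[38].

```python
import cmath
import math


def hamming(n_samples):
    """Symmetric Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples < 2:
        return [1.0] * n_samples
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def dft_magnitude(frame, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n_samples = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                   for n, x in enumerate(frame)))


# Hypothetical test frame: a tone at 10.5 cycles per frame. It falls
# between two DFT bins, so its energy leaks into distant bins.
N = 64
frame = [math.sin(2.0 * math.pi * 10.5 * n / N) for n in range(N)]
windowed = [x * w for x, w in zip(frame, hamming(N))]

# Compare the energy that reaches a bin far away from the tone.
leak_rect = dft_magnitude(frame, 25)
leak_hamming = dft_magnitude(windowed, 25)
print(f"bin 25, no window: {leak_rect:.3f}")
print(f"bin 25, Hamming:   {leak_hamming:.3f}")
```

The window tapers the frame toward its edges, which removes the
abrupt discontinuity at the frame boundaries that causes most of the
leakage, at the cost of a wider main lobe around the tone itself.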

The MPEG-7 Audio standard contains descriptors and description
schemes that can be divided into two classes: generic low-level
tools and application-specific tools [40]. For classification,
Artificial Neural Networks (ANN), k-Nearest Neighbor (k-NN),
Support Vector Machines (SVM), decision trees, probabilistic models
such as the Gaussian mixture model (GMM), or stochastic models such
as the Hidden Markov Model (HMM) can be applied [36].
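
Of the classifiers listed above, k-NN is the simplest to sketch. The
example below is a minimal illustration under stated assumptions: the
two-dimensional feature vectors, their values and the labels "calm"
and "angry" are invented for the example, and a real system would use
descriptors such as MFCCs extracted from recorded speech.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    training_set is a list of (feature_vector, label) pairs and the
    distance is Euclidean.
    """
    nearest = sorted(training_set,
                     key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy features (e.g. normalized pitch and energy), for
# illustration only.
training_set = [
    ((0.10, 0.20), "calm"),
    ((0.20, 0.10), "calm"),
    ((0.15, 0.25), "calm"),
    ((0.90, 0.80), "angry"),
    ((0.80, 0.90), "angry"),
    ((0.85, 0.75), "angry"),
]

print(knn_predict(training_set, (0.20, 0.20)))
print(knn_predict(training_set, (0.90, 0.90)))
```

k-NN needs no training step, which makes it a convenient baseline
before moving to models such as SVMs, GMMs or HMMs.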

Emotion analysis of speech is possible; however, it depends heavily
on the language. A study by Chaspari et al. showed that emotion
classification in Greek speech achieved an accuracy of up to 75.15%
[41]. A similar study by Arruti et al. showed a mean emotion
recognition rate of 80.05% in Basque and 74.82% in Spanish [42].

Nonverbal behavior constitutes a useful means of communication in
addition to spoken language. The work in [43] identifies at least
six characteristics of posed facial actions that enable emotion
recognition: morphology, symmetry, duration, speed of onset,
coordination of apexes and ballistic trajectory. They are common to
all humans, confirming Darwin’s evolutionary thesis. Therefore,
emotion recognition tools based on facial video can be applied
universally.

The automatic detection of emotions from facial expressions is not
simple, and their interpretation is largely context-driven. To
reduce the complexity of automatic affective inference, measurement
and interpretation of facial expressions, Ekman and Friesen
developed in 1978 a special system for objectively measuring facial
movement: the Facial Action Coding System (FACS) [45]. FACS, based
on a system originally developed by a Swedish anatomist named
Hjortsjö [46], became the standard for identifying any movement of
the face. Later, Ekman and Sejnowski also studied computer-based
facial measurements [47].

Automatic emotion recognition based on physiological signals is a
key topic for many advanced applications (safe driving, security,
mHealth, etc.). The main physiological signals analyzed for emotion
detection and classification are:

• electromyogram (EMG) - recording of the electrical activity produced by skeletal muscles,
• galvanic skin response (GSR) - reflecting skin resistance, which varies with the state of the sweat glands in the skin controlled by the sympathetic nervous system, where conductance is an indication of psychological or physiological state,
• respiratory volume (RV) - referring to the volume of air associated with different phases of the respiratory cycle,
• skin temperature (SKT) - referring to the fluctuations of normal human body temperature,
• blood volume pulse (BVP) - measuring the heart rate,
• heart rate (HR),
• electrooculogram (EOG) - measuring the corneo-retinal standing potential between the front and the back of the human eye,
• photoplethysmography (PPG) - measuring blood volume pulse (BVP), which is the phasic change in blood volume with each heartbeat, etc.

The recognition of emotions based on physiological signals covers
different aspects: emotional models, methods for generating
emotions, common emotional data sets, characteristics used and
choices of classifiers. The whole framework of emotion recognition
based on physiological signals has recently been described by [55].

References

1. Mehrabian, A., Ferris, S.R.: Inference of attitudes from nonverbal
communication in two channels. J. Consult. Psychol. 31(3), 248 (1967)
2. Mood Ring Monitors Your State of Mind, Chicago Tribune, 8 October
1975, at C1: Ring Buyers Warm Up to Quartz Jewelry That Is Said to Reflect
Their Emotions. The Wall Street Journal, 14 October 1975, at p. 16; and “A
Ring Around the Mood Market”, The Washington Post, 24 Nov. 1975, at B9
3. Picard, R.W.: Affective Computing. MIT Press, Cambridge (1997)
4. Picard, R.W., Vyzas, E., Healey, J.: Toward machine emotional
intelligence: analysis of affective physiological state. IEEE Trans. Pattern
Anal. Mach. Intell. 23(10), 1175–1191 (2001)
5. Hernandez, J., et al.: AutoEmotive: bringing empathy to the driving
experience to manage stress. In: DIS 2014, 21–25 June 2014, Vancouver, BC,
Canada. ACM (2014). http://dx.doi.org/10.1145/2598784.2602780. 978-1-4503-2903-3/14/06
6. Zadeh, A., Zellers, R., Pincus, E., Morency, L.P.: Multimodal sentiment
intensity analysis in videos: facial gestures and verbal messages. IEEE Intell.
Syst. 31(6), 82–88 (2016). https://doi.org/10.1109/mis.2016.94
7. Wöllmer, M., et al.: YouTube movie reviews: sentiment analysis in an
audio-visual context. IEEE Intell. Syst. 28(3), 46–53 (2013)
8. Perez-Rosas, V., Mihalcea, R., Morency, L.P.: Utterance-level multimodal
sentiment analysis. In: ACL, vol. 1, pp. 973–982 (2013)
9. Zadeh, A., Chen, M., Poria, S., Cambria, E., Morency, L.P.: Tensor fusion
network for multimodal sentiment analysis, arXiv:1707.07250. In:
Proceedings of the 2017 Conference on Empirical Methods in Natural
Language Processing, 7–11 September 2017, Copenhagen, Denmark, pp.
1103–1114. Association for Computational Linguistics
10. Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh, A., Morency,
L.P.: Context-dependent sentiment analysis in user-generated videos. In:
Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics, vol. 1, pp. 873–883 (2017)
11. Poria, S., Cambria, E., Howard, N., Huang, G.B., Hussain, A.: Fusing
audio, visual and textual clues for sentiment analysis from multimodal
content. Neurocomputing 174(Part A), 50–59 (2016).
12. Liu, B.: Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang.
Technol. 5(1), 1–167 (2012)
13. Pang, B., Lee, L.: Opinion mining and sentiment analysis. J. Found.
Trends Inf. Retrieval 2(1–2), 1–135 (2008)
14. Dziczkowski, G., Wegrzyn-Wolska, K.: RRSS - rating reviews support
system purpose built for movies recommendation. In: Wegrzyn-Wolska,
K.M., Szczepaniak, P.S. (eds.) Advances in Intelligent Web Mastering.
Advances in Soft Computing, vol. 43, pp. 87–93. Springer, Berlin (2007).
15. Dziczkowski, G., Węgrzyn-Wolska, K.: An autonomous system designed
for automatic detection and rating of film. Extraction and linguistic analysis of
sentiments. In: Proceedings of WIC, Sydney (2008)
16. Dziczkowski, G., Węgrzyn-Wolska, K.: Tool of the intelligence
economic: recognition function of reviews critics. In: ICSOFT 2008
Proceedings. INSTICC Press (2008)
17. Kepios: Digital in 2018, essential insights into internet, social media,
mobile, and ecommerce use around the world, April 2018.
18. Ghiassi, M., Skinner, J., Zimbra, D.: Twitter brand sentiment analysis: a
hybrid system using n-gram analysis and dynamic artificial neural network.
Expert Syst. Appl. 40(16), 6266–6282 (2013)
19. Zhou, X., Tao, X., Yong, J., Yang, Z.: Sentiment analysis on tweets for
social events. In: Proceedings of the 2013 IEEE 17th International Conference
on Computer Supported Cooperative Work in Design, CSCWD 2013, 27–29
June 2013, pp. 557–562 (2013)
20. Salathé, M., Vu, D.Q., Khandelwal, S., Hunter, D.R.: The dynamics of
health behavior sentiments on a large online social network. EPJ Data Sci. 2,
4 (2013).
21. Sriram, B., Fuhry, D., Demir, E., Ferhatosmanoglu, H.,
Demirbas, M.: Short text classification in Twitter to improve information
filtering. In: Proceedings of the 33rd International ACM SIGIR Conference
on Research and Development in Information Retrieval, 19–23 July 2010, pp.
841–842. http://doi.acm.org/10.1145/1835449.1835643
22. Seabrook, E.M., Kern, M.L., Fulcher, B.D., Rickard, N.S.: Predicting
depression from language-based emotion dynamics: longitudinal analysis of
Facebook and Twitter status updates. J. Med. Internet Res. 20(5), e168
(2018).
23. Wang, W., Hernandez, I., Newman, D.A., He, J., Bian, J.: Twitter
analysis: studying US weekly trends in work stress and emotion. Appl.
Psychol. 65(2), 355–378 (2016)
24. Reece, A.G., Reagan, A.J., Lix, K.L., Dodds, P.S., Danforth, C.M.,
Langer, E.J.: Forecasting the onset and course of mental illness with Twitter
data (Unpublished manuscript). https://arxiv.org/pdf/1608.07740.pdf
25. Park, J., Lee, D.S., Shablack, H., et al.: When perceptions defy reality: the
relationships between depression and actual and perceived Facebook social
support. J. Affect. Disord. 200, 37–44 (2016)
26. Burke, M., Develin, M.: Once more with feeling: supportive responses to
social sharing on Facebook. In: Proceedings of the ACM 2016 Conference on
Computer Supported Cooperative Work, pp. 1462–1474 (2016)
27. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using
distant supervision. J. CS224N Proj. Rep., Stanford 1, 12 (2009)
28. Liu, K.L., Li, W.J., Guo, M.: Emoticon smoothed language models for
Twitter sentiment analysis. In: AAAI (2012)
29. Węgrzyn-Wolska, K., Bougueroua, L., Yu, H., Zhong, J.: Explore the
effects of emoticons on Twitter sentiment analysis. In: Proceedings of Third
International Conference on Computer Science & Engineering (CSEN 2016),
27–28 August 2016, Dubai, UAE
30. Bitouk, D., Verma, R., Nenkova, A.: Class-level spectral features for
emotion recognition. Speech Commun. 52(7–8), 613–625 (2010)
31. Busso, C., et al.: Analysis of emotion recognition using facial expressions,
speech and multimodal information. In: Sixth International Conference on
Multimodal Interfaces, ICMI 2004, October 2004, State College, PA, pp.
205–211. ACM Press (2004)
32. Dellaert, F., Polzin, T., Waibel, A.: Recognizing emotion in speech. In:
International Conference on Spoken Language (ICSLP 1996), October 1996,
Philadelphia, PA, USA, vol. 3, pp. 1970–1973 (1996)
33. Lee, C.M., et al.: Emotion recognition based on phoneme classes. In: 8th
International Conference on Spoken Language Processing (ICSLP 2004),
October 2004, Jeju Island, Korea, pp. 889–892 (2004)
34. Deng, J., Xu, X., Zhang, Z., Frühholz, S., Grandjean, D., Schuller, B.:
Fisher kernels on phase-based features for speech emotion recognition. In:
Jokinen, K., Wilcock, G. (eds.) Dialogues with Social Robots. LNEE, vol.
427, pp. 195–203. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-2585-3_15
35. Steidl, S.: Automatic classification of emotion-related user states in
spontaneous children’s speech. Ph.D. thesis, Erlangen (2009)
36. Lugovic, S., Horvat, M., Dunder, I.: Techniques and applications of
emotion recognition in speech. In: MIPRO 2016/CIS (2016)
37. Kukolja, D., Popović, S., Horvat, M., Kovač, B., Ćosić, K.: Comparative
analysis of emotion estimation methods based on physiological measurements
for real-time applications. Int. J. Hum.-Comput. Stud. 72(10), 717–727
(2014)
38. Davletcharova, A., Sugathan, S., Abraham, B., James, A.P.: Detection and
analysis of emotion from speech signals. Procedia Comput. Sci. 58, 91–96
(2015)
39. Tyburek, K., Prokopowicz, P., Kotlarz, P.: Fuzzy system for the
classification of sounds of birds based on the audio descriptors. In:
Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A.,
Zurada, J.M. (eds.) ICAISC 2014. LNCS (LNAI), vol. 8468, pp. 700–709.
Springer, Cham (2014).
40. Tyburek, K., Prokopowicz, P., Kotlarz, P., Michal, R.: Comparison of the
efficiency of time and frequency descriptors based on different classification
conceptions. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz,
R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2015. LNCS (LNAI), vol. 9119,
pp. 491–502. Springer, Cham (2015).
41. Chaspari, T., Soldatos, C., Maragos, P.: The development of the Athens
Emotional States Inventory (AESI): collection, validation and automatic
processing of emotionally loaded sentences. World J. Biol. Psychiatry 16(5),
312–322 (2015)
42. Arruti, A., Cearreta, I., Alvarez, A., Lazkano, E., Sierra, B.: Feature
selection for speech emotion recognition in Spanish and Basque: on the use of
machine learning to improve human-computer interaction. PLoS ONE 9(10),
e108975 (2014)
43. Ekman, P.: Facial expression and emotion. Am. Psychol. 48, 384–392
(1993)
44. Jack, R.E., Schyns, P.G.: The human face as a dynamic tool for social
communication. Curr. Biol. Rev. 25(14), R621–R634 (2015).
45. Ekman, P., Friesen, W., Hager, J.: Facial action coding system: Research
Nexus. Network Research Information, Salt Lake City (2002)
46. Hjortsjö, C.H.: Man’s face and mimic language (1969).
47. Ekman, P., Huang, T.S., Sejnowski, T.J., et al.: Final report to NSF of the
planning workshop on facial expression understanding, vol. 378. Human
Interaction Laboratory, University of California, San Francisco (1993)
48. Afzal, S., Sezgin, T.M., Gao, Y., Robinson, P.: Perception of emotional
expressions in different representations using facial feature points. IEEE
(2009). 978-1-4244-4799
50. De la Torre, F., Chu, W.S., Xiong, X., Vicente, F., Ding, X., Cohn, J.:
IntraFace. In: IEEE International Conference on Automatic Face and Gesture
Recognition Workshops (2015).