Affective Computing is all about computers understanding human emotions. By computer I mean a robot, your car, your laptop, or even your phone. And you might be thinking: Why would we want computers to do that? And how could they do it? Two very important questions. Let's elaborate on them.
Because emotions signal what matters to us and what we care about; furthermore, it has been argued that emotions impact our rational decision-making and action selection. Think about your decision-making when buying a new phone, a new laptop, or a car, or when choosing a university or a new job at a new company. Are these decisions emotion-driven or completely rational? By giving computers the ability to understand their users' emotions, a computer could show empathy: sense others' emotions and react to them appropriately. Imagine scenarios such as: (1) a video game aware of your level of engagement or excitement that adapts its difficulty accordingly; or (2) a tutoring system or online course aware of your interest or boredom that decides which material to present to you next. Hopefully you get the idea: there is a lot of potential to improve human-computer interaction. Yes, we are aware of the possible issues, including security. But that is another story.
Ok, we are talking about “affective” computing and we have mentioned “emotions”. Let me clarify some important concepts. Disclaimer: I am going to oversimplify things; this is a computer science perspective on the topic.
Affect is used as an encompassing term to describe emotions and moods. Affect refers to the underlying experience of feeling. It is a construct of mental activity and physiological reactions.
Therefore, we talk about affective computing, affective states, affective signals, affect measurement, or affect recognition. Affect ranges from unpleasant to pleasant (valence), from agitated to calm (arousal), and from easy to hard to control (dominance).
Let us review an analogy using colors. Any color on a computer screen is a combination of Red, Green, and Blue (RGB), right? R, G, and B are the axes of a 3D space in which any color can be located. And we limit the value on each axis to a range, for instance, from 0 to 255. Then we can present colors as a combination of red, green, and blue, as shown in Figure 1. Thus, a color is a vector such as [248, 114, 23], and some colors have names; for example, [248, 114, 23] is the Pumpkin color.
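The color analogy is easy to sketch in code: a color is just a 3-component vector, and a “name” is just a label attached to a reference point in that space. The snippet below is a minimal illustration (the set of named colors is my own tiny example; only Pumpkin comes from the text):

```python
import math

# A color is a vector in RGB space; each axis ranges from 0 to 255.
# A few reference points with names (Pumpkin is from Figure 1; the rest
# are just common colors added for illustration).
NAMED_COLORS = {
    "Red":     (255, 0, 0),
    "Blue":    (0, 0, 255),
    "Pumpkin": (248, 114, 23),
}

def closest_color_name(rgb):
    """Return the name whose reference point is nearest (Euclidean) to rgb."""
    return min(NAMED_COLORS,
               key=lambda name: math.dist(NAMED_COLORS[name], rgb))

print(closest_color_name((250, 110, 30)))  # → Pumpkin
```

Any RGB vector near [248, 114, 23] maps to the name Pumpkin, even though most vectors in the space have no name of their own.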
Ok, what about affect? Well, think about a similar 3D space where the axes are Pleasure, Arousal, and Dominance instead of Red, Green, and Blue. And let us limit the possible values on each axis to a decimal number between -1 and 1. Each dot in the space represents an affective state. Now, let's give names to some of the dots in that 3D space; for example, as shown in Figure 2, [1, 1, 1] is complete Engagement, [1, 1, -1] is complete Excitement, [-1, -1, -1] is total Boredom, [-1, 1, -1] is complete Frustration, and so on. These names are what we identify as emotions. However, as happens with colors, not all affective states have a name. Not all of them are what we identify as an individual emotion. Maybe, because we do not have a name for them, we think of them as a combination of the names that we do know, just like considering Magenta a color in its own right or just a combination of Red and Blue. 🤔
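The same sketch works for affect: swap the RGB axes for Pleasure, Arousal, and Dominance, and attach emotion names to a few reference points. The named points below are the ones from Figure 2; everything else is an unnamed affective state that we can only describe by its nearest labels:

```python
import math

# An affective state is a vector in PAD space; each axis ranges from -1 to 1.
# Named reference points taken from Figure 2.
NAMED_EMOTIONS = {
    "Engagement":  (1, 1, 1),
    "Excitement":  (1, 1, -1),
    "Boredom":     (-1, -1, -1),
    "Frustration": (-1, 1, -1),
}

def closest_emotion(pad):
    """Name the labeled emotion nearest to a PAD vector."""
    return min(NAMED_EMOTIONS,
               key=lambda name: math.dist(NAMED_EMOTIONS[name], pad))

print(closest_emotion((0.8, 0.9, -0.7)))  # → Excitement
```

Just as with colors, the computer never needs the name: [0.8, 0.9, -0.7] is a perfectly good affective state on its own; “Excitement” is only a convenient label for humans.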
Emotions are states of mind resulting from experiencing affect, i.e., experiencing chemicals released throughout our body and brain (physiological reactions) in response to our interpretation of a specific contextual stimulus (mental activity).
Computers do not care about the name, or whether a name even exists. Computers want numbers, right? Think about using a camera as an input device (with either photos or video) and asking the computer to identify colors. We would work with RGB values. We can label some RGB vectors to make our lives easier. The same applies to affective states (emotions).
To identify colors we use a camera, a device that captures light rays. To identify affective states (emotions) we also need devices: devices that capture mental activity or physiological reactions. Then we can connect the values that a device (or devices) collects with affective states (emotions), in the same way that we connect the values captured by a camera with RGB values.
Which devices? There are a lot of options! For example:
a) Brain-Computer interfaces, such as the Emotiv Headset. Brain-Computer interfaces capture the electromagnetic activity of the brain. Affective states (Pleasure, Arousal, Dominance), and therefore emotions, can be related to particular values of electromagnetic activity in specific regions of our brain.
b) A camera, yes, a simple camera. We can use a camera to run facial recognition, and in a face we can recognize gestures. Long story short: there is a limited number of muscles in your face and a limited number of movements they can make. Thus, there is a finite number of gestures common to any human face, and some of these gestures correspond to a particular mental activity or physiological reaction, i.e., to experiencing an affective state (emotion). Similar claims can be made when using a camera to recognize body postures.
c) Physiological sensors, such as sensors to measure heart rate or galvanic skin response (skin conductance). These measures are related to high or low levels of arousal.
d) And more, much more.
Computers can infer emotions from the data that we gather from one or more sources. They study the cues (body language, facial expressions, gestures, tone of voice, physiological signals, and even brain activity) and infer an affective state. Figure 3 depicts the process: one sensor gathers data from one source, and machine learning algorithms are applied to the collected data to infer an affect measurement (the perception process). Furthermore, multiple monomodal measurements can be synchronized and fused (multimodal affect recognition).
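One simple way to fuse synchronized monomodal measurements is a late-fusion scheme: each modality produces its own PAD estimate plus a confidence score, and the fused estimate is their confidence-weighted average. This is just one of many fusion strategies, and the modalities, values, and confidences below are invented for illustration:

```python
# Late fusion of synchronized monomodal affect measurements (a sketch).
# Each estimate is a (pad_vector, confidence) pair from one modality.
def fuse(estimates):
    """Confidence-weighted average of PAD vectors from multiple sensors."""
    total = sum(conf for _, conf in estimates)
    return tuple(
        sum(pad[i] * conf for pad, conf in estimates) / total
        for i in range(3)
    )

face  = ((0.6, 0.4, 0.0), 0.7)   # e.g., from facial-expression analysis
voice = ((0.2, 0.8, -0.2), 0.5)  # e.g., from speech prosody
fused = fuse([face, voice])
print(fused)
```

The fused vector leans toward the facial estimate because that modality reported higher confidence; with more modalities (heart rate, EEG, posture) the same weighting applies unchanged.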
How accurate are these inferences? Well, the goal is to make them as reliable as those inferred by a human. Can you look at a person in the street and guess their emotional state? What about a friend, or a person that you see every day? It would be difficult, maybe impossible, to be 100% certain of your inference, but the precision increases when you have more information or more time in contact with the person. The same is true for the computer!
If you are a software developer, I ask you: how could you improve whatever functionality you are implementing if you had real-time access to a stream of data corresponding to the affective state of your user?
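To make the question concrete, here is a hypothetical consumer of such a stream, in the spirit of the video-game scenario above: each reading is a PAD vector, and the game nudges its difficulty when the player looks bored or frustrated. The thresholds and function names are invented for this sketch; a real system would tune them empirically:

```python
# Hypothetical difficulty controller driven by a stream of PAD readings.
def adjust_difficulty(current_level, pad):
    pleasure, arousal, _ = pad
    if arousal < -0.3 and pleasure < 0:      # bored: raise the challenge
        return min(current_level + 1, 10)
    if arousal > 0.5 and pleasure < -0.3:    # frustrated: ease off
        return max(current_level - 1, 1)
    return current_level                     # engaged: leave it alone

level = 5
# Simulated stream: bored, then frustrated, then engaged.
for pad in [(-0.4, -0.5, 0.0), (-0.5, 0.8, -0.2), (0.7, 0.6, 0.5)]:
    level = adjust_difficulty(level, pad)
    print(level)  # prints 6, then 5, then 5
```

The same pattern applies to the tutoring-system scenario: replace the difficulty level with a choice of next lesson, keyed off interest versus boredom.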