Multi-language affect analysis using neural networks with zero-shot cross-lingual transfer learning

Abstract

This project demonstrates using AI to detect emotions in the sentences of tweets written in (close to) any language.

It has an average accuracy of 83% against test data (ranging from 76% to 93% depending on the specific emotion) and has only learned from reading the equivalent of one printed newspaper a day, Monday to Friday, for a single week. For context, published research shows that agreement between human analysts performing sentiment analysis falls in the 80% to 90% range.

A word of warning: this AI doesn't understand irony or sarcasm, so mind the results if you feed it that kind of tweet (they can be hilarious!)

But what is affect analysis?

Let's start with sentiment analysis. Sentiment analysis is a field of study dedicated to systematically extracting affective states and subjective information from language.

The most basic analysis possible is polarity: determining if a body of text or speech has positive, negative or neutral connotations. Affect analysis is a classification task where text or speech is mapped to concrete emotional states like "happy", "sad" or "angry". This project does affect analysis to determine 11 emotions in sentences: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise and trust.
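To make the difference concrete, here is what a multi-label result could look like for one sentence, using the 11 emotion labels of this project (the sentence and the probabilities are made up for illustration):

```python
# Illustrative only: a multi-label affect analysis result for one sentence.
# The emotion list matches the project; the probability values are invented.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear", "joy", "love",
    "optimism", "pessimism", "sadness", "surprise", "trust",
]

# Unlike polarity (a single positive/negative/neutral verdict), several
# emotions can be present in the same sentence, each with its own score.
example_result = {
    "sentence": "I can't wait for the weekend!",
    "emotions": {"joy": 0.91, "anticipation": 0.88, "optimism": 0.75},
}
```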

OK, and neural networks?

Neural networks, or formally Artificial Neural Networks (ANN), are systems inspired by how the human brain, and neurons in particular, work.

They are made of layers: one input layer, one output layer and any number of hidden layers. Layers are built from a number of nodes (or artificial neurons), and nodes are interconnected with nodes from the neighbouring layers. At each node, the incoming connections have weights and the node itself has a threshold: above a certain activation level it fires a signal, below it does not.

All these parameters are determined by running successive training processes (measuring the accuracy of the output for known inputs) and iterating with mathematical optimization functions. This allows a neural network to learn relationships or patterns from the training data it is fed. A trained neural network can then be applied to identify those relationships or patterns in new data.
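As an illustration, here is a minimal sketch of such a network in Python with Keras; the framework, layer sizes and toy data are placeholders for the idea, not the actual setup used in this project:

```python
# A minimal feed-forward network: one input layer, one hidden layer,
# one output layer. Sketch only; not the project's actual architecture.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(4,)),                      # input layer: 4 features
    keras.layers.Dense(8, activation="relu"),     # one hidden layer of 8 nodes
    keras.layers.Dense(1, activation="sigmoid"),  # output layer: one probability
])

# Training iteratively adjusts the weights with an optimizer so that
# predictions on known inputs get closer to the known outputs.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(100, 4)           # toy inputs
y = (X.sum(axis=1) > 2).astype(int)  # toy labels with a learnable pattern
model.fit(X, y, epochs=10, verbose=0)
```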

Tell me about sentence embeddings

Natural Language Processing (NLP) is the field of AI that deals with enabling machines to understand human language. NLP systems can be used for tasks like document summarization, translation, speech recognition, predictive typing, sentiment analysis, text classification, etc.

Text in particular needs to be transformed into vectors of numbers so that computers can run it through algorithms. These vectors are called embeddings. Two established approaches are to embed individual words or whole sentences.

There are many methods to obtain sentence embeddings. At first, the proposed methods could only produce embeddings for a specific language or for a specific corpus like news articles. More recently, multi-language methods have been developed.

LASER (Language-Agnostic SEntence Representations) by Facebook Research is one of these methods. Their model was trained on a dataset of 223 million parallel sentences covering 93 languages from 34 families and 28 scripts. It produces sentence embeddings in a language-agnostic 1024-dimensional space.
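For illustration, here is how such embeddings can be obtained with the community laserembeddings Python package; the package choice and the one-time model download step are assumptions (the official LASER toolkit exposes the same model through its own scripts):

```python
# Sketch assuming the laserembeddings package (pip install laserembeddings);
# the pretrained encoder must be fetched once beforehand with:
#   python -m laserembeddings download-models
from laserembeddings import Laser

laser = Laser()  # loads the pretrained multilingual encoder

# Sentences in different languages map into the same 1024-dimensional space.
embeddings = laser.embed_sentences(
    ["The weather is nice today", "Hoy hace buen tiempo"],
    lang=["en", "es"],  # one language code per sentence
)
print(embeddings.shape)  # (2, 1024)
```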

Now explain to me zero-shot cross-lingual transfer learning

Because the embeddings from LASER are not tied to a specific language, they can be used to train AI systems for specific NLP tasks (like classification) in one language (say, English) and then apply them to any other language (maybe Spanish). This training technique is called zero-shot cross-lingual transfer learning.
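A minimal sketch of the idea, assuming the laserembeddings package and a scikit-learn classifier standing in for the neural networks (the training sentences are made up):

```python
# Zero-shot cross-lingual transfer: train on English embeddings only,
# then apply the same classifier directly to Spanish. Sketch only.
from laserembeddings import Laser
from sklearn.linear_model import LogisticRegression

laser = Laser()

# Tiny made-up English training set for a "joy" detector.
train_sentences = ["I am so happy today!", "This is terrible news."]
train_labels = [1, 0]

X_train = laser.embed_sentences(train_sentences, lang="en")
clf = LogisticRegression().fit(X_train, train_labels)

# Zero-shot: the classifier has never seen Spanish, but the embedding
# space is shared across languages, so it can still be applied.
X_es = laser.embed_sentences(["¡Estoy muy feliz hoy!"], lang="es")
print(clf.predict_proba(X_es))
```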

So, step by step, what happens here?

When you select a tweet (either by pasting its link or using one of the preloaded examples), its full raw text is retrieved.

This text is then sent to a server that breaks it down into sentences.
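Any sentence splitter can do this job; NLTK's punkt tokenizer, used here purely as a stand-in (the actual splitter on the server is not specified), is a common choice:

```python
# Sentence segmentation with NLTK; one possible way to break a tweet's
# text into sentences, not necessarily the one the server uses.
import nltk

nltk.download("punkt", quiet=True)  # one-time tokenizer model download
from nltk.tokenize import sent_tokenize

text = "I love this. But the ending? Not so much..."
print(sent_tokenize(text))
# ['I love this.', 'But the ending?', 'Not so much...']
```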

The server transforms each of these sentences into a vector of 1024 values. These values represent the "meaning" of the sentence independently of the language: "I love you" and "Je t'aime" get almost the same 1024 values. These are the sentence embeddings.
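You can check that cross-lingual closeness yourself by comparing the two vectors with cosine similarity (again assuming the laserembeddings package):

```python
# Comparing the embeddings of a sentence and its translation.
import numpy as np
from laserembeddings import Laser

laser = Laser()
vecs = laser.embed_sentences(["I love you", "Je t'aime"], lang=["en", "fr"])

# Cosine similarity close to 1.0 means the two 1024-value vectors point
# in (almost) the same direction in the embedding space.
similarity = np.dot(vecs[0], vecs[1]) / (
    np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1])
)
print(round(float(similarity), 3))
```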

Next, the server uses these values as inputs to the 11 neural networks we have trained (one per emotion), which yield a probability for each of the 11 emotions being present in each sentence.

The sentences and the probabilities are sent back to your web browser and displayed on screen.
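Putting the server-side steps together, a sketch could look like this; the model file layout, function name and the use of NLTK and Keras are all assumptions for illustration, not the project's actual code:

```python
# End-to-end sketch: split into sentences, embed with LASER, score each
# sentence with 11 per-emotion binary classifiers. Hypothetical names/paths.
from laserembeddings import Laser
from nltk.tokenize import sent_tokenize  # punkt model downloaded beforehand
from tensorflow import keras

EMOTIONS = ["anger", "anticipation", "disgust", "fear", "joy", "love",
            "optimism", "pessimism", "sadness", "surprise", "trust"]

laser = Laser()
# Hypothetical file layout: one trained binary classifier per emotion.
models = {e: keras.models.load_model(f"models/{e}.h5") for e in EMOTIONS}

def analyze_tweet(text: str, lang: str = "en") -> list[dict]:
    """Split a tweet into sentences and score each for the 11 emotions."""
    sentences = sent_tokenize(text)
    embeddings = laser.embed_sentences(sentences, lang=lang)  # (n, 1024)
    results = []
    for sentence, vector in zip(sentences, embeddings):
        scores = {
            emotion: float(model.predict(vector.reshape(1, -1), verbose=0)[0][0])
            for emotion, model in models.items()
        }
        results.append({"sentence": sentence, "emotions": scores})
    return results  # sent back to the browser as JSON
```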

The technical explanation

The development process of this AI method and a playground can be found here: i18n-twitter-sentiment

Acknowledgments and inspiration

Further reading if you are interested:

Multi-Label Classification with Deep Learning

Language-Agnostic SEntence Representations (LASER)

HateMonitors: Language Agnostic Abuse Detection in Social Media

Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization

You may look at the full list on the GitHub repository.
