The enigmatic, painted smile of the “Mona Lisa” is known around the world, but that famous face recently displayed a startling new range of expressions, courtesy of artificial intelligence (AI).
In a video shared to YouTube on May 21, three clips show disconcerting examples of the Mona Lisa moving her lips and turning her head. The animations were created by a convolutional neural network, a type of machine-learning model, loosely inspired by the brain's visual system, that is designed to analyze and process images.
Researchers trained the algorithm to understand the general shapes of facial features and how they behave relative to one another, and then to apply that information to still images. The result was a realistic video sequence of new facial expressions, generated from a single frame.
For the Mona Lisa videos, the AI "learned" facial movement from datasets of three human subjects, producing three very different animations. While each of the three clips was still recognizable as the Mona Lisa, variations in the looks and behavior of the human subjects lent distinct "personalities" to the "living portraits," Egor Zakharov, an engineer with the Skolkovo Institute of Science and Technology and the Samsung AI Center (both located in Moscow), explained in the video.
Zakharov and his colleagues also generated animations from photos of 20th-century cultural icons such as Albert Einstein, Marilyn Monroe and Salvador Dalí. The researchers described their findings, which have not been peer-reviewed, in a study published online May 20 on the preprint server arXiv.
Producing original videos such as these, known as deepfakes, isn’t easy. Human heads are geometrically complex and highly dynamic; 3D models of heads have “tens of millions of parameters,” the study authors wrote.
What’s more, the human vision system is very good at identifying “even minor mistakes” in 3D-modeled human heads, according to the study. Seeing something that looks almost human — but not quite — triggers a sensation of profound unease known as the uncanny valley effect.
Earlier AI systems had demonstrated that producing convincing deepfakes is possible, but they required footage of the subject from multiple angles. For the new study, the engineers introduced the AI to a very large dataset of reference videos showing human faces in action. The scientists established facial landmarks that would apply to any face, to teach the neural network how faces behave in general.
Then, they trained the AI to use the reference expressions to map the movement of the source’s features. This enabled the AI to create a deepfake even when it had just one image to work from, the researchers reported.
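The idea of mapping landmarks from a driving face onto a single source image can be illustrated with a toy sketch. Here the "landmarks" are just a handful of 2D points, and the "mapping" is a least-squares similarity transform (scale, rotation, translation) fitted between the two landmark sets, a classical Procrustes alignment. This is only a stand-in for illustration: the actual study uses deep generative networks, and all names and numbers below are invented for the example.

```python
import numpy as np

def fit_similarity(src, dst):
    """Fit a least-squares similarity transform (scale, rotation,
    translation) mapping src landmarks onto dst landmarks.
    A classical Procrustes/Umeyama alignment -- a toy stand-in for the
    learned landmark-to-pose mapping described in the article."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    x, y = src - mu_s, dst - mu_d
    u, s, vt = np.linalg.svd(y.T @ x)       # SVD of the cross-covariance
    d = np.sign(np.linalg.det(u @ vt))      # guard against reflections
    corr = np.diag([1.0, d])
    rot = u @ corr @ vt
    scale = (s * np.diag(corr)).sum() / (x ** 2).sum()
    trans = mu_d - scale * rot @ mu_s
    return scale, rot, trans

def apply_similarity(points, scale, rot, trans):
    """Pose the source landmarks using the fitted transform."""
    return scale * points @ rot.T + trans

# Toy "source" landmarks (imagine eye corners, nose tip, mouth corners).
src = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [0.5, 2.0], [1.5, 2.0]])

# Synthetic "driving" landmarks: the same face rotated 30 degrees,
# enlarged 1.5x and shifted -- the pose we want the source to adopt.
theta = np.deg2rad(30)
true_rot = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
dst = 1.5 * src @ true_rot.T + np.array([2.0, -1.0])

scale, rot, trans = fit_similarity(src, dst)
mapped = apply_similarity(src, scale, rot, trans)
print(np.allclose(mapped, dst, atol=1e-8))  # True: the pose is recovered
```

A real system like the one reported would replace this rigid alignment with a neural generator that synthesizes photorealistic pixels conditioned on the driving landmarks, but the core input is the same: per-frame landmark positions steering a single source image.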
And more source images delivered an even more detailed result in the final animation. Videos created from 32 images, rather than just one, achieved “perfect realism” in a user study, the scientists wrote.