Disney’s Humanoid Robot Emulates Human Eye Movement
Researchers have designed a humanoid robot that can blink using animatronics.
Over the past few years, constant efforts have been made to make robots more human-like. Researchers and scientists have built humanoid robots that can empathize with humans, console them emotionally, and defend themselves when abused. But physical familiarity between humans and robots has lagged behind: look at most robots and you see a 3D face with little or no facial movement. This is a key feature that has distinguished humans from robots so far. Engineers and researchers at Disney, however, have designed a robot that can emulate human eye movement, giving the illusion that the robot is looking a person directly in the eyes.
The Blinking Robot
Researchers at Disney have presented a paper titled “Realistic and Interactive Robot Gaze” at the International Conference on Intelligent Robots and Systems (IROS). The paper explains that the robot’s gaze system is built on a humanoid Audio-Animatronics bust. Animatronics combines robotics with audio and visual elements to create a lifelike character.
The paper also notes that animatronics lets people engage and interact with characters directly, producing deeper immersion in storytelling. Disney has long used animatronics in its theme parks to create repeatable shows and stories involving animal or human characters, providing consistent entertainment for guests.
The current system combines a technical framework for robot gaze with animation to create an illusion of life. The design also addresses challenges that tripped up earlier work, such as the absence of self-motion and unstable camera inputs. It draws on previous research into robot eye movement, including attention mechanisms that rely on face, color, and motion detection, together with the robot’s motivation and emotional state while performing a task. An approach by Zaraki et al. on social gaze control is also incorporated into the design. The paper argues that, rather than creating isolated instances of behavior, the system goes deeper to portray aspects of a character, such as personality. The researchers used handcrafted animation because of concerns about the adaptability of output in a deep-learning, data-driven model: an animator controls the character’s expression, supported by an architecture that enables dynamic adaptation during interactions.
Structure of the Robot
The paper describes the robot platform as a custom Walt Disney Imagineering Audio-Animatronics bust with a head and upper torso. It has nine degrees of freedom (DOF) distributed across the neck, eyes, eyelids, and eyebrows, and is controlled using custom proprietary software.
A camera mounted to the robot’s upper torso covers the robot’s field of view (FOV) of 105° horizontally and 58° vertically, at an approximate range of 0.3 to 10 meters. The camera’s sensor data feeds a skeleton-fitting step. Skeleton fitting is a three-step process for articulating the structure of a detected person using a perception engine, an architecture that constructs representations of physical objects and is powered by computational perception.
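As a rough sketch (not Disney’s implementation), the stated FOV and range figures can be turned into a simple gate that decides whether a detected point is visible to the camera at all. The coordinate convention here (x forward, y left, z up in the camera frame) is an assumption for illustration:

```python
import math

def in_fov(x, y, z, h_fov_deg=105.0, v_fov_deg=58.0,
           min_range=0.3, max_range=10.0):
    """Return True if a point (camera frame: x forward, y left, z up)
    lies inside the stated FOV cone and the 0.3-10 m range band.
    The frame convention is an illustrative assumption."""
    dist = math.sqrt(x * x + y * y + z * z)
    if not (min_range <= dist <= max_range):
        return False
    h_angle = math.degrees(math.atan2(y, x))  # horizontal angle off-axis
    v_angle = math.degrees(math.atan2(z, x))  # vertical angle off-axis
    return abs(h_angle) <= h_fov_deg / 2 and abs(v_angle) <= v_fov_deg / 2

print(in_fov(5.0, 0.0, 0.0))   # straight ahead at 5 m -> True
print(in_fov(0.1, 0.0, 0.0))   # closer than 0.3 m -> False
```

A gate like this would let downstream components (such as the attention engine) ignore detections the camera cannot reliably see.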
The current system deploys the perception engine to fit the skeletons. Each skeleton consists of points of interest: the eyes, nose, ears, neck, shoulders, elbows, wrists, hips, knees, ankles, big toes, small toes, and heels. These points of interest are then transformed from the camera frame into the robot’s frame of reference.
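A minimal sketch of that last step, assuming a simple rigid-body transform (the rotation angle and mounting offset below are illustrative values, not figures from the paper):

```python
import math

def camera_to_robot(point_cam, yaw_rad, offset):
    """Transform a camera-frame point (x, y, z) into the robot's frame by
    rotating about the vertical axis by yaw_rad, then translating by the
    camera's mounting offset. Both parameters are hypothetical."""
    x, y, z = point_cam
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    # Rotation about the z (vertical) axis, followed by translation.
    xr = c * x - s * y + offset[0]
    yr = s * x + c * y + offset[1]
    zr = z + offset[2]
    return (xr, yr, zr)

# Example: a nose keypoint 2 m ahead of a camera mounted 0.2 m above
# the robot's origin, with no rotation between the frames.
print(camera_to_robot((2.0, 0.0, 0.0), yaw_rad=0.0, offset=(0.0, 0.0, 0.2)))
```

In practice a full 3D rotation matrix (or quaternion) would replace the single yaw angle, but the structure is the same: rotate, then translate.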
The researchers framed the experiment as a show in which the robot character plays an older man with declining hearing and eyesight, reading a book. The robot was constantly distracted from reading by people passing by or coming up to greet him. He glanced at people moving quickly in the distance, but as people encroached on his personal space, he either stared with disapproval at the interruption or gave familiar faces a look of friendly acknowledgment. Through this setup, the researchers could test behaviors such as glancing and mutual gaze in a realistic scenario, despite the camera’s limited FOV and system latency.
The architecture of the robot
Disney’s robot gaze interaction architecture includes three components: the attention engine, the behavior selection engine, and the behavior library. The components communicate bidirectionally with one another and drive the animation of the robot character.
The attention engine identifies low-level salient stimuli in the environment and generates a ‘curiosity score.’ It uses the incoming camera data to detect actions such as waving and to estimate people’s movement. The behavior selection engine represents the character’s higher-level reasoning. It contains a handcrafted state machine that directs the robot’s behavioral state while tracking parameters such as the current state, curiosity thresholds, and state timeout durations. The third component, the behavior library, organizes behaviors through layering, a scheme proposed by Rodney Brooks that segments different ‘levels of competence’: lower levels represent the robot’s more basic functions, and higher levels represent behaviors that require advanced processing. Higher levels can ‘subsume’ lower levels, meaning they can integrate, modify, suppress, or even completely override them.
In the future, the Disney researchers hope to explore the attention engine’s parameters further, reducing the dimensionality of the control palette and making it easier for animators to select the desired character attentiveness and habituation.