Gearing Robots with Ears, Not Just Eyes for Better Perception


Robots can detect and classify objects from sound alone with up to 76% accuracy

People perceive the world through multiple senses: vision, hearing, smell and touch are among the basic human senses. Robots are very different. So far, most robots have been designed to rely on a single sense, vision.

Robots are already taking over many human tasks, reducing human labour, so it is surprising to learn that they have been functioning with only one sense. Scientists and research organisations are therefore looking at ways to enrich robots with additional senses, and the most immediate candidate that struck their minds was sound.

Robots generally have cameras in the head, and sometimes in the hands, to observe actions. They can be taught either by programming an action directly or by letting them observe the action being performed. For example, if you want a robot to lift a bottle, you can either program the motion or have the robot watch someone lifting the bottle; the robot scans the hand movement and tries to imitate it.

Scientists are now at the stage of equipping robots with hearing. Preliminary work in other fields indicated that sound could be useful, but it was not clear how it would apply to robotics. Linking differences in sound to robotic actions could help a robot identify an object being handled. For example, a robot with hearing could tell a screwdriver from a metal wrench just by the sound each makes.


A robot that detects and differentiates sounds

Robotics scientists at Carnegie Mellon University were among the first to explore how sound could help robots better understand the world around them, by adding hearing data to the data sets robots learn from.

Dhiraj Gandhi and Abhinav Gupta, scientists at the Robotics Institute at Carnegie Mellon University, presented their new findings at the virtual Robotics: Science and Systems conference. Lerrel Pinto, another robotics scientist, also worked with them on the findings. The project began last June.

The team brought three major contributions to the table:

  • Create the largest sound-action-vision robotics dataset
  • Demonstrate that a robot can perform fine-grained object recognition using only sound
  • Show that sound is indicative of action, both for post-interaction predictions and pre-interaction forward modelling
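To make the second contribution concrete, fine-grained recognition from sound alone can be pictured as classifying audio feature vectors. The sketch below is a toy illustration, not the CMU team's method: the band-energy "features" and the nearest-centroid classifier are simplified stand-ins for the learned audio embeddings the researchers actually used, and all names are hypothetical.

```python
import math
from collections import defaultdict

def spectral_features(samples, n_bands=4):
    """Crude stand-in for a spectrogram: split the clip into bands and
    record the mean absolute amplitude of each chunk."""
    chunk = max(1, len(samples) // n_bands)
    return [sum(abs(s) for s in samples[i:i + chunk]) / chunk
            for i in range(0, chunk * n_bands, chunk)]

class SoundClassifier:
    """Nearest-centroid classifier over audio feature vectors:
    each object label is represented by the average feature vector
    of its training clips."""
    def __init__(self):
        self.centroids = {}

    def fit(self, labelled_clips):
        sums = defaultdict(lambda: None)
        counts = defaultdict(int)
        for label, clip in labelled_clips:
            f = spectral_features(clip)
            sums[label] = f[:] if sums[label] is None else \
                [a + b for a, b in zip(sums[label], f)]
            counts[label] += 1
        self.centroids = {lab: [v / counts[lab] for v in s]
                          for lab, s in sums.items()}

    def predict(self, clip):
        # Assign the label whose centroid is closest in feature space.
        f = spectral_features(clip)
        return min(self.centroids,
                   key=lambda lab: math.dist(f, self.centroids[lab]))
```

In this toy setup a "wrench" clip with large amplitudes and a "screwdriver" clip with small ones land far apart in feature space, so new clips are sorted by whichever sound profile they resemble more.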

At one point in the research, the team found that in certain domains, such as forward-model learning, sound provides more information than visual content alone. A study published by the three researchers revealed that sound helps a robot differentiate between various objects. Sound also helps in predicting the physical properties of new objects.

The study further unfolded that hearing helped robots determine what type of action caused a particular sound. Around 76% of the time, the robots detected the object and classified it successfully. 'Tilt-Bot,' a square tray attached to the arm of a Sawyer robot, is the robotic platform used to collect the sounds. With Tilt-Bot, the team built the largest available sound-action-vision dataset, comprising 15,000 interactions with 60 objects. Objects like toy blocks, hand tools, shoes, apples and tennis balls were used as inputs.

When the robotic tray is tilted, the object inside crashes against the walls of the tray and makes a sound, which is recorded as rich four-channel audio. Through this process, the dataset reveals synergies between sound and action in three key ways:

  • Sound is indicative of fine-grained object classification: it can distinguish one object from another.
  • Sound contains information about the causal effects of an action: from the sound produced, the robot can predict what action was applied to the object.
  • Object representations derived from audio embeddings are indicative of implicit physical properties.
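The second point above, inferring the action from the sound, can be illustrated with a deliberately simplified sketch. Assuming (hypothetically) that each of Tilt-Bot's four audio channels sits near one wall of the tray, the loudest channel hints at which wall the object struck, and therefore which way the tray was tilted. This is an illustrative toy, not the researchers' actual model, and the channel-to-direction mapping is invented.

```python
def loudest_channel(channels):
    """Given per-channel sample lists from a multi-channel recording,
    return the index of the channel with the most energy."""
    energies = [sum(s * s for s in ch) for ch in channels]
    return max(range(len(channels)), key=lambda i: energies[i])

# Hypothetical mapping from microphone channel to the tray wall the
# object hit, and hence to the tilt direction that caused the impact.
TILT_FOR_CHANNEL = {0: "north", 1: "east", 2: "south", 3: "west"}

def predict_tilt(channels):
    """Toy post-interaction prediction: recover the applied action
    (tilt direction) from the sound it produced."""
    return TILT_FOR_CHANNEL[loudest_channel(channels)]
```

A real system would learn this mapping from the 15,000 recorded interactions rather than hard-code it, but the principle is the same: the sound carries a signature of the action that caused it.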


Robots can also predict the sound of unseen objects from previously recorded sound data. This study will be useful for the advancement of robotics. The next step the scientists are looking at is making robots use vision and hearing simultaneously. For example, if a robot can both see and hear a cup, it ought to be capable of lifting and moving the cup without spilling what is inside.
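One common way to combine two senses like this is late fusion: compute a feature vector per modality and join them into a single representation for a downstream controller. The snippet below is a minimal sketch of that idea under assumed, hypothetical inputs; it is not drawn from the CMU work.

```python
def fuse_embeddings(vision_vec, audio_vec, w_vision=0.5, w_audio=0.5):
    """Late fusion by weighted concatenation: scale each modality's
    embedding, then join them into one vector that a downstream
    grasping or manipulation policy could consume."""
    return ([w_vision * v for v in vision_vec] +
            [w_audio * a for a in audio_vec])
```

The weights let the system lean on whichever sense is more reliable for a given task, for example favouring audio when the camera view of the cup is occluded.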

With the robotics industry moving at a fast pace, the fusion of vision, sound and action is a breakthrough that could take robotics to a higher level of innovation. If scientists continue to pursue such subjects, robots could soon become much more similar to humans.