Convolutional Neural Networks to Power Intelligent Cameras
Researchers at the University of Bristol and the University of Manchester have found ways to improve AI cameras.
Today, security cameras use artificial intelligence and neural network algorithms to recognize different objects in an image accurately and in real time. Some of these cameras can not only recognize movement but also determine what caused the motion alarm. Now, the Universities of Bristol and Manchester have developed cameras that can learn and understand what they are seeing. The universities presented two papers explaining how sensing and learning can be combined to create novel cameras for AI systems.
Walterio Mayol-Cuevas, Professor in Robotics, Computer Vision and Mobile Systems at the University of Bristol and the principal investigator (PI), explained that creating efficient perceptual systems requires pushing beyond the approaches followed so far. Roboticists and artificial intelligence (AI) researchers, he says, are well aware of the problems with how current systems sense and process the world: they still pair sensors, such as digital cameras designed for recording images, with computing devices, such as graphics processing units (GPUs), designed to accelerate graphics for video games.
This means AI systems perceive the world only after visual information has been recorded and transmitted between sensors and processors. But when collecting data, sensors also capture irrelevant noise. The meticulous detail they record ends up clogging the system with unnecessary information, consuming power and adding processing time. A different approach is therefore needed to enable efficient vision for intelligent machines.
The papers, one led by Dr. Laurie Bose and the other by Yanan Liu at Bristol, present two refinements towards this goal: implementing Convolutional Neural Networks (CNNs), a form of AI algorithm for enabling visual understanding, directly on the image plane. The CNNs the team has developed can classify frames thousands of times per second, without ever recording these images or sending them down the processing pipeline. The researchers demonstrated classification of handwritten numbers, hand gestures, and even plankton.
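The papers themselves describe CNNs running on specialized hardware, but the core operation they rely on, the convolution, is easy to sketch. The toy Python below (not the authors' code, just an illustration) slides a hypothetical vertical-edge kernel over a tiny image; a real CNN stacks many such filters and learns their weights, then classifies from the resulting feature maps.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (both lists of lists), no padding, stride 1."""
    h, w, k = len(image), len(image[0]), len(kernel)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(w - k + 1)
        ]
        for i in range(h - k + 1)
    ]

# A 4x4 image with a dark left half and bright right half.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# A vertical-edge kernel: responds where brightness increases left to right.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
# feature_map → [[3, 3], [3, 3]]: a strong response along the vertical edge.
```

On a conventional system, every pixel of `image` would first be shipped off the sensor; the work described here computes such responses in place, so only the final classification needs to leave the camera.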
The research suggests a future with intelligent, dedicated AI cameras—visual systems that can simply send high-level information to the rest of the system, such as the type of object or event taking place in front of the camera. This approach would make systems far more efficient and secure as no images need to be recorded.
This concept was made possible by the SCAMP architecture developed by Piotr Dudek, Professor of Circuits and Systems and Principal Investigator at the University of Manchester, and his team. SCAMP is a camera processor chip that the team describes as a Pixel Processor Array (PPA). A PPA embeds a processor in each pixel, and these processors can communicate with one another, enabling truly parallel processing. This is ideal for CNNs and vision algorithms.
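To give a feel for the PPA model (this is a simplified simulation, not the SCAMP instruction set), the sketch below applies the same tiny operation at every pixel as if all pixels ran it simultaneously, with each pixel allowed to read its right-hand neighbour. The operation shown, an absolute difference with the neighbour, is a hypothetical example of the kind of local computation such an array performs.

```python
def ppa_step(frame, op):
    """Simulate one parallel PPA step: every pixel applies `op` to its own
    value and its right neighbour's value (edge pixels reuse themselves)."""
    h, w = len(frame), len(frame[0])
    return [
        [op(frame[i][j], frame[i][min(j + 1, w - 1)]) for j in range(w)]
        for i in range(h)
    ]

frame = [[10, 10, 200, 200],
         [10, 10, 200, 200]]

# Each pixel computes the gradient to its right neighbour in one "step".
edges = ppa_step(frame, lambda v, right: abs(v - right))
# edges → [[0, 190, 0, 0], [0, 190, 0, 0]]: the boundary lights up.
```

In software this is a nested loop, but on a PPA every pixel's processor performs the operation at once, which is why frame rates of thousands per second become feasible.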
Professor Dudek notes that integrating detection, processing, and memory at the pixel level enables high-performance, low-latency systems and promises highly efficient, low-power hardware. An exciting feature of these cameras is their newly emerging machine-learning capability, combined with high operating speed and a lightweight configuration. This makes them ideal for highly maneuverable, high-speed work platforms that can literally learn on the fly, says Tom Richardson, Senior Lecturer in Flight Mechanics at the University of Bristol.