
NVIDIA Enabled Bing to Efficiently Conduct Object Detection to Deliver Accurate Results
NVIDIA helped Bing enhance its visual search capabilities and deliver accurate search results.
Challenges
Visual search is seen as the next great search frontier, and Microsoft’s Bing has tapped the power of NVIDIA GPUs to make it a reality. At the same time, the Bing team has leveraged the NVIDIA® CUDA® profiling toolchain and cuDNN to make the system more cost-effective. But visual search at scale is no easy matter: instantly delivering pertinent results when users mouse over objects within photos requires massive computation by algorithms trained to classify, detect, and match the images within images.
Early on, however, users faced a lengthy wait for the results they were looking for. Bing had introduced image-search capabilities that let users draw boxes around sub-images, or click on boxes around sub-images already detected by the platform, and then use those images as the basis of a new search. Bing sought a solution that was fast enough to keep up with user expectations.
Initiatives
The Bing team transitioned their object detection platform from CPUs to Azure NV-series virtual machines running NVIDIA Tesla® M60 GPU accelerators. In doing so, Bing slashed their object-detection latency from 2.5 seconds on the CPU to 200 milliseconds. Further optimizations with NVIDIA cuDNN lowered that to 40 milliseconds, well under the threshold for an excellent user experience on most applications.
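The latency figures above imply the speedup at each stage. The short Python sketch below only redoes that arithmetic; the constant and function names are illustrative and do not come from Bing's code.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is than the 'before' latency."""
    return before_ms / after_ms


# Latencies quoted in the text, converted to milliseconds.
CPU_MS = 2500.0        # 2.5 seconds on the CPU
GPU_MS = 200.0         # after moving to Tesla M60 GPUs
GPU_CUDNN_MS = 40.0    # after further optimization with cuDNN

cpu_to_gpu = speedup(CPU_MS, GPU_MS)          # CPU -> GPU
gpu_to_cudnn = speedup(GPU_MS, GPU_CUDNN_MS)  # GPU -> GPU with cuDNN
total = speedup(CPU_MS, GPU_CUDNN_MS)         # CPU -> GPU with cuDNN

print(f"CPU to GPU: {cpu_to_gpu:.1f}x")
print(f"GPU to GPU + cuDNN: {gpu_to_cudnn:.1f}x")
print(f"Total: {total:.1f}x")
```

On these numbers the move to GPUs is worth about 12.5X, the later optimizations add a further 5X, and the combined improvement is about 62.5X.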
The payoff for the move to NVIDIA GPUs was instantaneous, with inference latency reduced immediately by more than 10X. But Bing’s engineers weren’t about to stop there. They incorporated the NVIDIA cuDNN GPU-accelerated deep learning library into their code and switched their driver mode from the Windows Display Driver Model (WDDM) to Tesla Compute Cluster (TCC), dropping latency to 40 milliseconds for a total performance improvement of more than 60X. To detect more object categories in an image, they moved from a two-stage Fast R-CNN process to a one-stage “single-shot detection” process. This sped up the feature 10X and enabled the detection of over 80 image categories. The Bing team also leveraged a filter triggering model to limit the amount of data they need to process, and Microsoft’s ObjectStore key-value store to cache results for future use. This helped them save over 90 percent of their costs, making it more economically feasible to service the volume of requests they receive daily.
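The cost-saving pattern described above, a lightweight trigger filter in front of the detector plus a key-value cache of earlier results, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Bing's implementation: the `CachedDetector` class, the `should_trigger` and `detect` callables, and the in-memory dictionary standing in for a key-value store such as ObjectStore are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List


class CachedDetector:
    """Hypothetical wrapper: run the expensive detector only when needed."""

    def __init__(
        self,
        should_trigger: Callable[[bytes], bool],  # cheap filter (assumed interface)
        detect: Callable[[bytes], List[str]],     # expensive GPU detector (assumed interface)
    ) -> None:
        self._should_trigger = should_trigger
        self._detect = detect
        # A plain dict stands in for a real key-value store here.
        self._store: Dict[str, List[str]] = {}
        self.detector_calls = 0

    def query(self, image_bytes: bytes) -> List[str]:
        # Key the cache on a content hash so identical images share one entry.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key in self._store:
            return self._store[key]

        if self._should_trigger(image_bytes):
            self.detector_calls += 1
            result = self._detect(image_bytes)
        else:
            # The filter judged the image not worth running detection on.
            result = []

        self._store[key] = result
        return result


# Stub callables for demonstration only; a real system would call trained models.
detector = CachedDetector(
    should_trigger=lambda img: len(img) > 4,
    detect=lambda img: ["object"],
)
detector.query(b"large image payload")  # runs the detector
detector.query(b"large image payload")  # served from the cache
detector.query(b"tiny")                 # filtered out, detector not run
```

In this sketch the detector runs once across the three queries, which is the kind of saving the filter and cache are meant to provide. A production design would also need cache expiry and a store shared across servers.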
Results
On the development and deployment side, switching to NVIDIA GPUs has empowered the Bing team to be more agile and to increase their rate of learning and innovation. With CPUs, it would take months to run updated models on the entire dataset of billions of images after every significant change. With GPUs, this process is now instantaneous, making it practical to update the models frequently and provide more features for Bing’s users. Real-time object detection and visual search are now possible, marking a groundbreaking moment for Bing Visual Search.