Google AI Blog - RSS Feed

The latest news from Google AI.

Latest articles

Announcing WIT: A Wikipedia-Based Image-Text Dataset

Posted by Krishna Srinivasan, Software Engineer and Karthik Raman, Research Scientist, Google Research
Multimodal visio-linguistic models rely on rich datasets in order to model the relationship between images and text. Traditionally, these datasets have been created by either manually captioning images, or crawling the web and extracting the alt-text...

Toward Fast and Accurate Neural Networks for Image Recognition

Posted by Mingxing Tan and Zihang Dai, Research Scientists, Google Research
As neural network models and training data size grow, training efficiency is becoming an important focus for deep learning. For example, GPT-3 demonstrates remarkable capability in few-shot learning, but it requires weeks of training with thousands of GPUs, making it difficult...

Revisiting Mask-Head Architectures for Novel Class Instance Segmentation

Posted by Vighnesh Birodkar, Research Software Engineer and Jonathan Huang, Research Scientist, Google Research
Instance segmentation is the task of grouping pixels in an image into instances of individual things, and identifying those things with a class label (countable objects such as people, animals, cars, etc., and assigning unique identifiers...

Music Conditioned 3D Dance Generation with AIST++

Posted by Shan Yang, Software Engineer and Angjoo Kanazawa, Research Scientist, Google Research
Dancing is a universal language found in nearly all cultures, and is an outlet many people use to express themselves on contemporary media platforms today. The ability to dance by composing movement patterns that align to music beats is a fundamental aspect...

Personalized ASR Models from a Large and Diverse Disordered Speech Dataset

Posted by Katrin Tomanek, Software Engineer and Bob MacDonald, Technical Program Manager, Google Research
Speech impairments affect millions of people, with underlying causes ranging from neurological or genetic conditions to physical impairment, brain damage or hearing loss. Similarly, the resulting speech patterns are diverse, including stuttering,...

Discovering Anomalous Data with Self-Supervised Learning

Posted by Chun-Liang Li and Kihyuk Sohn, Research Scientists, Google Cloud
Anomaly detection (sometimes called outlier detection or out-of-distribution detection) is one of the most common machine learning applications across many domains, from defect detection in manufacturing to fraudulent transaction detection in finance. It is most often used when...

Detecting Abnormal Chest X-rays using Deep Learning

Posted by Zaid Nabulsi, Software Engineer and Po-Hsuan Cameron Chen, Software Engineer, Google Health
The adoption of machine learning (ML) for medical imaging applications presents an exciting opportunity to improve the availability, latency, accuracy, and consistency of chest X-ray (CXR) image interpretation. Indeed, a plethora of algorithms have...

Introducing Omnimattes: A New Approach to Matte Generation using Layered Neural Rendering

Posted by Forrester Cole, Software Engineer and Tali Dekel, Research Scientist
Image and video editing operations often rely on accurate mattes: images that define a separation between foreground and background. While recent computer vision techniques can produce high-quality mattes for natural images and videos, allowing real-world applications such...

Recreating Natural Voices for People with Speech Impairments

Posted by Ye Jia, Software Engineer and Julie Cattiau, Product Manager, Google Research
Update (2021/09/07): Added another sound clip used to train the model.
On June 2nd, 2021, Major League Baseball in the United States celebrated Lou Gehrig Day, commemorating both the day in 1925 that Lou Gehrig became the Yankees’ starting first baseman,...

SoundStream: An End-to-End Neural Audio Codec

Posted by Neil Zeghidour, Research Scientist and Marco Tagliasacchi, Staff Research Scientist, Google Research
Audio codecs are used to efficiently compress audio to reduce either storage requirements or network bandwidth. Ideally, audio codecs should be transparent to the end user, so that the decoded audio is perceptually indistinguishable from the...
