A Google DeepMind AI language model is now making descriptions for YouTube Shorts

Google's DeepMind utilizes its Flamingo AI to provide descriptions for YouTube Shorts, improving video categorization and search results.

Flamingo, a visual language model (VLM) developed by Google DeepMind, is now being deployed to create descriptions for YouTube Shorts, according to a recent post from the newly formed Google DeepMind. The announcement follows the merger of DeepMind and Google Brain into a single AI team.

YouTube Shorts are made quickly and often uploaded without titles or descriptions, which makes them difficult to discover through search. Flamingo addresses this by analyzing the initial frames of a video and generating an explanatory description, such as "a dog balancing a stack of crackers on its head." These text descriptions, stored as metadata, aim to improve video categorization and better match search results to viewer queries.
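The pipeline described above — initial frames in, a one-line description out, stored as searchable metadata — can be sketched roughly as follows. This is a hypothetical illustration only: Flamingo's actual API is not public, so `describe_frames` is a stand-in for the model call, and the keyword matching is a deliberately naive placeholder for YouTube's real search system.

```python
def describe_frames(frames):
    """Stand-in for a VLM like Flamingo: maps a video's initial
    frames to a one-line natural-language description."""
    # A real system would run the frames through the model here;
    # we return the article's example caption instead.
    return "a dog balancing a stack of crackers on its head"

def index_short(video_id, frames):
    """Attach the generated description as behind-the-scenes metadata."""
    return {"id": video_id, "description": describe_frames(frames)}

def matches(metadata, query):
    """Naive search match: every query term appears in the description."""
    words = metadata["description"].lower().split()
    return all(term.lower() in words for term in query.split())

short = index_short("short-123", frames=[])
print(matches(short, "dog crackers"))  # caption contains both terms
```

The key design point the article describes is that the description never surfaces in the UI; it exists purely as metadata so a query like "dog crackers" can find an otherwise untitled video.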

Colin Murdoch, Google DeepMind's chief business officer, said the tool solves a real problem, since the streamlined creation process means creators often skip metadata for Shorts. Todd Sherman, the director of product management for Shorts, added that because Shorts are mostly watched in a feed where people swipe from video to video, there's little incentive for creators to add metadata themselves.

However, Flamingo's generated descriptions won't be visible to users. According to Sherman, the metadata stays behind the scenes and isn't shown to creators, though efforts are being made to ensure its accuracy. Any generated description must comply with Google's responsibility standards so that videos aren't misrepresented.

The Flamingo model is already at work, providing auto-generated descriptions for new Shorts uploads as well as for a large number of existing videos, including the most-viewed ones, says DeepMind spokesperson Duncan Smith.

Asked whether Flamingo might be applied to longer-form YouTube videos, Sherman said it's conceivable, but the need is less pressing. Creators of longer videos typically invest considerable time in pre-production, filming, and editing, so writing metadata is a small part of the process. And because these videos are often chosen based on title and thumbnail, creators already have an incentive to add discoverability-boosting metadata.

As Google continues to infuse AI into its services, applying tools like Flamingo to longer-form YouTube videos might be a future possibility, potentially revolutionizing YouTube search.