
This article was automatically translated from the original Turkish version.


DolphinGemma

Publication Date: April 14, 2025
Website: https://blog.google/technology/ai/dolphingemma/

DolphinGemma is an artificial intelligence model developed by Google to analyze the vocalizations of marine mammals. The model is based on Gemma, Google's family of open large language models, and is specifically configured to study the vocal communication of Atlantic spotted dolphins (Stenella frontalis). With approximately 400 million parameters, DolphinGemma is an "audio-in/audio-out" model: it processes audio input and generates audio output, and it is designed to uncover structural patterns within sequences of sounds.


Collaboration and Database

The model was developed in collaboration with the Wild Dolphin Project (WDP). WDP has been observing a specific population of Atlantic spotted dolphins in the Bahamas since 1985 and has built a comprehensive dataset containing individual dolphin life histories, behavioral observations, and labeled underwater audio and video recordings. DolphinGemma was trained on this dataset. Dolphin vocalizations were converted into tokens by Google's SoundStream audio encoder and fed into the model's training pipeline.
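The idea of turning audio into tokens can be illustrated with a small sketch. This is not SoundStream itself, which is a learned neural codec; the hand-written codebook, the two-dimensional frame features, and the function name `tokenize` below are all assumptions made for illustration. Each frame is mapped to the index of its nearest codebook vector, which gives the discrete sequence a language model can consume.

```python
from math import dist

# Hypothetical codebook: each entry stands for one discrete audio token.
# A real codec learns its codebook from data; these values are made up.
CODEBOOK = [
    (0.0, 0.0),  # token 0
    (1.0, 0.0),  # token 1
    (0.0, 1.0),  # token 2
    (1.0, 1.0),  # token 3
]


def tokenize(frames):
    """Map each feature frame to the index of its nearest codebook vector."""
    return [
        min(range(len(CODEBOOK)), key=lambda i: dist(frame, CODEBOOK[i]))
        for frame in frames
    ]


# Made-up frame features standing in for short slices of a recording.
frames = [(0.1, 0.05), (0.9, 0.2), (0.2, 0.8), (0.95, 0.9)]
print(tokenize(frames))  # [0, 1, 2, 3]
```

The output is a list of integers, one per frame, which is the kind of sequence a token-based model is trained on.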

Model Functionality and Research Objectives

DolphinGemma analyzes the sequential patterns in natural dolphin vocalizations to identify recurring structures and motifs within sound sequences. The model operates much like the large language models used for human language: it takes the preceding sounds as input and predicts the most likely subsequent sounds. This design lets researchers investigate whether natural dolphin sounds contain meaningful patterns and whether their communication has a language-like structure.
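The predict-the-next-sound principle can be sketched with a far simpler stand-in than the actual model. The example below is a bigram counter, not DolphinGemma's Gemma-based architecture; the token sequences are invented, and the names `train_bigram`, `predict_next`, and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, length):
    """Extend `start` by repeatedly appending the predicted next token."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Invented token sequences standing in for tokenized vocalizations.
sequences = [[3, 1, 4, 1, 5], [3, 1, 4, 2], [1, 4, 1, 5]]
model = train_bigram(sequences)
print(predict_next(model, 3))  # 1
print(generate(model, 3, 4))   # [3, 1, 4, 1, 4]
```

A recurring motif such as "1 followed by 4" shows up here as a high count. A large model conditions on a much longer context than one token, but it is looking for the same kind of regularity.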


Left Image: A mother spotted dolphin observes her calf while foraging. When finished, she will use her unique signature whistle to call the calf back. Right Image: A spectrogram visualization of the whistle (Source: Google)

Integration with the CHAT System

One of the systems designed for field deployment alongside DolphinGemma is CHAT (Cetacean Hearing Augmentation Telemetry), developed in collaboration with the Georgia Institute of Technology. This system does not aim to decode the complex natural communication of dolphins directly; instead, it seeks to establish a simpler, shared vocabulary. CHAT operates on the assumption that dolphins can learn to associate artificially generated whistles with specific objects and use those whistles to communicate.



The CHAT system detects a dolphin's imitation of a whistle, identifies which whistle was imitated, informs the researcher, and prompts the researcher to present the corresponding object. This cycle reinforces the association between sound and object. The initial version of the system used Google Pixel 6 devices; a next-generation version running on the Pixel 9 is planned for deployment in the summer 2025 season. In the new system, deep learning models and template matching algorithms can run simultaneously.
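The identification step can be illustrated with a minimal template-matching sketch. It is not CHAT's actual algorithm: the frequency contours (in kHz), the object labels, the distance measure, and the threshold are all assumptions chosen for illustration. A heard contour is resampled to a fixed length, compared against each stored template, and matched to the closest one if it is close enough.

```python
def resample(contour, n):
    """Linearly interpolate a contour to n points (n >= 2)."""
    if len(contour) == 1:
        return [contour[0]] * n
    out = []
    for k in range(n):
        pos = k * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def distance(a, b, n=8):
    """Mean absolute difference between two contours after resampling."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / n


# Hypothetical whistle templates: frequency contours in kHz, one per object.
TEMPLATES = {
    "object_A": [8.0, 10.0, 12.0, 14.0],  # rising whistle
    "object_B": [14.0, 12.0, 10.0, 8.0],  # falling whistle
    "object_C": [8.0, 12.0, 8.0, 12.0],   # alternating whistle
}


def identify(contour, templates=TEMPLATES, max_distance=1.0):
    """Return the label of the closest template, or None if none is close."""
    label, best = min(
        ((name, distance(contour, t)) for name, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return label if best <= max_distance else None


# An imperfect imitation of the rising whistle, with a different length.
heard = [8.1, 9.0, 10.2, 11.1, 11.9, 13.0, 14.2]
print(identify(heard))         # object_A
print(identify([11.0] * 5))    # None
```

Returning `None` when no template is close enough matters in this setting, since prompting the researcher with the wrong object would reinforce the wrong association.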

Technical Specifications and Applications

At approximately 400 million parameters, DolphinGemma is small enough to run on the portable devices used in fieldwork. This reduces the need for specialized hardware and makes the system more practical to operate under ocean conditions. Although it was trained primarily on Atlantic spotted dolphin vocalizations, DolphinGemma will be shared openly, allowing it to be adapted to the vocalizations of other species such as bottlenose or spinner dolphins. The model's flexible design lets researchers retrain it on their own datasets.

Scientific Contributions and Potential Applications

The model is being used in scientific research to analyze the natural acoustic communication of marine mammals. Automating sound analysis processes previously conducted manually reduces research time and enables more systematic detection of patterns. Furthermore, outputs from DolphinGemma are integrated with the CHAT system to create a more interactive research environment. This allows patterns derived from natural sound sequences to be translated into simple interaction models.

Open Access and Future Perspectives

Google plans to release the DolphinGemma model openly to the research community in summer 2025. The release will make the model accessible to scientists and academics conducting marine mammal research at institutions worldwide and will encourage its use in studies of vocal communication across different species. This approach is expected to strengthen international collaboration in research on marine mammal acoustic communication.

Author Information

Author: Ömer Said Aydın
December 6, 2025 at 7:40 AM
