This article was automatically translated from the original Turkish version.

DolphinGemma is an artificial intelligence model developed by Google to analyze the vocalizations of marine mammals. The model is based on Google’s open-source large language model series named Gemma and is specifically configured to study the vocal communications of Atlantic spotted dolphins (Stenella frontalis). With approximately 400 million parameters, DolphinGemma is an “audio-in/audio-out” model capable of processing audio inputs and generating audio outputs, designed to decipher structural patterns within sequences of sounds.
The model was developed in collaboration with the Wild Dolphin Project (WDP). WDP, which has been observing a specific population of Atlantic spotted dolphins in the Bahamas since 1985, has created a comprehensive database containing individual dolphin life histories, behavior observations, and labeled underwater audio-video recordings. DolphinGemma was trained using this database. Dolphin vocalizations were converted into tokens via Google’s SoundStream audio encoder and fed into the model’s learning pipeline.
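The tokenization step can be illustrated with a minimal sketch: a neural audio codec such as SoundStream quantizes each audio frame to the index of its nearest codebook vector, turning a continuous signal into a discrete token sequence. The code below is a toy illustration of that idea, not the real SoundStream API; the codebook and frame values are invented for the example.

```python
# Hypothetical sketch of audio-frame quantization into discrete tokens,
# in the spirit of a neural codec like SoundStream (NOT the real API).

def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to the frame (squared L2)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(frame, codebook[i]))

def tokenize(frames, codebook):
    """Convert a sequence of audio frames into a sequence of token IDs."""
    return [nearest_code(f, codebook) for f in frames]

# Toy 2-dimensional codebook with four entries (assumption for illustration).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8)]
print(tokenize(frames, codebook))  # → [0, 1, 2]
```

Once vocalizations are in token form, the same sequence-modeling machinery used for text can be applied to them.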
DolphinGemma aims to analyze the sequential patterns in natural dolphin vocalizations to identify recurring structures and motifs within sound sequences. The model operates similarly to large language models used for human language: it takes previous sounds as input and predicts the most likely subsequent sounds. This approach enables researchers to investigate whether natural dolphin sounds contain meaningful patterns and whether a linguistic structure exists in their communication.
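The "predict the next sound" idea can be sketched with a deliberately simple stand-in: a bigram count model over token sequences. This is not DolphinGemma's actual transformer architecture, only an illustration of next-token prediction; the token sequences are invented.

```python
# Minimal stand-in for next-token prediction on sound-token streams:
# a bigram count model (NOT the actual DolphinGemma architecture).
from collections import Counter, defaultdict

def train_bigrams(sequences):
    """Count, for each token, how often each successor token follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if unseen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

# Toy "vocalization" token sequences (hypothetical data).
seqs = [[1, 2, 3, 1, 2], [2, 3, 1, 2, 3]]
model = train_bigrams(seqs)
print(predict_next(model, 2))  # → 3
```

A real model conditions on a long context window rather than a single preceding token, but the training objective (predict what comes next) is the same in spirit.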

Left Image: A mother spotted dolphin observes her calf while foraging. When finished, she will use her unique signature whistle to call the calf back. Right Image: A spectrogram visualization of the whistle (Source: Google)
One of the systems designed for field deployment of DolphinGemma is CHAT (Cetacean Hearing Augmentation Telemetry), developed in collaboration with the Georgia Institute of Technology. This system does not aim to decode the complex natural communication of dolphins directly but instead seeks to create a simpler, shared vocabulary. CHAT operates on the assumption that dolphins can learn to associate artificially generated whistles with specific objects and use them for communication.
The CHAT system detects imitation sounds from the dolphins, identifies which sound was produced, provides feedback to the researcher, and prompts the researcher to present the corresponding object. This cycle reinforces the association between sound and object. The initial version of the system used Google Pixel 6 devices; a next-generation version running on the Pixel 9 model will be deployed in the summer 2025 field season. In this new system, both deep learning models and template matching algorithms can operate simultaneously.
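The detection-and-feedback cycle described above can be sketched as a template-matching step: a detected whistle is compared against the stored template for each object in the shared vocabulary, and the best match is reported to the researcher. The template names, signal values, and distance metric below are assumptions for illustration, not the real CHAT implementation.

```python
# Hedged sketch of the CHAT loop: match a detected whistle against stored
# templates and report the associated object (NOT the real CHAT code).

def match_whistle(detected, templates):
    """Return the label whose template has the smallest squared L2 distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(templates, key=lambda label: dist(detected, templates[label]))

# Hypothetical whistle templates, one per object in the shared vocabulary.
templates = {
    "scarf":     [0.0, 0.5, 1.0, 0.5],
    "sargassum": [1.0, 0.5, 0.0, 0.5],
}
detected = [0.1, 0.4, 0.9, 0.6]
label = match_whistle(detected, templates)
print(f"Matched '{label}' -> prompt the researcher to present the {label}")
```

In the deployed system this matching step can run alongside a deep learning classifier, with both approaches voting on which whistle was heard.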
At approximately 400 million parameters, DolphinGemma is small enough to run on the portable devices used in fieldwork. This reduces the need for specialized hardware and enhances system efficiency under ocean conditions. Although primarily trained on Atlantic spotted dolphin vocalizations, DolphinGemma will be shared as open-source software, allowing adaptation to vocalizations of other species such as bottlenose or spinner dolphins. The model’s flexible design enables researchers to retrain it using their own datasets.
The model is being used in scientific research to analyze the natural acoustic communication of marine mammals. Automating sound analysis processes previously conducted manually reduces research time and enables more systematic detection of patterns. Furthermore, outputs from DolphinGemma are integrated with the CHAT system to create a more interactive research environment. This allows patterns derived from natural sound sequences to be translated into simple interaction models.
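Automated pattern detection of the kind described above amounts to finding recurring structures in token sequences. A simple stand-in (not DolphinGemma's actual method) is counting n-gram "motifs" that repeat across a stream of sound tokens; the token values below are invented.

```python
# Illustrative sketch of automated motif detection: count recurring n-grams
# in a token sequence (a simple stand-in, NOT DolphinGemma's actual method).
from collections import Counter

def recurring_motifs(sequence, n=3, min_count=2):
    """Return n-grams occurring at least `min_count` times, with their counts."""
    grams = Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}

# Hypothetical token stream in which the motif (1, 2, 3) recurs.
tokens = [5, 1, 2, 3, 7, 1, 2, 3, 9, 1, 2, 3]
print(recurring_motifs(tokens))  # → {(1, 2, 3): 3}
```

Doing this by hand over hours of recordings is exactly the kind of labor the model is meant to automate.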
Google plans to release the DolphinGemma model as open-source to the research community in summer 2025. This release will facilitate access for scientists and academics conducting marine mammal research at institutions worldwide and encourage its application in studies of vocal communication across different species. This approach is expected to significantly strengthen international collaboration in research on marine mammal acoustic communication.
