DolphinGemma is an artificial intelligence model developed by Google to analyze the vocalizations of marine mammals. It is built on Google's open Gemma family of large language models and adapted specifically to study the vocal communication of Atlantic spotted dolphins (Stenella frontalis). With approximately 400 million parameters, DolphinGemma is an “audio-in/audio-out” model: it takes sequences of dolphin sounds as input and generates likely continuations as audio, which lets researchers examine the structural patterns in those sequences.
Collaborative Work and Database
The model was developed in collaboration with the Wild Dolphin Project (WDP). Since 1985, WDP has studied a single community of Atlantic spotted dolphins in the Bahamas, building a comprehensive dataset that includes the life histories of individual dolphins, behavioral observations, and labeled underwater audio and video recordings. DolphinGemma was trained on this dataset. Dolphin sounds were first converted into discrete tokens with Google's SoundStream audio tokenizer, and these token sequences formed the model's training data.
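The tokenization step can be illustrated with a toy example. SoundStream is a neural audio codec whose encoder turns audio into a sequence of embedding frames that are then discretized by residual vector quantization (RVQ): each quantizer stage picks the nearest entry in its codebook and passes the leftover residual on to the next stage. The minimal sketch below shows only that quantization idea, assuming hand-made 2-dimensional frames and codebooks (all values hypothetical); SoundStream's real encoder and codebooks are learned, and this is not its API.

```python
"""Toy residual vector quantization (RVQ): turning audio frames into discrete tokens.

Hypothetical illustration only: the frames and codebooks below are hand-made
numbers, not SoundStream's learned encoder or codebooks.
"""
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(frame: Sequence[float], codebooks: List[List[Vector]]) -> List[int]:
    """Encode one frame as one codebook index per quantizer stage.

    Each stage picks the codeword closest to the current residual, then
    subtracts it so the next stage quantizes what is left over.
    """
    residual = list(frame)
    tokens = []
    for codebook in codebooks:
        index = min(range(len(codebook)),
                    key=lambda i: squared_distance(residual, codebook[i]))
        tokens.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return tokens


def rvq_decode(tokens: List[int], codebooks: List[List[Vector]]) -> List[float]:
    """Reconstruct an approximate frame by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[index])]
    return out


# Two quantizer stages over 2-dimensional "embedding frames" (hypothetical values).
CODEBOOKS: List[List[Vector]] = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)],      # coarse stage
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (-0.25, -0.25)],  # fine stage
]

frames = [(1.2, 0.9), (-0.8, 1.3), (0.1, -0.2)]
token_sequence = [rvq_encode(f, CODEBOOKS) for f in frames]
print(token_sequence)  # → [[1, 1], [2, 2], [0, 0]]
```

Each frame becomes a short list of indices, and decoding simply sums the selected codewords (for example, `[1, 1]` decodes to `[1.25, 1.0]`, close to the original `(1.2, 0.9)`). Sequences of such indices are the kind of discrete input a language-model-style network can be trained on.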
Model Operation and Research Objectives
DolphinGemma aims to identify recurring patterns and structures in sequences of natural dolphin vocalizations. It works much like a text-based large language model: given a sequence of preceding sounds as input, it predicts the sounds most likely to follow. This lets researchers investigate whether natural dolphin sounds contain meaningful patterns and whether their communication has a language-like structure.
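The prediction loop described above can be sketched with a far simpler model. The sketch assumes recordings have already been tokenized into integer IDs (the IDs here are hypothetical) and uses a bigram counter in place of a transformer: it predicts the next token from the last token only, then feeds each prediction back in. This is the same autoregressive pattern a large language model follows, but with a much poorer notion of context.

```python
"""Next-token prediction over a toy audio-token vocabulary.

A deliberately simple stand-in: DolphinGemma is a ~400M-parameter
Gemma-based model, whereas this sketch only counts which token follows
which (a bigram model). The token IDs are hypothetical.
"""
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence


def train_bigram(sequences: Sequence[Sequence[int]]) -> Dict[int, Counter]:
    """Count, for every token, how often each other token follows it."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[int, Counter], context: Sequence[int]) -> Optional[int]:
    """Return the most frequent successor of the last token in the context."""
    if not context or context[-1] not in counts:
        return None  # no successor ever observed for this token
    return counts[context[-1]].most_common(1)[0][0]


def generate(counts: Dict[int, Counter], context: List[int], steps: int) -> List[int]:
    """Autoregressively extend the context, feeding each prediction back in."""
    out = list(context)
    for _ in range(steps):
        nxt = predict_next(counts, out)
        if nxt is None:
            break
        out.append(nxt)
    return out


recordings = [[5, 12, 12, 7, 3], [5, 12, 7, 3], [9, 5, 12, 7]]
model = train_bigram(recordings)
print(predict_next(model, [9, 5]))    # → 12
print(generate(model, [5], steps=3))  # → [5, 12, 7, 3]
```

Generation stops early when the last token has no recorded successor, which is why asking for more steps here still yields `[5, 12, 7, 3]`.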
Left: A mother spotted dolphin observing her calf while foraging. She will use her unique whistle to recall her calf when she is finished. Right: A spectrogram visualizing the whistle (Source: Google)
Integration With CHAT System
One of the field applications planned for DolphinGemma is the underwater CHAT (Cetacean Hearing Augmentation Telemetry) system, developed in collaboration with the Georgia Institute of Technology. Rather than trying to decipher the dolphins' complex natural communication directly, CHAT aims to establish a simpler, shared vocabulary. It rests on the premise that dolphins can learn to imitate synthetic whistles associated with specific objects and use those whistles to request the objects.
The CHAT system listens for dolphin imitations of these synthetic whistles, identifies which whistle was produced, and alerts the researcher, who can then present the corresponding object. Repeating this cycle reinforces the association between each sound and its object. The first version of the system ran on a Google Pixel 6; a new version built on the Pixel 9 is scheduled for field deployment in the summer 2025 season. The Pixel 9 can run deep learning models and template-matching algorithms simultaneously.
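The identification step can be illustrated with a simple template-matching sketch. It assumes each synthetic whistle is stored as a frequency contour (kHz per frame) and that incoming audio has already been reduced to such a contour; it slides each template along the input, scores every placement with normalized correlation, and reports the best match only if it clears a threshold. The whistle names, contours, object labels and the 0.9 threshold are all hypothetical; the features and matching method CHAT actually uses are not specified here.

```python
"""Template matching for whistle identification (hypothetical sketch).

Assumes each synthetic whistle is stored as a frequency contour (kHz per
frame) and that an incoming recording has already been reduced to such a
contour. Names, contours and the threshold are illustrative, not CHAT's.
"""
import math
from typing import Dict, Optional, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson-style normalized correlation of two equal-length windows (-1 to 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: its shape carries no information
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(signal: Sequence[float],
               templates: Dict[str, Sequence[float]],
               threshold: float = 0.9) -> Optional[Tuple[str, int, float]]:
    """Slide every template over the signal; return (name, offset, score) of
    the best-scoring placement if it clears the threshold, else None."""
    best: Optional[Tuple[str, int, float]] = None
    for name, tpl in templates.items():
        for offset in range(len(signal) - len(tpl) + 1):
            score = normalized_correlation(signal[offset:offset + len(tpl)], tpl)
            if best is None or score > best[2]:
                best = (name, offset, score)
    if best is None or best[2] < threshold:
        return None
    return best


# Hypothetical whistle vocabulary: contour templates and the object each labels.
TEMPLATES = {
    "whistle_A": [8.0, 10.0, 12.0, 14.0, 16.0],   # rising sweep
    "whistle_B": [16.0, 12.0, 16.0, 12.0, 16.0],  # oscillating
}
OBJECT_FOR = {"whistle_A": "scarf", "whistle_B": "sargassum"}

incoming = [5.0, 5.1, 9.0, 11.1, 12.9, 15.2, 17.0, 5.0]
match = best_match(incoming, TEMPLATES)
if match is not None:
    name, offset, score = match
    print(f"{name} at frame {offset} (r={score:.2f}): offer the {OBJECT_FOR[name]}")
```

Because normalized correlation compares shape rather than absolute pitch, a dolphin's imitation can be recognized even if it is shifted slightly in frequency; the threshold trades missed detections against false alarms.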
Technical Specifications and Application Areas
At approximately 400 million parameters, DolphinGemma is small enough to run on the portable devices used in field work. This reduces the need for specialized hardware and makes the system more practical to operate at sea. Although it was trained primarily on the sounds of Atlantic spotted dolphins, DolphinGemma will be shared as an open model, so it can be adapted to the vocalizations of other species, such as bottlenose or spinner dolphins. Researchers will be able to retrain or fine-tune it on their own datasets.
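The portability claim can be checked with rough arithmetic: weight memory is the parameter count multiplied by the bytes stored per parameter, so 400 million parameters need about 1.6 GB at 32-bit precision, 0.8 GB at 16-bit and 0.4 GB at 8-bit. The sketch counts weights only (no activations or runtime overhead), and the numeric precision DolphinGemma actually uses on-device is an assumption, not something stated here.

```python
"""Back-of-the-envelope weight memory for a ~400M-parameter model.

Counts only the weights; activations, caches and runtime overhead are
ignored, and the precision used on-device is an open assumption.
"""
PARAMS = 400_000_000  # "approximately 400 million parameters"

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16/float16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Weight storage in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>17}: {weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

Even at full 32-bit precision the weights fit comfortably within the memory of a modern smartphone, which is consistent with running the model on the field devices rather than on a server.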
Scientific Contributions and Potential Uses
The model is intended for scientific research on the natural vocal communication of marine mammals. Automating sound-analysis work that previously relied entirely on manual effort can shorten research time and make pattern detection more systematic. DolphinGemma's outputs are also to be integrated with the CHAT system: its predictions could help CHAT recognize a potential mimic earlier in a vocalization sequence, letting researchers respond faster and making interactions more fluid. In this way, patterns learned from natural sound sequences can inform simple, interactive exchanges with the dolphins.
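Systematic pattern detection can start with something as simple as counting recurring token motifs across many recordings, the kind of tally that is tedious to do by hand. This sketch assumes recordings are already tokenized into integer IDs; the motif length `n`, the `min_count` cutoff and the example data are arbitrary, hypothetical choices, and a model like DolphinGemma learns patterns far richer than fixed-length n-grams.

```python
"""Counting recurring token motifs across recordings (illustrative sketch).

Assumes recordings are already tokenized into integer IDs; the motif length
and minimum count are arbitrary choices, not values from the WDP work.
"""
from collections import Counter
from typing import List, Sequence, Tuple


def recurring_motifs(sequences: Sequence[Sequence[int]], n: int = 3,
                     min_count: int = 2) -> List[Tuple[Tuple[int, ...], int]]:
    """Return n-grams seen at least min_count times, most frequent first."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return [(motif, c) for motif, c in counts.most_common() if c >= min_count]


recordings = [[4, 8, 15, 16, 23], [8, 15, 16, 42], [1, 8, 15, 16]]
print(recurring_motifs(recordings))  # → [((8, 15, 16), 3)]
```

Motifs that recur across independent recordings are candidates for closer study, for example by checking whether they co-occur with particular behaviors in the labeled observations.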
Open Access and Future Perspectives
Google plans to release DolphinGemma as an open model for the research community in the summer of 2025. The release is meant to give institutions and academics studying marine mammals worldwide access to the model and to encourage its use in studies of the vocal communication of other species. This approach is expected to foster more international collaboration in research on marine mammal vocal communication.