Introducing hertz-dev, the first open-source base model for conversational audio generation

For the last few months, the team at Standard Intelligence has been doing research in cross-modality learning. We’re excited to announce that we’re open-sourcing an early product of this research: hertz-dev, an 8.5B-parameter, full-duplex, audio-only base model. The audio modality is essential for building interactive agents that feel natural. Today there are two main approaches to generating audio with AI: diffusion-based methods and autoregressive methods. Diffusion-based audio models have proven good at music generation and short samples, but truly interactive audio generation needs to be autoregressive, since a conversational agent has to produce audio incrementally as the conversation unfolds rather than all at once.