Introducing hertz-dev, the first open-source base model for conversational audio generation

GitHub | Download checkpoints | Twitter

For the last few months, we at Standard Intelligence have been researching scalable cross-modality learning. We're excited to announce that we're open-sourcing current checkpoints of our full-duplex, audio-only transformer base model, hertz-dev, with a total of 8.5 billion parameters.

Hertz-dev is the first publicly released audio base model of its kind. Base models are uniquely valuable as a research product because they accurately model the distribution of the data that they were trained on, as opposed to models that have had substantial RL tuning done to collapse their generation distributions. This makes base models the best starting point to fine-tune for a large number of different tasks.
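To make the idea of a "collapsed" generation distribution concrete, here is a toy sketch. It is an analogy only, not how hertz-dev or any RL-tuned model is trained: the four-outcome distribution, the `sharpen` helper, and the exponent are all hypothetical choices for illustration. It compares the Shannon entropy of a distribution with that of a sharpened copy, which puts most of its mass on the likeliest outcome.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sharpen(probs, power):
    """Raise each probability to `power` and renormalize.

    Hypothetical stand-in for distribution collapse: with power > 1,
    mass concentrates on the outcomes that were already most likely.
    """
    weights = [p ** power for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


# Toy "data distribution" over four outcomes (made-up numbers).
data_dist = [0.4, 0.3, 0.2, 0.1]

# A sharpened copy standing in for a model with a collapsed distribution.
collapsed = sharpen(data_dist, power=4.0)

print("entropy of data distribution:", entropy(data_dist))
print("entropy after sharpening:    ", entropy(collapsed))
```

The sharpened copy has lower entropy, so it covers less of the original variety. In this toy picture, a base model corresponds to the unsharpened distribution.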

Hertz-dev has a theoretical latency of 65ms and a real-world average latency of 120ms on an RTX 4090. That is roughly half the latency of any other public model in the world, and low latency is a prerequisite for a model that can interact with you in human-like ways instead of what feels like a delayed, choppy phone call. We're currently training a larger, more advanced version of Hertz, which will use a scaled base model recipe and RL tuning to substantially improve the raw capabilities and final coherence of the model. Hertz-dev is a glimpse at the future of real-time voice interaction, and is the easiest conversational audio model in the world for researchers to fine-tune and build on top of.

Sample Generations

To demonstrate the audio modeling capabilities of hertz-dev, we sample both one-channel and two-channel generations as well as a live conversation between the model and a human.

One-channel

Two-channel

Interactive

9 seconds of prompt included.

At SI, we're doing fundamental research with the goal of building aligned general intelligence, and we view this release as just the first step on that journey. We're starting at a unique time, when a tiny team can do massively outsized work.

We're currently a team of 4 in San Francisco. If your life goal is to build AGI in a way that benefits all humanity, we might want to hire you—reach out at [email protected]. If you're interested in investing, please reach out at [email protected].