Persona-1-Live

Keyframe Research Team

Live stream and interact with Persona-1.

Today, we’re excited to share Persona-1-Live, a real-time streaming variant of Persona-1 that can hold dynamic, human-feeling conversations with low latency.

Architecture Overview

Persona-1-Live enhances Persona-1’s flow-based image transformer with a next-token prediction scheme optimized for interactive video generation. Crucially, our implementation enables conversational interactivity without sacrificing Persona-1’s speed, expressive fidelity, or hallmark low cost.

Empirically, we’ve also found that the new next-token prediction scheme exploits our relatively small training sets more effectively, resulting in broader and more nuanced expression diversity. We invite you to video chat with Persona-1-Live in our playground.
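
As a rough illustration of why next-token prediction suits streaming, the toy sketch below emits each output chunk as soon as it is predicted instead of waiting for a full clip. This is not Persona-1-Live's actual implementation: `predict_next_chunk`, `stream_chunks`, `CONTEXT_CHUNKS`, and the use of single floats in place of video chunks and audio features are all hypothetical stand-ins.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

# Hypothetical: how many past chunks the toy model conditions on.
CONTEXT_CHUNKS = 4


def predict_next_chunk(history: List[float], audio_feature: float) -> float:
    """Toy stand-in for a learned model.

    Blends the mean of recent outputs with the newest audio feature.
    A real system would run a neural network here.
    """
    prev = sum(history) / len(history) if history else 0.0
    return 0.5 * prev + 0.5 * audio_feature


def stream_chunks(audio_features: Iterable[float]) -> Iterator[float]:
    """Yield one output chunk per incoming audio feature, as soon as it is predicted."""
    history: Deque[float] = deque(maxlen=CONTEXT_CHUNKS)
    for feature in audio_features:
        chunk = predict_next_chunk(list(history), feature)
        history.append(chunk)  # each new chunk becomes context for the next one
        yield chunk            # emitted immediately, before later audio is consumed


if __name__ == "__main__":
    for chunk in stream_chunks([1.0, 1.0, 0.0]):
        print(chunk)  # prints 0.5, then 0.75, then 0.3125
```

Because `stream_chunks` is a generator, a caller can start displaying the first chunk before later audio has arrived, which is the property that makes conversational use possible.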

Limitations & Future Work

We’ve discovered that our current audio encoder performs best with longer windows, much longer than the new next-token prediction scheme likely needs. As a result, the typical delay between the end of a user’s turn and the persona’s response is roughly six seconds. Experiments with alternative audio encoders are showing promising results, and we expect to cut this delay by roughly 50%, to about three seconds, in the near future.

Pricing & Availability

Persona-1-Live is currently in early access, with pricing comparable to Persona-1: just a few cents per minute, roughly half the cost of the cheapest competing models.

We’re excited to see what novel applications today’s live-streaming upgrade to Persona-1 enables.
