Voice sessions are managed end-to-end. You start a session and your client connects — we handle everything else.
What We Manage
The session runs over WebRTC via LiveKit. Inside the session:
- Voice activity detection — we detect when the user starts and stops speaking
- Transcription — speech is converted to text using the configured STT provider
- Inference — the companion generates a response, with full memory and personality context applied
- Synthesis — the response is spoken back in the companion's voice
All of this happens within the session. Your client does not handle any of these steps — it connects to the room and the conversation begins.
Voice Character
Each companion has a configured voice. The voice is part of the companion's identity — consistent across sessions and across users.
We default to Cartesia for voice synthesis, which delivers expressive, natural-sounding speech. For latency-sensitive or high-volume deployments, Kotoro offers an ultra-low-latency alternative at significantly lower cost — frequently good enough for turn-based conversation at scale.
How to Start a Session
1. Create a session. Your backend creates a voice session for a companion and user via the API.
2. Get a token. Your backend requests a short-lived LiveKit access token for the user from the session token endpoint.
3. Connect the client. Your client connects to the LiveKit room using the standard LiveKit SDK and the token. No additional Spike SDK is required on the client.
4. Conversation begins. The companion listens. The user speaks. The companion responds in its own voice.
Related APIs
See the API Reference for session creation and token endpoints.

