Spirelight | Custom Speech Data for Voice Products

spirelight · session 01 / 03

Live session · Iberian Spanish

Scripted monologues, dialect-tagged

Recording

Maria · Madrid · 32 · Done
Javier · Sevilla · 41 · Recording
Ana · Bilbao · 27 · Queued

WAV · 48 kHz · stereo · Prompt set 02 / 12

spirelight · transcript 02 / 03

en-IE_002_dialogue_03.json · QA · 2 reviewers

00:00.42 S1 Could you walk me through the booking flow you used last Tuesday?
00:03.10 S2 Sure, I opened the app, tapped the search bar, then… [flagged]
00:06.94 S1 Got it. Any pauses or hesitations there?
00:09.38 S2 Yeah, [pause 1.2s] I had to scroll to find the right date.

Word-level timestamps · Speaker-aware · Diarized
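
As a rough illustration of how a line from the transcript preview above could be consumed, here is a minimal Python sketch. The line layout (`MM:SS.ss`, a speaker label such as `S1`, then the text) and the `[pause 1.2s]` marker are read off the preview only; the delivered file schema is not shown on this page, and `Turn` and `parse_turn` are hypothetical names introduced for the example.

```python
import re
from dataclasses import dataclass, field

# Assumed layout, taken from the preview: "MM:SS.ss <speaker> <text>"
LINE_RE = re.compile(r"^(\d+):(\d{2}\.\d{2})\s+(S\d+)\s+(.*)$")
# Assumed pause marker, taken from the preview: "[pause 1.2s]"
PAUSE_RE = re.compile(r"\[pause (\d+(?:\.\d+)?)s\]")


@dataclass
class Turn:
    start_s: float                 # turn start, in seconds from the beginning
    speaker: str                   # speaker label, e.g. "S1"
    text: str                      # utterance text, pause markers left in place
    pauses_s: list = field(default_factory=list)  # annotated pause lengths


def parse_turn(line: str) -> Turn:
    """Parse one preview-style transcript line into a Turn."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised transcript line: {line!r}")
    minutes, seconds, speaker, text = match.groups()
    start_s = int(minutes) * 60 + float(seconds)
    pauses_s = [float(p) for p in PAUSE_RE.findall(text)]
    return Turn(start_s, speaker, text, pauses_s)


if __name__ == "__main__":
    turn = parse_turn(
        "00:09.38 S2 Yeah, [pause 1.2s] I had to scroll to find the right date."
    )
    print(turn.speaker, turn.start_s, turn.pauses_s)
```

Keeping the pause lengths in a separate list leaves the original text untouched, which may be convenient if the markers are needed again for alignment checks.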

manifest.json 03 / 03

{
  "project": {
    "id": "sl-9241",
    "language": "es-ES",
    "hours": 3000
  },
  "audio": {
    "format": "wav",
    "sample_rate": 48000,
    "channels": 2
  },
  "transcripts": {
    "format": "jsonl",
    "timestamps": "word",
    "diarized": true
  },
  "delivery": {
    "channel": "s3-bucket",
    "checksums": "sha256",
    "batches": true
  }
}
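
A team receiving a manifest like the preview above might want a quick sanity check before ingesting a batch. The sketch below is only an assumption about how that could look: the section and field names are copied from the preview, while `REQUIRED`, `missing_fields`, and `sha256_hex` are hypothetical helpers, and nothing here describes Spirelight's actual delivery tooling.

```python
import hashlib
import json

# Sections and fields as they appear in the manifest preview (assumed complete).
REQUIRED = {
    "project": {"id", "language", "hours"},
    "audio": {"format", "sample_rate", "channels"},
    "transcripts": {"format", "timestamps", "diarized"},
    "delivery": {"channel", "checksums", "batches"},
}


def missing_fields(manifest: dict) -> list:
    """Return missing sections or 'section.field' names, empty if none."""
    missing = []
    for section, fields in REQUIRED.items():
        block = manifest.get(section)
        if not isinstance(block, dict):
            missing.append(section)
            continue
        missing.extend(
            f"{section}.{name}" for name in sorted(fields - set(block))
        )
    return missing


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, for comparison against a delivered checksum."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    manifest = json.loads("""
    {
      "project": {"id": "sl-9241", "language": "es-ES", "hours": 3000},
      "audio": {"format": "wav", "sample_rate": 48000, "channels": 2},
      "transcripts": {"format": "jsonl", "timestamps": "word", "diarized": true},
      "delivery": {"channel": "s3-bucket", "checksums": "sha256", "batches": true}
    }
    """)
    print(missing_fields(manifest))
```

How checksums are listed per file is not shown in the preview, so `sha256_hex` only covers the hashing step; matching digests to files would depend on the delivered layout.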

Tell us the speech data you need

Send over your speaker profiles, language needs, and background noise conditions. Our team will design a custom recording plan and deliver a complete project workflow within two business days.

Speech datasets tailored to your model requirements

Recruit the right speakers and collect recordings in one controlled flow.

Track collected hours, QA status, and delivery batches while the project runs.

Speech datasets tailored to your requirements

Scripted monologues, dialect-tagged

Speech collection

Transcription and annotation

Dataset delivery

Collect the speech data your model is missing

Recruit speakers by language, dialect, and profile.

Controlled recording setups when quality matters.

Review early batches and adjust the project as it runs.

Strong coverage in Nordic and harder-to-source European languages.

Speech data for common voice AI use cases

Wake words and voice commands.

Multilingual ASR and TTS expansion.

Voice agents and emotion-aware AI.

The team running your data collection project

Andreas Kromann

Emil Thorsson

Gustav Aggeboe

Joyi Ulfat

Mateo Thelen

Pekka Larjovuori

Victor Melchior

Yusif Aliyev

Michael J. Jørgensen

Understand speech data before you buy it

What is speech data?

How to buy AI training data

How much speech data do you need?

ASR training data

What is data annotation?

Multilingual speech data

TTS training data

Emotional speech data

What is audio annotation?
