Custom paired voice recording dataset built exactly to your project requirements, with or without video.

Product Overview

All data is collected through our own platform; no third-party software or platforms are used.

Our dialogue datasets are tailored for companies seeking to enhance their speech and language models with natural, human conversation. Each recording is a two-speaker dialogue, delivered as separate 48 kHz WAV and 1080p MKV files for each speaker. This per-speaker setup simplifies integration into automatic speech recognition (ASR) training, speaker separation, and conversational modeling.


Tech specs

- 16-bit (or 24-bit) 48 kHz PCM stream (WAV)
- Recording gear (microphones) with an effective bandwidth above 16 kHz, well within the 24 kHz Nyquist limit of a 48 kHz stream
- "Facecam" video of the recorder(s): 1080p at 8000 kbps, fully synchronized with the audio
- 100% customizable (e.g. if you need different video angles, or audio only without video, no problem)
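As a rough illustration of what the audio spec above means in practice, here is a minimal Python sketch that reads a delivered WAV header with the standard-library `wave` module and checks it against 48 kHz, 16/24-bit PCM. The function name `check_wav` and the one-speaker-per-file assumption are ours, not part of any delivery tooling. Note that `wave` only reads integer PCM, and WAV files with a WAVE_FORMAT_EXTENSIBLE header (common for 24-bit audio) are only readable with recent Python versions (3.12+).

```python
import wave

# Spec values taken from the "Tech specs" list.
EXPECTED_RATE_HZ = 48_000
ALLOWED_SAMPLE_WIDTHS = {2, 3}  # bytes per sample: 16-bit or 24-bit PCM


def check_wav(path: str) -> dict:
    """Read a WAV header and report whether it matches the 48 kHz, 16/24-bit spec.

    Hypothetical helper; assumes one speaker per file. The stdlib `wave` module
    reads integer PCM only (WAVE_FORMAT_EXTENSIBLE headers need Python 3.12+).
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()  # bytes per sample
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return {
        "rate_ok": rate == EXPECTED_RATE_HZ,
        "width_ok": width in ALLOWED_SAMPLE_WIDTHS,
        "channels": channels,
        "duration_s": frames / rate,
        # Uncompressed PCM data rate: rate x bytes/sample x channels, per hour.
        "bytes_per_hour": rate * width * channels * 3600,
    }
```

For storage planning, the same arithmetic gives 48,000 × 3 bytes = 144,000 bytes per second for 24-bit mono audio, about 518 MB per speaker per recorded hour (about 346 MB at 16-bit), before any FLAC compression.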

Delivery

For each conversation session, we provide:

- Audio file: 16/24-bit 48 kHz WAV per speaker
  - 100% customizable (e.g. if you prefer FLAC files over WAV, no problem)
  - On request, audio can be split into 5-10 second chunks

- Video file: 1080p (1920x1080) H.264 Matroska (MKV) at 8000 kbps per speaker
  - With or without the 48 kHz PCM audio muxed into the video file
  - 100% customizable (e.g. if you prefer MP4 (H.264) with AAC audio, no problem)
  - On request, video can be split into 5-10 second chunks that match the audio chunks
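If you want to reproduce chunking on full-length files yourself, the sketch below shows one simple way to cut a WAV file into fixed-length chunks with Python's standard-library `wave` module. The `split_wav` helper, its naming scheme, and fixed-interval cutting are illustrative assumptions, not a description of how delivered chunks are produced; a real pipeline would more likely cut at pauses so words are not split.

```python
import wave
from pathlib import Path


def split_wav(src: str, out_dir: str, chunk_seconds: float = 10.0) -> list[Path]:
    """Split a PCM WAV file into consecutive chunks of at most `chunk_seconds`.

    Hypothetical helper: cuts at fixed intervals, so the last chunk may be
    shorter and a cut can land mid-word.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(src, "rb") as wf:
        params = wf.getparams()
        frames_per_chunk = int(chunk_seconds * params.framerate)
        index = 0
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:  # end of the source file
                break
            path = out / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(path), "wb") as chunk:
                chunk.setparams(params)  # same rate, sample width, channels as source
                chunk.writeframes(data)  # wave corrects the header's frame count to the data written
            paths.append(path)
            index += 1
    return paths
```

For example, `split_wav("speaker_a.wav", "chunks", chunk_seconds=10.0)` turns a 25-second file into chunks of 10, 10, and 5 seconds. Assuming both speakers' files start at the same moment, cutting them at the same frame offsets keeps their chunks aligned.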

All delivery formats can be customized to fit your preferred storage structure and ingestion workflow.

Dataset Tailored By You

You set every requirement:

- Type of speech
  - Free spontaneous conversation
  - Free conversation on a given topic
  - Call center roleplay (e.g. customer and agent)
  - Work roleplay (e.g. a doctor's appointment or a financial meeting)
  - Any other type or topic you need; we can deliver it

- Metadata requirements
- File formats and data types
- Total dataset hours
- Per-pair restrictions (e.g. each recorder can record at most 15 hours)
- Any other rules you require


Built for Model Training

This dataset is suitable for a variety of applications, including:

- Automatic speech recognition
- Conversational AI
- Voice assistants
- Chatbots
- Speaker diarization
- Speech-to-text systems
- Text-to-speech evaluation
- Speech-to-speech systems
- Emotion and intent recognition
- Accent and dialect adaptation
- Domain-specific model fine-tuning
- Call center AI and agent simulation


Ethics and Compliance

Data for your custom project is recorded with informed consent from every contributor, guaranteed fair compensation, GDPR-aligned processing, and transparent data governance. We enforce a zero-tolerance policy toward scraped and synthetic speech: every delivered asset is authentic human speech. Your finished dataset therefore provides a legally compliant, high-fidelity foundation for voice agents, real-time conversation models, intent classification, and multilingual language understanding systems.


Why Spire Light

Spire Light specializes in custom speech data collection for AI model training. Our platform is designed to manage contributors, capture high-quality speech recordings, and run rigorous quality checks. Whether you need a small pilot project or a large-scale collection, we can support it efficiently.