Data Collection
Collecting data to train Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) systems in Large Language Models (LLMs) requires dedication and teamwork. Our team of professionals collects raw, unfiltered data for use in AI voice projects. Collection is currently under way with six different ethnic groups.
Participants
Participants are vetted for accent and diversity within each geographical location.
Corpus
Conversational in content and style.
Recording
48 kHz, 24-bit, mono, line-in direct recording.
Data
Standard minimum specification: 24-bit, 48 kHz, RIFF WAV.
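As an illustration only, a delivered file could be checked against the specification above with a short script. This is a minimal sketch using Python's standard `wave` module, not a description of our own tooling. The function name `check_wav_spec` and the constant names are hypothetical. It assumes "minimum" means a higher sample rate or bit depth is also acceptable, and it borrows the mono requirement from the Recording section.

```python
import wave

# Assumed thresholds, taken from the Recording and Data sections.
MIN_SAMPLE_RATE_HZ = 48_000   # 48 kHz
MIN_SAMPLE_WIDTH_BYTES = 3    # 24-bit samples are 3 bytes each
REQUIRED_CHANNELS = 1         # mono (assumption: applies to delivered data too)


def check_wav_spec(path):
    """Return a list of problems; an empty list means the file meets the spec."""
    try:
        # wave.open only parses RIFF WAV containers and raises otherwise.
        # Note: Python versions before 3.12 reject WAVE_FORMAT_EXTENSIBLE
        # headers, which some recorders write for 24-bit audio.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable RIFF WAV file: {exc}"]

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"bit depth {width * 8} is below {MIN_SAMPLE_WIDTH_BYTES * 8}")
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels found, expected {REQUIRED_CHANNELS}")
    return problems
```

Calling `check_wav_spec("session_001.wav")` (a hypothetical file name) returns an empty list when the file passes, or one message per failed check otherwise.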