Case Study

Speech Data Collection for a South Korean Language Services Provider (LSP)

One of our clients, a translation and localization company in South Korea, needed to deliver a large volume of speech data on a very short deadline, so they needed vendors with the capacity to complete the project.

The project required 200 sentences recorded by 38,000 native speakers in 5 languages, and 2 of those languages were due within 2 weeks. The client also had a budget constraint, so they needed a solutions provider that could deliver end-to-end services within it. This was one of their primary challenges, as very few speech data collection service providers have the tools and resources to deliver at that scale within that timeframe.

With Audio Bee, the LSP found a fully managed solution that provided them with the tools and workforce to collect the data they needed.

The Challenges

Lack of Speech Collection and Review Tools

Although many LSPs and speech data service providers can deliver a workforce solution to an extent, only a handful have the tools to collect speech data from qualified native participants, a robust process to review the recordings, a way to communicate the status of each recording back to participants, and a flexible method of exporting approved data. The client also lacked the tools to review the collected data once it was delivered and was using Google Sheets to provide feedback on each audio recording.

Special Recording Requirements

The client required the recordings to be made inside a car while it was being driven. This special recording environment was needed because the data was being collected to train speech recognition technologies for the automotive industry.

Scale, Budget & Time Constraints

The speech data project required a significant volume of data within a tight budget and schedule. With neither the tools nor the workforce, the client needed to find capable vendors that could handle the scale of the project.

The Solution

Provided voice recording and review tools

High-level quality checks for native speakers and car environment

Improved workflow by syncing our speech platform to their Google Drive

The Results

Audio Bee’s tools and workforce were well suited to meet the client’s requirements and provide them with an end-to-end solution for their speech data needs.

Quick Learning for Improved Work Process

Audio Bee’s trained workforce quickly grasped the special environment requirement and tightened quality control measures as the client requested to improve their work process. Our reviewers were able not only to identify true native speakers but also to confidently flag fake car environments, ensuring the right output.

Our speech data collection tool also helped bridge the communication gaps that would have arisen had we used email as the primary medium of communication. By avoiding long message threads, we saved significant time and effort.

The customized work process also helped minimize errors and deliver output that the client was happy with, well within the required timeframe.

High-quality Data Collection, Review & Output Generation

As part of a continuous feedback and optimization process, the client gave us feedback on a small set of recordings that we submitted. From that feedback, we learned what their quality thresholds were and what else to watch out for, which resulted in high-quality output at a large scale.

With Audio Bee, the client was able not only to generate the required speech data but also to easily sync our defined work process with their preferred cloud data storage. As a result, the client received a continuous flow of output digitally, removing any physical delays in the process.

Sign up to Audio Bee today!



Copyright © 2021 Audio Bee, Inc. All rights reserved.