MoveOn Speech and Noise Corpus – ELRA Catalogue

MoveOn Speech and Noise Corpus

Resource name in French: Corpus de parole et de bruit MoveOn

ISLRN: 791-273-247-357-8

ID: ELRA-S0370

The MoveOn Speech and Noise Corpus was recorded under the extreme conditions of the motorcycle environment within the MoveOn project. The speech utterances are in British English and target command-and-control and template-driven dialog systems, with a focus on, but not limited to, the police domain. The major part of the corpus comprises noisy speech and environmental noise recorded on a motorcycle. Several clean speech sessions, recorded in an office environment with the same setup (including the motorcycle helmet), complete the corpus. Corpus development focused on distortion-free recordings and accurate descriptions of both the recorded speech and the noise.

In addition to an orthographic transcription of the speech segments, background-noise annotations are available for both the speech segments and the pure-noise segments.

The corpus is a small speech corpus with up to 6 hours of clean and noisy speech utterances per channel and about 30 hours of segments containing environmental noise only (no speech). Recordings were made simultaneously on three microphone channels: two helmet close-talk microphones and one throat microphone.


Prices for ELRA members

Licence                                          Academic     Commercial
Non Commercial Use - ELRA END USER               1000.00 €    7000.00 €
Commercial Use - ELRA VAR                        7000.00 €    7000.00 €

Prices for non-members

Licence                                          Academic     Commercial
Non Commercial Use - ELRA END USER               1500.00 €    9000.00 €
Commercial Use - ELRA VAR                        9000.00 €    9000.00 €

Distribution

Availability start date: 11/07/2014

Contact Person: Valérie Mapelli

audio

Monolingual audio corpus

Languages

English

Linguality

Linguality type: Monolingual

Size

6 Hours

Classification

Audio genre: Other

Audio Formats

Recording

Recording device type details: two helmet close-talk microphones and one throat microphone

Source channel: Other

Resource Creation

Funding Project

MoveOn

Funding Type: Other

Metadata

Created: 05/12/2005

Metadata Language: French, English (fr, en)

Version

Version: 1.0

Last Updated: 07/11/2014

Usage

Actual Use - NLP Applications

Use specific to NLP: Speech Recognition
