  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)


    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.



    A growing number of LRs across the various fields of Human Language Technology (HLT; see image on the left-hand side) are distributed on behalf of ELRA by its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Our aim in providing Language Resources through this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0371 : PortMedia French and Italian corpus
    This corpus contains 700 transcribed
    dialogues from about 140 French speakers
    and 604 transcribed dialogues from about
    150 Italian speakers (several dialogues
    per speaker). The corpus was built using
    a ‘Wizard of Oz’ (WoZ) system, which
    simulates a natural-language man-machine
    dialogue. The scenario was set in the
    domain of tourist information and
    reservations. Manual transcriptions and
    semantic annotations of the corpus are
    provided with the corresponding wave
    files.

  • ELRA-B0015 : MEDIA Evaluation Package

  • ELRA-S0366 : LECTRA (LECture TRAnscriptions in European Portuguese)
    This corpus is composed of the audio and
    the manual transcriptions from seven
    one-semester university courses in
    Portuguese. It contains a total of 28
    hours of speech, manually transcribed by
    several trained annotators. All of the
    material consists of technical
    university lectures.

  • ELRA-S0367 : CORAL Corpus
    The CORAL Corpus is a collection of
    spoken dialogues in European Portuguese.
    It consists of 56 dialogues about a
    predetermined subject: maps. One of the
    participants (giver) has a map with some
    landmarks and a route drawn between
    them; the other (follower) also has
    landmarks but no route, and must
    therefore reconstruct it. The whole
    corpus was transcribed orthographically
    only; a pilot recording was annotated at
    several levels.

  • ELRA-S0370 : MoveOn Speech and Noise Corpus
    The MoveOn Speech and Noise Corpus is a
    corpus recorded under the extreme
    conditions of the motorcycle environment
    within the MoveOn project. The speech
    utterances are in British English and
    target command-and-control and
    template-driven dialog systems, with a
    focus on (but not limited to) the police
    domain. The
    major part of the corpus comprises noisy
    speech and environmental noise recorded
    on a motorcycle. Several clean speech
    recording sessions with the same
    recording setup (including the
    motorcycle helmet) in an office
    environment complete the corpus.

  • (last update: July 2014)

    Copyright © 2008 ELRA