  • Catalogue of Language Resources

    ELRA releases free Language Resources.

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to contributions from various players in the HLT community.

    Through this repository, we aim to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0374 : FoxPersonTracks: a Benchmark for Person Re-Identification from TV Broadcast Shows
    FoxPersonTracks is a person-track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French TV channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 persontracks (short video sequences featuring an individual with no background) from 266 persons. The dataset also provides re-identification results using space-time histograms as a baseline, together with an evaluation tool to ease comparison with other re-identification methods.

  • ELRA-S0381 : TRAD Pashto Broadcast News Speech Corpus
    This corpus contains 108 hours of transcribed broadcast news recordings, covering more than 1,000 speakers. Transcriptions are provided together with the audio files and include about 46,000 segments and 1.1M words.

  • ELRA-W0092 : TRAD Pashto Monolingual text Corpus
    This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different blogs and websites.

  • ELRA-W0093 : TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data
    This corpus consists of transcriptions of 106 hours of Pashto recordings from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381), translated into French. It contains about 832,000 source words and 747,000 target words.

  • ELRA-W0094 : TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data
    This parallel corpus contains 10,000 Pashto words translated into French. The source texts come from three broadcast news transcriptions of the TRAD Pashto Broadcast News Speech Corpus.

  • (last update: June 2016)

    Copyright © 2008 ELRA