  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various members of the HLT community.

    Our aim in providing Language Resources through this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-E0042 : CLEFeHealth 2013 Evaluation Package
    The CLEFeHealth 2013 Task 3 Evaluation
    Package contains data used for the
    User-centred health information
    retrieval Shared task at the CLEFeHealth
    Lab conducted in 2013. Task 3 aimed at
    evaluating information retrieval to
    address questions patients may have when
    reading clinical reports.

  • ELRA-W0076 : Nepali Monolingual written corpus
    The Nepali Monolingual written corpus
    comprises the core corpus (core sample)
    and the general corpus. The core sample
    (CS) represents the collection of Nepali
    written texts from 15 different genres
    with 2000 words each published between
    1990 and 1992. It is based on FLOB/FROWN
    corpora and contains 802,000 words. The
    general corpus (GC) consists of written
    texts collected opportunistically from a
    wide range of sources such as websites,
    newspapers, books, publishers and
    authors. It contains 1,400,000 words.

  • ELRA-W0077 : English-Nepali Parallel Corpus
    This corpus consists of a collection of
    national development texts in English
    and Nepali. A small set of data is
    aligned at the sentence level (27,060
    English words; 21,756 Nepali words), and
    a larger set of texts at the document
    level (617,340 English words; 596,571
    Nepali words). An additional set of
    monolingual data in Nepali is also
    provided (386,879 words in Nepali).

  • ELRA-S0365 : aGender
    aGender contains speech sample
    recordings over public telephone lines
    with read and (semi-)spontaneous speech.
    Native German speakers called a voice
    portal from their private phones,
    read text and answered some open
    questions. The corpus contains the
    voices of 945 German speakers (approx.
    minimum of 100 speakers per class), each
    delivering 18 speech items in up to six
    different sessions.

  • ELRA-W0074 : Amharic-English bilingual corpus
    The Amharic-English bilingual corpus
    contains parallel text from legal and
    news domains in Amharic script, in
    transliterated form and in English. The
    corpus contains 232,653 words in Amharic
    and 291,701 in English.

  • (last update: April 2014)

    Copyright © 2008 ELRA