  • Catalogue of Language Resources

    ELRA releases free Language Resources. (last update: January 24, 2013)

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.

    An increasing number of LRs in the various fields of Human Language Technology (HLT; see image on the left-hand side) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Our aim in providing Language Resources through this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-E0043 : CLEFeHealth 2014 Task 3 Evaluation Package
    The CLEFeHealth 2014 Task 3 Evaluation
    Package contains data used for the
    User-centred health information
    retrieval Shared task at the CLEFeHealth
    Lab conducted in 2014. Task 3 aimed at
    evaluating information retrieval to
    address questions patients may have when
    reading clinical reports.

  • ELRA-W0081 : Khresmoi manually annotated reference corpus
    This corpus is a collection of Khresmoi
    English web documents annotated with key
    entities (such as diseases and drugs).
    The corpus is divided into two parts:
    1. The initial corpus: 625 documents from
       the Genetics Home Reference data set,
       automatically annotated with anatomical
       locations and diseases, and manually
       corrected by 3-4 annotators. Size of
       documents: between 26 and 8,306 tokens
       each.
    2. The main corpus: 6,950 English
       documents from the Khresmoi crawl and
       5,518 English Wikipedia pages,
       automatically annotated through the
       GATE Platform for Anatomy, Disease,
       Drug and Investigation. Size of
       documents: between 200 and 2,000
       tokens each.
    The corpus uses the GATE XML format.

  • ELRA-T0375 : ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics
    This is a reference dataset for
    terminology extraction and
    classification research in computational
    linguistics. It is a set of manually
    annotated English terms extracted from
    the ACL Anthology Reference Corpus (ACL
    ARC). The dataset, called ACL RD-TEC,
    comprises more than 69,000 candidate
    terms, each manually annotated as a
    valid or invalid term. Valid terms are
    further classified as technology or
    non-technology terms.

  • ELRA-L0089 : Macedonian lexicon of toponyms (MACPLEX_TOPO)
    The MACPLEX_TOPO lexicon contains 1,398
    lemmas and 40,246 word forms (787
    places, 428 regions, 68 waters, 47
    peoples, 45 mountains, 27 lands). New
    words related to toponyms (their
    inhabitants and related adjectives) are
    derived. The lexicon is available in

  • ELRA-B0016 : Macedonian Morphological Lexicon (MACPLEX)
    Macedonian lexicon of toponyms (MACPLEX_TOPO)
    Macedonian lexicon of proper nouns (MACPLEX_PROPERS)
    Macedonian lexicon of derived adjectives (MACPLEX_ADJDERV)
    Macedonian lexicon of participles (MACPLEX_ADJPARTIC)
    Macedonian lexicon of compound words (MACPLEX_COMP)
    This lexicon contains 784 lemmas and
    6,289 word forms (576 nouns, 25
    adjectives, 73 adverbs, 66
    interjections, 17 numerals, 15 pronouns
    and 12 residuals). The lexicon is
    available in Unicode.

  • (last update: January 2015)

    Copyright © 2008 ELRA
    ELRACatalogue 0.8.0