  • Catalogue of Language Resources

    ELRA releases free Language Resources.

    The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.


    An increasing number of LRs covering the various fields of Human Language Technology (HLT) (see image on the left-hand side) are distributed on behalf of ELRA by its operational body, ELDA, thanks to contributions from various players in the HLT community.

    Through this repository, our aim is to provide Language Resources that spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.

    Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.

    If you have any suggestions or comments, or need further details about ELRA and its Catalogue of Language Resources, please refer to the Contact Us section.

    ELRA is a partner of OLAC (Open Language Archives Community). The catalogue can be viewed as an OLAC repository.

    New Resources
  • ELRA-S0392 : Pashto phonetic lexicon
    This is a phonetic lexicon of 21,560
    words in Pashto transcribed manually by
    a native Pashto speaker (Yusufzai
    dialect) using the IPA Pashto phoneme

  • ELRA-S0391 : The FAME! Speech Corpus
    This Frisian corpus consists of 203
    audio segments, each approximately 5
    minutes long, extracted from various
    radio programs covering a time span of
    almost 50 years (1966-2015), which adds a
    longitudinal dimension to the database.
    The content of the recordings is very
    diverse, including radio programs about
    culture, history, literature, sports,
    nature, agriculture, politics, society
    and languages. There are 309 identified
    speakers in the FAME! Speech Corpus, 21
    of whom appear at least 3 times in the
    database. The total duration of the
    manually annotated radio broadcasts sums
    up to 18 hours, 33 minutes and 57

  • ELRA-W0117 : Danish Propbank
    The Danish Propbank (DPB) is an
    87,000-token treebank from a variety of
    genres, annotated with morphosyntactic
    and semantic information, namely
    propositions/frames with VerbNet classes
    and semantic roles for both arguments
    and satellites. There are over 12,000
    frames with 32,000 role instances. The
    corpus has also been annotated with 20
    Named Entity classes and a 200-category
    semantic ontology for nouns.

  • ELRA-S0388 : GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)
    This extended version of the Bulgarian
    Pronunciation Dictionary, called
    Bulgarian-Dict260k, contains
    pronunciations of more than 260,000 word

  • ELRA-S0389 : Accented English GlobalPhone
    The Accented English part of the
    GlobalPhone resources contains 63
    recording sessions of Bulgarian,
    Chinese, German and Indian native
    speakers reading 37 English sentences
    each, recorded in GlobalPhone style,
    i.e. 16 kHz PCM-encoded audio recordings
    of utterance-segmented read speech from
    the newspaper domain.

  • (last update: June 2017)

    Copyright © 2008 ELRA
    ELRACatalogue 0.8.0