ELRA releases
(last update: January 24, 2013)
The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.
An increasing number of LRs in the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA via its operational body ELDA, thanks to contributions from various players in the HLT community.
Through this repository, we aim to help researchers and developers identify and access existing Language Resources, so that they do not spend effort rebuilding resources that already exist.
Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.
If you have any suggestions or comments, or need any further details about ELRA and its Catalogue of Language Resources, please refer to the contact us section.
|ELRA-E0043 : CLEFeHealth 2014 Task 3 Evaluation Package
The CLEFeHealth 2014 Task 3 Evaluation
Package contains data used for the
User-centred health information
retrieval Shared task at the CLEFeHealth
Lab conducted in 2014. Task 3 aimed at
evaluating information retrieval to
address questions patients may have when
reading clinical reports.
|ELRA-W0081 : Khresmoi manually annotated reference corpus
This corpus is a collection of Khresmoi
English web documents annotated with key
entities (such as disease, drug). The
corpus is divided into two parts:
1. The initial corpus: 625 documents from
the Genetics Home Reference data set,
automatically annotated with anatomical
locations and diseases, and manually
corrected by 3-4 annotators. Size of
documents: between 26 and 8,306 tokens.
2. The main corpus: 6,950
English documents from the Khresmoi
crawl and 5,518 English Wikipedia pages,
automatically annotated through the GATE
Platform for Anatomy, Disease, Drug and
Investigation. Size of documents:
between 200 and 2,000 tokens each.
The corpus uses the GATE XML format.
|ELRA-T0375 : ACL RD-TEC: A Reference Dataset for Terminology Extraction and Classification Research in Computational Linguistics
This is a reference dataset for
terminology extraction and
classification research in computational
linguistics. It is a set of manually
annotated English terms extracted
from the ACL Anthology
Reference Corpus (ACL ARC). This
dataset, called ACL RD-TEC, comprises
more than 69,000 candidate terms that
are manually annotated as valid or
invalid terms. Furthermore, valid terms
are classified as technology and
|ELRA-L0089 : Macedonian lexicon of toponyms (MACPLEX_TOPO)
The MACPLEX_TOPO lexicon contains 1,398
lemmas and 40,246 word forms (787
places, 428 regions, 68 waters, 47
peoples, 45 mountains, 27 lands). New
words related to toponyms (their
inhabitants and related adjectives) are
derived. The lexicon is available in
|ELRA-B0016 : Macedonian Morphological Lexicon (MACPLEX)
Macedonian lexicon of toponyms (MACPLEX_TOPO)
Macedonian lexicon of proper nouns (MACPLEX_PROPERS)
Macedonian lexicon of derived adjectives (MACPLEX_ADJDERV)
Macedonian lexicon of participles (MACPLEX_ADJPARTIC)
Macedonian lexicon of compound words (MACPLEX_COMP)
This lexicon contains 784 lemmas and
6,289 word forms (576 nouns, 25
adjectives, 73 adverbs, 66
interjections, 17 numerals, 15 pronouns
and 12 residuals). The lexicon is
available in Unicode.
|(last update: January 2015)