ELRA releases
The ELRA Catalogue of Language Resources offers a repository of Language Resources (LRs) made available through ELRA.
An increasing number of LRs in the various fields of Human Language Technology (HLT) are distributed on behalf of ELRA via its operational body, ELDA, thanks to contributions from various players in the HLT community.
Our aim in providing Language Resources through this repository is to spare researchers and developers the effort of rebuilding resources that already exist, and to help them identify and access those resources.
Other resources identified, but not available through ELRA, can be viewed in the Universal Catalogue.
If you have any suggestions or comments, or need any further details about ELRA and its Catalogue of Language Resources, please refer to the contact us section.
|ELRA-S0391 : The FAME! Speech Corpus
This Frisian corpus consists of 203
audio segments, each approximately 5
minutes long, extracted from various
radio programs covering a time span of
almost 50 years (1966-2015), adding a
longitudinal dimension to the database.
The content of the recordings is very
diverse, including radio programs about
culture, history, literature, sports,
nature, agriculture, politics, society
and languages. There are 309 identified
speakers in the FAME! Speech Corpus, 21
of whom appear at least 3 times in the
database. The total duration of the
manually annotated radio broadcasts sums
up to 18 hours, 33 minutes and 57
|ELRA-W0117 : Danish Propbank
The Danish Propbank (DPB) is an
87,000-token treebank from a variety of
genres, annotated with morphosyntactic
and semantic information, namely
propositions/frames with VerbNet classes
and semantic roles for both arguments
and satellites. There are over 12,000
frames with 32,000 role instances. The
corpus has also been annotated with 20
Named Entity classes and a 200-category
semantic ontology for nouns.
|ELRA-S0388 : GlobalPhone Bulgarian Pronunciation Dictionary 260k entries (extended version)
This extended version of the Bulgarian
Pronunciation Dictionary called
pronunciations of more than 260,000 word
|ELRA-S0389 : Accented English GlobalPhone
The Accented English part of the
GlobalPhone resources contains 63
recording sessions of Bulgarian,
Chinese, German, and Indian native
speakers reading 37 English sentences
each, produced in GlobalPhone-style,
i.e. 16 kHz PCM encoded audio recordings
of utterance-segmented read speech from
the newspaper domain.
|ELRA-S0390 : Parallel EMG-Acoustic English GlobalPhone
The parallel EMG-Acoustic English
GlobalPhone language resource contains
63 recording sessions from 8 speakers
articulating speech in three speaking
modes (audible, whispered, and silent) by
reading three times 50 English sentences
in GlobalPhone-style, i.e. 16 kHz PCM
encoded audio recordings of
utterance-segmented read speech from the
newspaper domain. Speech is recorded in
a parallel fashion, i.e. synchronously
by a standard close-talking microphone
and by surface electrodes capturing the
muscle activities of the articulatory
muscles in the face (ElectroMyoGraphy
|(last update: May 2017)