Search and Browse – ELRA Catalogue

2006 CoNLL Shared Task - Ten Languages text

Bulgarian
Danish
Dutch; Flemish
German
Japanese
Portuguese
Slovenian
Spanish; Castilian
Swedish
Turkish

ID: ELRA-W0086

2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish text

Basque
Catalan; Valencian
Czech
Turkish

ID: ELRA-W0121

ISLRN: 769-620-932-723-2

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Basque, Catalan, Czech and Turkish. The ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

2007 CoNLL Shared Task - Greek, Hungarian & Italian text

Hungarian
Italian
Modern Greek (1453-)

ID: ELRA-W0122

ISLRN: 270-733-242-642-3

2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Greek, Hungarian and Italian. The Conference ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

Annotated tweet corpus in Arabizi, French and English text

Arabic
English
French

ID: ELRA-W0323

ISLRN: 482-848-308-105-6

The annotated tweet corpus in Arabizi, French and English was built by ELDA on behalf of INSA Rouen Normandie (Normandie Université, LITIS team), in the framework of the SAPhIRS project (System for the Analysis of Information Propagation in Social Networks), funded by the DGE (Direction Générale ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	7000.00 €
Licence: Commercial Use - ELRA VAR	7000.00 €	7000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

ARCADE/ROMANSEVAL corpus text

English
French
Italian

ID: ELRA-W0018

ISLRN: 681-769-134-114-2

The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: ARCADE, an exercise on multilingual text alignment financed by AUPELF-UREF; and ROMANSEVAL, part of the SENSEVAL exercise sponsored by ACL-SIGLEX and EURALEX, on word sense disambiguation. The corpus ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

CAREGIVER Corpus audio

Dutch; Flemish
English
Finnish

ID: ELRA-S0410

ISLRN: 072-357-063-759-1

CAREGIVER, a multi-lingual speech corpus for modeling language acquisition, was designed and recorded within the framework of the EU-funded Acquisition of Communication and Recognition Skills (ACORNS) project. The motivation behind the corpus and its design relies on current knowle...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

Collins Multilingual database (MLD) – PhraseBank with audio files audio

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3360.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4480.00 €

Collins Multilingual database (MLD) – WordBank with audio files audio

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0382

ISLRN: 309-438-781-042-2

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3640.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5200.00 €

Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian text

English
French
Norwegian
Spanish; Castilian

ID: ELRA-S0414

ISLRN: 631-345-309-445-9

The Corpus of Interactions between Seniors and an Empathic Virtual Coach in Spanish, French and Norwegian was built within the EMPATHIC project (Empathic, Expressive, Advanced Virtual Coach to Improve Independent Healthy-Life-Years of the Elderly), funded within the European Union’s Horizon 2020 ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	25000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	25000.00 €

CRATER 2 Corpus text

English
French
Spanish; Castilian

ID: ELRA-W0033

ISLRN: 052-466-219-226-4

The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	25.00 €
Licence: Commercial Use - ELRA VAR	25.00 €	25.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	125.00 €
Licence: Commercial Use - ELRA VAR	125.00 €	125.00 €

CRATER corpus text

English
French
Spanish; Castilian

ID: ELRA-W0003

ISLRN: 645-721-607-031-5

The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	20.00 €
Licence: Commercial Use - ELRA VAR	20.00 €	20.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

ECI/MCI (European Corpus Initiative/Multilingual Corpus I) text

Albanian
Bulgarian
Chinese
Czech
Danish
Dutch; Flemish
English
Estonian
French
German
Italian
Japanese
Latin
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Portuguese
Russian
Scottish Gaelic; Gaelic
Serbian
Spanish; Castilian
Swedish
Turkish
Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

English-Chinese-Vietnamese Trilingual Parallel Corpus text

Chinese
English
Vietnamese

ID: ELRA-W0314

ISLRN: 637-630-726-817-9

The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	225.00 €	750.00 €
Licence: Commercial Use - ELRA VAR	1500.00 €	1500.00 €

European Parliament Interpretation Corpus (EPIC) audio

English
Italian
Spanish; Castilian

ID: ELRA-S0323

ISLRN: 716-168-855-843-2

The EPIC corpus is a parallel corpus of European Parliament speeches and their corresponding simultaneous interpretations. This corpus includes source speeches in Italian, English and Spanish and interpreted speeches in all possible combinations and directions (from English into Italian and Spani...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

GlobalPhone 2000 Speaker Package audio

Arabic
Bulgarian
Chinese
Croatian
Czech
French
German
Hausa
Japanese
Korean
Polish
Portuguese
Russian
Spanish; Castilian
Swahili (macrolanguage)
Swedish
Tamil
Thai
Turkish
Ukrainian
Vietnamese

ID: ELRA-S0400

ISLRN: 331-592-378-424-7

The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), C...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1400.00 €	7200.00 €
Licence: Commercial Use - ELRA VAR	7200.00 €	7200.00 €

GlobalPhone Multilingual Model Package audio

Arabic
Bulgarian
Chinese
Croatian
Czech
French
German
Hausa
Japanese
Korean
Polish
Portuguese
Russian
Spanish; Castilian
Swahili (macrolanguage)
Swedish
Tamil
Thai
Turkish
Ukrainian
Vietnamese

ID: ELRA-S0399

ISLRN: 204-945-263-927-6

The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Manda...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1400.00 €	7200.00 €
Licence: Commercial Use - ELRA VAR	7200.00 €	7200.00 €

MAURDOR Evaluation Package video

Arabic
English
French

ID: ELRA-E0045

ISLRN: 364-018-517-901-2

The MAURDOR project evaluates systems for the automatic processing of written documents. The collected documents are scanned documents (printed, typewritten or manuscripts). In order to get images for the evaluation of automatic analysis systems, 10,000 original documents were c...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	500.00 €	10000.00 €
Licence: Evaluation Use - ELRA EVALUATION		5000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	750.00 €	15000.00 €
Licence: Evaluation Use - ELRA EVALUATION		7500.00 €
Licence: Commercial Use - ELRA VAR	15000.00 €	15000.00 €

MIST Multi-lingual Interoperability in Speech Technology database audio

Dutch; Flemish
English
French
German

ID: ELRA-S0238

ISLRN: 189-835-264-931-4

In 1996, some 75 Dutch people participated in recording a multi-purpose continuous speech database. Most of them were recruited from the TNO Human Factors Research Institute, where the recordings were made. The main part of the database consisted of Dutch sentences. However, most speakers partici...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	500.00 €

MLCC Multilingual and Parallel Corpora text

Danish
Dutch; Flemish
English
French
German
Italian
Modern Greek (1453-)
Portuguese
Spanish; Castilian

ID: ELRA-W0023

ISLRN: 963-635-729-341-8

The MLCC text corpus has two main components: one set to allow comparable studies to be carried out in different languages, and one set to serve as the basis for translation studies. The first set is referred to as the Polylingual Document Collection, a collection of newspaper articles from financial new...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	1600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3600.00 €

MULTEXT JOC Corpus text

English
French
German
Italian
Spanish; Castilian

ID: ELRA-W0017

ISLRN: 900-482-746-635-0

This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 mill...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

24 Language Resources (Page 1 of 2)