16 Language Resources

Order by:

 2006 CoNLL Shared Task - Ten Languages    
  • Bulgarian
  • Danish
  • Dutch; Flemish
  • German
  • Japanese
  • Portuguese
  • Slovenian
  • Spanish; Castilian
  • Swedish
  • Turkish

ID: ELRA-W0086

ISLRN: 578-227-532-044-0

2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
 ARCADE II Evaluation Package    
  • Arabic
  • Chinese
  • English
  • French
  • German
  • Italian
  • Japanese
  • Modern Greek (1453-)
  • Persian
  • Russian
  • Spanish; Castilian

ID: ELRA-E0018

ISLRN: 875-865-064-331-9

The ARCADE II Evaluation Package was produced within the French national project ARCADE II (Evaluation of parallel text alignment systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The ARCADE II project enabled to carry out a cam...

MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
150.00 € submit
500.00 € submit
NON MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
300.00 € submit
1000.00 € submit
 CLEF AdHoc-News Test Suites (2004-2008) – Evaluation Package    
  • Bulgarian
  • Czech
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hungarian
  • Italian
  • Persian
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish

ID: ELRA-E0036

ISLRN: 378-279-085-589-0

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
150.00 € submit
500.00 € submit
NON MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
300.00 € submit
1000.00 € submit

Special offers are also available. Check here for details.

 CLEF Domain Specific Test Suites (2004-2008) – Evaluation Package    
  • English
  • German
  • Russian

ID: ELRA-E0037

ISLRN: 609-362-685-537-2

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
150.00 € submit
500.00 € submit
NON MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
300.00 € submit
1000.00 € submit

Special offers are also available. Check here for details.

 CLEF Question Answering Test Suites (2003-2008) – Evaluation Package    
  • Bulgarian
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Spanish; Castilian

ID: ELRA-E0038

ISLRN: 394-993-527-034-7

The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access (MLIA) by (i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and (ii) c...

MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
150.00 € submit
500.00 € submit
NON MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
300.00 € submit
1000.00 € submit

Special offers are also available. Check here for details.

 deL1L2IM corpus    
  • German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
0.00 € submit
0.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
Licence: Commercial Use - ELRA VAR
0.00 € submit
0.00 € submit
 ECI-ELSNET Italian & German tagged sub-corpus    
  • German
  • Italian

ID: ELRA-W0005

ISLRN: 869-857-775-378-7

The objective is to provide a small but fine grained morphosyntactically tagged corpus, 50.000 running words for each of the two languages (Italian and German) to be used in research work on tagging methods and models. The text for German comes from the Frankfurter Rundschau extracted from the EC...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
20.00 € submit
20.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
45.00 € submit
45.00 € submit
 ECI/MCI (European Corpus Initiative/Multilingual Corpus I)    
  • Albanian
  • Bulgarian
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • French
  • German
  • Italian
  • Japanese
  • Latin
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Portuguese
  • Russian
  • Scottish Gaelic; Gaelic
  • Serbian
  • Spanish; Castilian
  • Swedish
  • Turkish
  • Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
 GeFRePaC - German French Reciprocal Parallel Corpus    
  • French
  • German

ID: ELRA-W0031

ISLRN: 086-761-267-762-3

The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Research Abteilung Lexik, Institut für Deutsche Sprache (Germany) through a funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
1500.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
2500.00 € submit
 Karl May Korpus (KMK)    
  • German

ID: ELRA-W0016

ISLRN: 628-817-117-400-1

The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the works of the German author Karl May (1842-1912) and consists of around 1.6 million words (divided into 9 subcorpora of about 180,000 words each). The corpus was created between 199...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
400.00 € submit
2500.00 € submit
Licence: Commercial Use - ELRA VAR
2500.00 € submit
2500.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
800.00 € submit
3500.00 € submit
Licence: Commercial Use - ELRA VAR
3500.00 € submit
3500.00 € submit
 MLCC Multilingual and Parallel Corpora    
  • Danish
  • Dutch; Flemish
  • English
  • French
  • German
  • Italian
  • Modern Greek (1453-)
  • Portuguese
  • Spanish; Castilian

ID: ELRA-W0023

ISLRN: 963-635-729-341-8

The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different languages and one set as the basis for translation studies. The first set is referred as the Polylingual Document Collection, a collection of newspaper articles from financial newsp...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
1600.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
3600.00 € submit
 MTP Annotated German corpus - tagged version    
  • German

ID: ELRA-W0008-02

ISLRN: 173-651-658-528-0

This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP). It comprises a collection of SGML-formatted texts from two German newspapers, "Die Frankfurter Allgemeine Zeitung" and "Die Zeit", for the years 1990 to 1992. The articles ref...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
8000.00 € submit
8000.00 € submit
Licence: Commercial Use - ELRA VAR
8000.00 € submit
8000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
12000.00 € submit
12000.00 € submit
Licence: Commercial Use - ELRA VAR
12000.00 € submit
12000.00 € submit
 MTP Annotated German corpus - untagged version    
  • German

ID: ELRA-W0008-01

ISLRN: 417-827-623-669-9

This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP). It comprises a collection of SGML-formatted texts from two German newspapers, "Die Frankfurter Allgemeine Zeitung" and "Die Zeit", for the years 1990 to 1992. The articles ref...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2000.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3500.00 € submit
3500.00 € submit
Licence: Commercial Use - ELRA VAR
3500.00 € submit
3500.00 € submit
 MULTEXT JOC Corpus    
  • English
  • French
  • German
  • Italian
  • Spanish; Castilian

ID: ELRA-W0017

ISLRN: 900-482-746-635-0

This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 mill...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
2000.00 € submit
Licence: Commercial Use - ELRA VAR
2000.00 € submit
2000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
5000.00 € submit
Licence: Commercial Use - ELRA VAR
5000.00 € submit
5000.00 € submit
 The CLEF Test Suite for the CLEF 2000-2003 Campaigns – Evaluation Package    
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish

ID: ELRA-E0008

ISLRN: 317-005-302-361-6

The CLEF Test Suite contains the data used for the main tracks of the CLEF campaigns carried out from 2000 to 2003: Multilingual text retrieval, Bilingual text retrieval, Monolingual text retrieval, and Domain-specific text retrieval. The CLEF Test Suite is composed of: • The multilingual docum...

MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
150.00 € submit
500.00 € submit
NON MEMBERacademiccommercial
Licence: Evaluation Use - ELRA EVALUATION
300.00 € submit
1000.00 € submit

Special offers are also available. Check here for details.

 TSNLP (Test Suites for NLP Testing)    
  • English
  • French
  • German

ID: ELRA-W0013

ISLRN: 717-350-913-018-8

The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,000 test items (sentences or fragment of sentences) per language which have been constructed for evaluating natural language processing systems, but which may also be useful for ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
100.00 € submit
Licence: Commercial Use - ELRA VAR
100.00 € submit
100.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
100.00 € submit
Licence: Commercial Use - ELRA VAR
100.00 € submit
100.00 € submit