Search and Browse – ELRA Catalogue

2006 CoNLL Shared Task - Ten Languages text

Bulgarian
Danish
Dutch; Flemish
German
Japanese
Portuguese
Slovenian
Spanish; Castilian
Swedish
Turkish

ID: ELRA-W0086

2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

deL1L2IM corpus text

German

ID: ELRA-W0083

ISLRN: 339-799-085-669-8

The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, was collected within the framework of a PhD project on the development of a learning method involving conversations with an artificial companion. This PhD work is presented as a qualitative investigation...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

ECI-ELSNET Italian & German tagged sub-corpus text

German
Italian

ID: ELRA-W0005

ISLRN: 869-857-775-378-7

The objective is to provide a small but fine-grained morphosyntactically tagged corpus of 50,000 running words for each of the two languages (Italian and German), to be used in research work on tagging methods and models. The text for German comes from the Frankfurter Rundschau extracted from the EC...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	20.00 €	20.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	45.00 €	45.00 €

ECI/MCI (European Corpus Initiative/Multilingual Corpus I) text

Albanian
Bulgarian
Chinese
Czech
Danish
Dutch; Flemish
English
Estonian
French
German
Italian
Japanese
Latin
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Portuguese
Russian
Scottish Gaelic; Gaelic
Serbian
Spanish; Castilian
Swedish
Turkish
Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50.00 €	50.00 €

GeFRePaC - German French Reciprocal Parallel Corpus text

French
German

ID: ELRA-W0031

ISLRN: 086-761-267-762-3

The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Research Abteilung Lexik, Institut für Deutsche Sprache (Germany) through funding from ELRA in the framework of the European Commission project LRsP&P (Language Resources Production ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

Karl May Korpus (KMK) text

German

ID: ELRA-W0016

ISLRN: 628-817-117-400-1

The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the works of the German author Karl May (1842-1912) and consists of around 1.6 million words (divided into 9 subcorpora of about 180,000 words each). The corpus was created between 199...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	400.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	3500.00 €
Licence: Commercial Use - ELRA VAR	3500.00 €	3500.00 €

MLCC Multilingual and Parallel Corpora text

Danish
Dutch; Flemish
English
French
German
Italian
Modern Greek (1453-)
Portuguese
Spanish; Castilian

ID: ELRA-W0023

ISLRN: 963-635-729-341-8

The MLCC text corpus has two main components: one set to allow comparable studies to be carried out in different languages, and one set as the basis for translation studies. The first set is referred to as the Polylingual Document Collection, a collection of newspaper articles from financial new...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	1600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3600.00 €

MTP Annotated German corpus - tagged version text

German

ID: ELRA-W0008-02

ISLRN: 173-651-658-528-0

This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP). It comprises a collection of SGML-formatted texts from two German newspapers, "Die Frankfurter Allgemeine Zeitung" and "Die Zeit", for the years 1990 to 1992. The articles ref...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	8000.00 €	8000.00 €
Licence: Commercial Use - ELRA VAR	8000.00 €	8000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	12000.00 €	12000.00 €
Licence: Commercial Use - ELRA VAR	12000.00 €	12000.00 €

MTP Annotated German corpus - untagged version text

German

ID: ELRA-W0008-01

ISLRN: 417-827-623-669-9

This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP). It comprises a collection of SGML-formatted texts from two German newspapers, "Die Frankfurter Allgemeine Zeitung" and "Die Zeit", for the years 1990 to 1992. The articles ref...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3500.00 €	3500.00 €
Licence: Commercial Use - ELRA VAR	3500.00 €	3500.00 €

MULTEXT JOC Corpus text

English
French
German
Italian
Spanish; Castilian

ID: ELRA-W0017

ISLRN: 900-482-746-635-0

This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 mill...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

TSNLP (Test Suites for NLP Testing) text

English
French
German

ID: ELRA-W0013

ISLRN: 717-350-913-018-8

The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,000 test items (sentences or fragments of sentences) per language, which have been constructed for evaluating natural language processing systems, but which may also be useful for ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	100.00 €
Licence: Commercial Use - ELRA VAR	100.00 €	100.00 €


11 Language Resources