Resource Type:
Corpus: | |
Lexical/Conceptual: | |
Tool/Service: | |
Language Description: |
Media Type:
Text: | |
Audio: | |
Image: | |
Video: | |
Text Numerical: | |
Text N-Gram: |
114 Language Resources (Page 2 of 6)
« Previous | Next »Order by:
- Portuguese
ID: ELRA-W0055
ISLRN: 411-691-515-701-9The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 110,166 tokens taken from different sources and domains: news (8,861 sentences; 101,430 tokens), novels (399 sentences; 3,082 tokens). In addition, there are 779 sentences (5,654 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
3000.00 €
|
3000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
3000.00 €
|
3000.00 €
|
- English
ID: ELRA-W0337
ISLRN: 478-366-550-085-8The Corpus for fine-grained analysis and automatic detection of irony on Twitter was carefully annotated by trained annotators (Master’s students in Linguistics) using a detailed annotation scheme for irony categorization, which describes four labels: ‘ironic by means of a polarity contrast’, ‘si...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
200.00 €
|
Licence: Commercial Use - ELRA VAR |
200.00 €
|
200.00 €
|
- Spanish; Castilian
ID: ELRA-W0041
ISLRN: 837-873-214-287-0This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous author. The list of novels consists of: - La búsqueda: 113,639 words - Tristeza: 41,125 words - Cuarto menguante: 42,419 words - Recuerdos: 55,694 words - Sucedió en Abril: 46,040 w...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
|
800.00 €
|
Licence: Commercial Use - ELRA VAR |
800.00 €
|
800.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
1000.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
- English
- French
- Spanish; Castilian
ID: ELRA-W0033
ISLRN: 052-466-219-226-4The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of the Eurotra programme. The Corpus Resources and Terminology Extraction project (MLAP-93 20) extended the bilingual annotated English-French International Telecommunications Unio...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
25.00 €
|
Licence: Commercial Use - ELRA VAR |
25.00 €
|
25.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
125.00 €
|
Licence: Commercial Use - ELRA VAR |
125.00 €
|
125.00 €
|
- English
- French
- Spanish; Castilian
ID: ELRA-W0003
ISLRN: 645-721-607-031-5The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
20.00 €
|
Licence: Commercial Use - ELRA VAR |
20.00 €
|
20.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
- Danish
ID: ELRA-W0117
ISLRN: 213-212-351-142-5The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic information, in particular propositions/frames with VerbNet classes and semantic roles for both arguments and satellites. In addition, the corpus has been annotated with 20 Named E...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
800.00 €
|
7000.00 €
|
Licence: Commercial Use - ELRA VAR |
7000.00 €
|
7000.00 €
|
- German
ID: ELRA-W0083
ISLRN: 339-799-085-669-8The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the framework of a PhD project on the development of a learning method implying conversations with an artificial companion. This PhD work is presented as a qualitative investigation...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Dutch; Flemish
ID: ELRA-W0019
ISLRN: 440-290-917-102-7The Dutch PAROLE Distributable Corpus is a 3 million words selection from the 20 million words Dutch PAROLE Reference corpus. The Dutch corpus annotation and checking was made accordingly to the common core PAROLE tagset. The Dutch data were also checked for type. The Dutch PAROLE Distributable...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
270.00 €
|
800.00 €
|
Licence: Commercial Use - ELRA VAR |
1600.00 €
|
1600.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
1300.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
Special offers are also available. Check here for details.
- Chinese
- English
- Vietnamese
ID: ELRA-W0314
ISLRN: 637-630-726-817-9The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
500.00 €
|
Licence: Commercial Use - ELRA VAR |
1000.00 €
|
1000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
225.00 €
|
750.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
- English
- Persian
ID: ELRA-W0118
ISLRN: 074-825-114-781-7The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from the domains of art, law, culture, science, religion, literature, medicine, idioms, politics and others. It is an extension of the English-Persian parallel corpus already distribute...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1200.00 €
|
6000.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
- English
- Persian
ID: ELRA-W0051
ISLRN: 671-618-321-687-7Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and Persian (Farsi) words aligned at sentence level (about 100,000 sentences, distributed over 50,021 entries). The format of the files is Unicode. It has been originally created wi...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
500.00 €
|
2500.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
3000.00 €
|
3000.00 €
|
- English
- Panjabi; Punjabi
ID: ELRA-W0319
ISLRN: 695-759-706-170-8The English-Punjabi Code-Mixed Social Media Content corpus is composed is composed of 893,615 parallel sentences of English-Punjabi distributed over the following domains: - 82,341 parallel sentences of English-Punjabi code-mixed Agriculture Domain Data, - 59,158 parallel sentences of English-P...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- English
- Vietnamese
ID: ELRA-W0311
ISLRN: 893-470-491-825-6The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
900.00 €
|
1800.00 €
|
Licence: Commercial Use - ELRA VAR |
9000.00 €
|
9000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1500.00 €
|
3000.00 €
|
Licence: Commercial Use - ELRA VAR |
12000.00 €
|
12000.00 €
|
- English
- Vietnamese
ID: ELRA-W0124
ISLRN: 838-483-738-912-8This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) systems. The parallel corpus contains English documents translated by professional translators into Vietnamese. The source texts include books, dictionaries, newspapers, online ne...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
1200.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
8000.00 €
|
8000.00 €
|
- English
- Portuguese
ID: ELRA-W0090
ISLRN: 435-502-922-727-2The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), was extracted from the proceedings of the European Parliament. It contains transcriptions of sessions dating back from 1996 to 2011, with a total of approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Swahili (macrolanguage)
ID: ELRA-W0119
ISLRN: 941-187-059-145-7This is a text corpus of Swahili language of 25 million words, annotated for part-of-speech, morphology and syntax. The corpus contains prose text from fiction, news media and government documents domains, from the period between 1953 and 2016. This package contains: - the Helsinki Corpus of Swa...
MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
7500.00 €
|
7500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Commercial Use - ELRA VAR |
15000.00 €
|
15000.00 €
|
- Italian
ID: ELRA-W0044
ISLRN: 927-246-660-947-9ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
100.00 €
|
1500.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
2500.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
- German
ID: ELRA-W0016
ISLRN: 628-817-117-400-1The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the works of the German author Karl May (1842-1912) and consists of around 1.6 million words (divided into 9 subcorpora of about 180,000 words each). The corpus was created between 199...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
400.00 €
|
2500.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
800.00 €
|
3500.00 €
|
Licence: Commercial Use - ELRA VAR |
3500.00 €
|
3500.00 €
|
- English
ID: ELRA-W0081
ISLRN: 764-036-829-417-7The Manually Annotated Reference Corpus is a collection of English web documents annotated with key entities (such as disease, drug), built in the framework of the Khresmoi project, funded by the European Commission. It has been constructed by first annotating these entities with an imperfect aut...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Korean
- Vietnamese
ID: ELRA-W0313
ISLRN: 365-128-449-700-7The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sentence. The corpus is provided in XML format and is annotated according to TEI-encoding guidelines.
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
200.00 €
|
400.00 €
|
Licence: Commercial Use - ELRA VAR |
1400.00 €
|
1400.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
300.00 €
|
600.00 €
|
Licence: Commercial Use - ELRA VAR |
2100.00 €
|
2100.00 €
|