Resource Type:
Corpus: | |
Lexical/Conceptual: | |
Tool/Service: | |
Language Description: |
Media Type:
Text: | |
Audio: | |
Image: | |
Video: | |
Text Numerical: | |
Text N-Gram: |
13 Language Resources
Order by:
- English
- French
- Spanish; Castilian
ID: ELRA-W0003
ISLRN: 645-721-607-031-5The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
20.00 €
|
Licence: Commercial Use - ELRA VAR |
20.00 €
|
20.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
- English
- Panjabi; Punjabi
ID: ELRA-W0319
ISLRN: 695-759-706-170-8The English-Punjabi Code-Mixed Social Media Content corpus is composed is composed of 893,615 parallel sentences of English-Punjabi distributed over the following domains: - 82,341 parallel sentences of English-Punjabi code-mixed Agriculture Domain Data, - 59,158 parallel sentences of English-P...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- English
- Vietnamese
ID: ELRA-W0124
ISLRN: 838-483-738-912-8This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) systems. The parallel corpus contains English documents translated by professional translators into Vietnamese. The source texts include books, dictionaries, newspapers, online ne...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
600.00 €
|
1200.00 €
|
Licence: Commercial Use - ELRA VAR |
6000.00 €
|
6000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
1000.00 €
|
2000.00 €
|
Licence: Commercial Use - ELRA VAR |
8000.00 €
|
8000.00 €
|
- Italian
ID: ELRA-W0044
ISLRN: 927-246-660-947-9ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
100.00 €
|
1500.00 €
|
Licence: Commercial Use - ELRA VAR |
1500.00 €
|
1500.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
150.00 €
|
2500.00 €
|
Licence: Commercial Use - ELRA VAR |
2500.00 €
|
2500.00 €
|
- Berber languages
ID: ELRA-W0322
ISLRN: 176-700-464-150-5"La Dépêche de Kabylie" Corpus consists of about 1,570,000 words in Amazigh language collected from the Algerian newspaper entitled “La Dépêche de Kabylie”. It was collected thanks to HTTrack Website Copier and contains about 90% of all entries of the Amazigh language. All articles are gathered u...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
100.00 €
|
Licence: Commercial Use - ELRA VAR |
100.00 €
|
100.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
150.00 €
|
Licence: Commercial Use - ELRA VAR |
150.00 €
|
150.00 €
|
- Arabic
ID: ELRA-W0078
ISLRN: 398-979-151-557-0The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with named entities. Text format data were extracted from newspapers and deal with various topics. 3 different languages were annotated: Arabic, Chinese and Russian. For this project, 5...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
- Chinese
ID: ELRA-W0079
ISLRN: 187-154-782-686-9The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with named entities. Text format data were extracted from newspapers and deal with various topics. 3 different languages were annotated: Arabic, Chinese and Russian. For this project, 5...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
- Russian
ID: ELRA-W0080
ISLRN: 024-620-556-146-2The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with named entities. Text format data were extracted from newspapers and deal with various topics. 3 different languages were annotated: Arabic, Chinese and Russian. For this project, 5...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
5000.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
- Mongolian
ID: ELRA-W0120
ISLRN: 492-817-146-504-9This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. The collected raw texts was reduced from 5 to 4.8 million words after cleaning. The cleaned corpus comprises: - 144 texts from laws until 2009, - 288 texts from literature t...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
5000.00 €
|
Licence: Commercial Use - ELRA VAR |
5000.00 €
|
5000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
7000.00 €
|
Licence: Commercial Use - ELRA VAR |
7000.00 €
|
7000.00 €
|
- Portuguese
ID: ELRA-W0060
ISLRN: 294-303-577-819-2The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The corpus contains 1,000,441 tokens. The corpus is delivered in one file, in two different formats. The txt version has one sentence per line, an identification number for each ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Khasi
ID: ELRA-W0321
ISLRN: 926-738-235-188-8The Tham Khasi annotated corpus is a Khasi corpus, an Austro-Asiatic language, comprising of Khasi sentences extracted from textbooks prescribed for students in secondary, higher secondary, graduation, and post-graduation in the year 2015-2016. In the corpus, each word is separated by a space and...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
0.00 €
|
Licence: Commercial Use - ELRA VAR |
0.00 €
|
0.00 €
|
- Portuguese
ID: ELRA-W0050
ISLRN: 176-775-844-396-0CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portuguese. It is composed of one million annotated tokens, each one of which verified by human expert annotators. The annotation comprises information on part-of-speech, open class lemm...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
10000.00 €
|
Licence: Commercial Use - ELRA VAR |
10000.00 €
|
10000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
0.00 €
|
15000.00 €
|
Licence: Commercial Use - ELRA VAR |
15000.00 €
|
15000.00 €
|
- Italian
ID: ELRA-W0040
ISLRN: 008-645-281-539-8The VIT, Venice Italian Treebank is the effort of the collaboration of people working at the Laboratory of Computational Linguistics of the University of Venice in the years 1995-2005. It is partly the result of annotation carried out internally with no specific project in mind and no financial s...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
3000.00 €
|
7000.00 €
|
Licence: Commercial Use - ELRA VAR |
7000.00 €
|
7000.00 €
|
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER |
4000.00 €
|
10000.00 €
|
Licence: Commercial Use - ELRA VAR |
10000.00 €
|
10000.00 €
|