1010 Language Resources (Page 1 of 51)
- Arabic
- Czech
ID: ELRA-W0087
ISLRN: 798-485-294-792-1
2006 CoNLL Shared Task – Arabic & Czech consists of dependency treebanks used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The Conference on Computational Natural Language Learning (CoNLL) is accompanied every year by a shared task intended to promote natural lan...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms |
- Bulgarian
- Danish
- Dutch; Flemish
- German
- Japanese
- Portuguese
- Slovenian
- Spanish; Castilian
- Swedish
- Turkish
ID: ELRA-W0086
ISLRN: 578-227-532-044-0
2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
- Arabic
- English
ID: ELRA-W0123
ISLRN: 505-782-255-628-8
2007 CoNLL Shared Task - Arabic & English consists of dependency treebanks in two languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Arabic and English. The Conference on Computational Natur...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms |
- Basque
- Catalan; Valencian
- Czech
- Turkish
ID: ELRA-W0121
ISLRN: 769-620-932-723-2
2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Basque, Catalan, Czech and Turkish. The ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
- Hungarian
- Italian
- Modern Greek (1453-)
ID: ELRA-W0122
ISLRN: 270-733-242-642-3
2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Greek, Hungarian and Italian. The Conference ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 0.00 € |
- French
ID: ELRA-W0082
ISLRN: 024-713-187-947-8
A multidisciplinary team of linguists and computer scientists (Rachel Panckhurst, Catherine Détrie, Cédric Lopez, Claudine Moïse, Mathieu Roche, Bertrand Verine) (Praxiling, Lirmm, Lidilem, Tetis, Viseo) collected more than 88,000 authentic French text messages in Montpellier (2011), as part of th...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - Non Standard Licence Terms | 0.00 € | 0.00 € |
- English
- Ukrainian
ID: ELRA-M0104
ISLRN: 110-617-195-245-4
The bilingual English-Ukrainian lexicon of named entities uses Wikipedia metadata as a source. The extracted named entity pairs are classified into five classes: PERSON, ORGANIZATION, LOCATION, PRODUCT, and MISC (miscellaneous). The lexicon consists of 624,168 pairs and comes in two formats: csv ...
MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0 | 0.00 € | 0.00 € |
- English
ID: ELRA-S0389
ISLRN: 574-579-221-841-3
The Accented English part of the GlobalPhone resources contains 63 recording sessions of Bulgarian, Chinese, German, and Indian native speakers reading 37 English sentences each, produced in GlobalPhone-style, i.e. 16kHz PCM encoded audio recordings of utterance-segmented read speech from the new...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 600.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 700.00 € | 3600.00 € |
Licence: Commercial Use - ELRA VAR | 3600.00 € | 3600.00 € |
- English
ID: ELRA-S0001
ISLRN: 936-783-643-804-4
ACCOR is a unique acoustic and articulatory database recorded as part of the ESPRIT-ACCOR project investigating cross-language acoustic-articulatory correlations in coarticulatory processes. The European Languages covered are: Catalan, English, French, German, Irish Gaelic, Italian and Swedish. ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 25.00 € |
Licence: Commercial Use - ELRA VAR | 25.00 € | 25.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 75.00 € |
Licence: Commercial Use - ELRA VAR | 75.00 € | 75.00 € |
- English
ID: ELRA-T0375
ISLRN: 699-305-362-089-6
Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realizations of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in wh...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 1000.00 € |
Licence: Commercial Use - ELRA VAR | 1000.00 € | 1000.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 0.00 € | 3000.00 € |
Licence: Commercial Use - ELRA VAR | 3000.00 € | 3000.00 € |
- Polish
ID: ELRA-S0342
ISLRN: 305-222-372-690-4
This database consists of 1443 nonsense words including all the diphones for the Polish language. The diphone is always placed in an unstressed syllable. The neighbourhood doesn’t influence the co-articulation of the diphone. The database includes information such as: the name of the diphone, co...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 35.00 € | 35.00 € |
Licence: Commercial Use - ELRA VAR | 35.00 € | 35.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 100.00 € | 100.00 € |
Licence: Commercial Use - ELRA VAR | 100.00 € | 100.00 € |
- Polish
ID: ELRA-S0339
ISLRN: 981-910-282-065-4
This database contains parliamentary statements and newspaper reviews read by a semi-professional male speaker. It consists of a selection of 2150 sentences annotated and manually verified, including 100 rare phonemes in words. Prompts vary in length from 2.3 to 13.4 seconds, with an average leng...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 250.00 € | 1000.00 € |
Licence: Commercial Use - ELRA VAR | 1000.00 € | 1000.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 300.00 € | 2000.00 € |
Licence: Commercial Use - ELRA VAR | 2000.00 € | 2000.00 € |
- German
ID: ELRA-S0365
ISLRN: 038-476-412-610-4
aGender contains speech sample recordings over public telephone lines with read and (semi-)spontaneous speech. Native German speakers called a voice portal from their private phone, read text, and answered some open questions. The purpose of the corpus is the automatic detection of gender and/or...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 327.00 € | 8127.00 € |
Licence: Commercial Use - ELRA VAR | 8127.00 € | 8127.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 455.00 € | 8255.00 € |
Licence: Commercial Use - ELRA VAR | 8255.00 € | 8255.00 € |
- Spanish; Castilian
ID: ELRA-S0413
ISLRN: 425-664-403-057-4
Ahoslabi was built within the frame of the RESTORE project (“Restauración, almacenamiento y rehabilitación de la voz”) (restrictions apply) and has received funding from Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R), the Basque Governm...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 700.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 700.00 € |
- Spanish; Castilian
ID: ELRA-S0089
ISLRN: 443-392-902-600-9
This corpus consists of 3 sub-corpora of 16 kHz 16-bit signals, recorded by 304 Castilian speakers. The 3 sub-corpora are: - Phonetic corpus: 6,800 utterances of phonetically balanced sentences, including 1000 with phonetic segmentation. - Geographic corpus: 6,800 utterances of sentences ext...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 1000.00 € | 10000.00 € |
Licence: Commercial Use - ELRA VAR | 10000.00 € | 10000.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2000.00 € | 12000.00 € |
Licence: Commercial Use - ELRA VAR | 12000.00 € | 12000.00 € |
Special offers are also available. Check here for details.
- German
ID: ELRA-S0299
ISLRN: 780-368-852-139-3
ALC contains recordings of German speakers that are either intoxicated or sober. The type of speech ranges from read single digits to full conversation style. Recordings were done during a drinking test where speakers drank beer or wine to reach a self-chosen level of alcoholic intoxication. The ac...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 510.00 € | 510.00 € |
Licence: Commercial Use - ELRA VAR | 510.00 € | 510.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 1020.00 € | 1020.00 € |
Licence: Commercial Use - ELRA VAR | 1020.00 € | 1020.00 € |
- Arabic
ID: ELRA-W0030
ISLRN: 365-777-769-398-7
The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open University. The corpus contains Al-Hayat newspaper articles with value added for Language Engineering and Information Retrieval applications development purposes. The data have ...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 480.00 € | 960.00 € |
Licence: Commercial Use - ELRA VAR | 960.00 € | 960.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 720.00 € | 1440.00 € |
Licence: Commercial Use - ELRA VAR | 1440.00 € | 1440.00 € |
- French
ID: ELRA-S0486
ISLRN: 397-116-696-859-2
The ALLIES Corpus was produced within the European CHIST-Era project ALLIES. The ALLIES project made it possible to carry out a campaign for the evaluation of Broadcast News across time diarization systems using French data. This project is an extension of the previous ESTER, REPERE and ETAPE evaluation c...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 375.00 € | 6250.00 € |
Licence: Commercial Use - ELRA VAR | 25000.00 € | 25000.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 2500.00 € | 9375.00 € |
Licence: Commercial Use - ELRA VAR | 31250.00 € | 31250.00 € |
- French
ID: ELRA-W0029
ISLRN: 786-395-313-491-8
Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text corpora in order to provide a methodology for the evaluation of other similar tools. AMARYLLIS was organised by the Institut de l'Information Scientifique et Technique (INIST) wit...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 45.00 € | 100.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 45.00 € | 100.00 € |
- English
ID: ELRA-S0228-102
ISLRN: 992-319-311-431-0
This corpus comprises 12,974 entries uttered by 30 speakers (15 males and 15 females), recorded over 2 channels (headset and mobile in noisy restaurant/shopping mall/info center/hospital/station/car). Speech samples are stored as a sequence of 16-bit 48kHz for a total of 12 hours of speech per ch...
MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 3600.00 € | 3600.00 € |
Licence: Commercial Use - ELRA VAR | 3600.00 € | 3600.00 € |
NON MEMBER | academic | commercial |
---|---|---|
Licence: Non Commercial Use - ELRA END USER | 3600.00 € | 3600.00 € |
Licence: Commercial Use - ELRA VAR | 3600.00 € | 3600.00 € |