11 Language Resources

 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish    
  • Basque
  • Catalan; Valencian
  • Czech
  • Turkish

ID: ELRA-W0121

ISLRN: 769-620-932-723-2

2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as part of the CoNLL 2007 shared task on multi-lingual dependency parsing and domain adaptation. The languages covered in this release are: Basque, Catalan, Czech and Turkish. The ...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €
commercial: 0.00 €
 Collins Multilingual database (MLD) – PhraseBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
3360.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
4480.00 €
 Collins Multilingual database (MLD) – WordBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0382

ISLRN: 309-438-781-042-2

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
3640.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
5200.00 €
 Czech Audio-Visual Speech Corpus for Recognition with Impaired Conditions    
  • Czech

ID: ELRA-S0284

ISLRN: 747-828-662-077-7

This is an audio-visual speech database for training and testing of Czech audio-visual continuous speech recognition systems collected with impaired illumination conditions. The corpus consists of about 20 hours of audio-visual records of 50 speakers in laboratory conditions. Recorded subjects we...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 650.00 €
commercial: 650.00 €
Licence: Commercial Use - ELRA VAR
academic: 3050.00 €
commercial: 3050.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1250.00 €
commercial: 1250.00 €
Licence: Commercial Use - ELRA VAR
academic: 4550.00 €
commercial: 4550.00 €
 Czech SpeechDat(E) Database    
  • Czech

ID: ELRA-S0094

ISLRN: 891-889-899-078-7

The Czech SpeechDat(E) Database (Eastern European Speech Databases for Creation of Voice Driven Teleservices) comprises 1052 Czech speakers (526 males, 526 females) recorded over the Czech fixed telephone network. This database is partitioned into 6 CDs. The speech databases made within the Speec...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 10000.00 €
commercial: 16000.00 €
Licence: Commercial Use - ELRA VAR
academic: 16000.00 €
commercial: 16000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 10000.00 €
commercial: 16000.00 €
Licence: Commercial Use - ELRA VAR
academic: 16000.00 €
commercial: 16000.00 €

Special offers are also available. Check here for details.

 Czech Speecon database    
  • Czech

ID: ELRA-S0298

ISLRN: 897-416-018-798-6

The Czech Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 550 adult Czech speakers (275 males, 275 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises the record...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
Licence: Commercial Use - ELRA VAR
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
Licence: Commercial Use - ELRA VAR
 ECI/MCI (European Corpus Initiative/Multilingual Corpus I)    
  • Albanian
  • Bulgarian
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • French
  • German
  • Italian
  • Japanese
  • Latin
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Portuguese
  • Russian
  • Scottish Gaelic; Gaelic
  • Serbian
  • Spanish; Castilian
  • Swedish
  • Turkish
  • Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 50.00 €
commercial: 50.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 50.00 €
commercial: 50.00 €
 GlobalPhone 2000 Speaker Package    
  • Arabic
  • Bulgarian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hausa
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swahili (macrolanguage)
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-S0400

ISLRN: 331-592-378-424-7

The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), C...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1200.00 €
commercial: 6000.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €
commercial: 6000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1400.00 €
commercial: 7200.00 €
Licence: Commercial Use - ELRA VAR
academic: 7200.00 €
commercial: 7200.00 €
 GlobalPhone Czech    
  • Czech

ID: ELRA-S0196

ISLRN: 852-715-156-961-1

The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, mu...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 600.00 €
commercial: 3000.00 €
Licence: Commercial Use - ELRA VAR
academic: 3000.00 €
commercial: 3000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 700.00 €
commercial: 3600.00 €
Licence: Commercial Use - ELRA VAR
academic: 3600.00 €
commercial: 3600.00 €

Special offers are also available. Check here for details.

 GlobalPhone Multilingual Model Package    
  • Arabic
  • Bulgarian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hausa
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swahili (macrolanguage)
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-S0399

ISLRN: 204-945-263-927-6

The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Manda...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1200.00 €
commercial: 6000.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €
commercial: 6000.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1400.00 €
commercial: 7200.00 €
Licence: Commercial Use - ELRA VAR
academic: 7200.00 €
commercial: 7200.00 €
 Laboratory Conditions Czech Audio-Visual Speech Corpus    
  • Czech

ID: ELRA-S0283

ISLRN: 576-231-698-778-0

This is an audio-visual speech database for training and testing of Czech audio-visual continuous speech recognition systems. The corpus consists of about 25 hours of audio-visual records of 65 speakers in laboratory conditions. Data collection was done with static illumination, and recorded subj...

MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 550.00 €
commercial: 550.00 €
Licence: Commercial Use - ELRA VAR
academic: 2050.00 €
commercial: 2050.00 €
NON MEMBER (academic / commercial)
Licence: Non Commercial Use - ELRA END USER
academic: 1050.00 €
commercial: 1050.00 €
Licence: Commercial Use - ELRA VAR
academic: 3050.00 €
commercial: 3050.00 €