20 Language Resources

Order by:

 2006 CoNLL Shared Task - Ten Languages    
  • Bulgarian
  • Danish
  • Dutch; Flemish
  • German
  • Japanese
  • Portuguese
  • Slovenian
  • Spanish; Castilian
  • Swedish
  • Turkish

ID: ELRA-W0086

ISLRN: 578-227-532-044-0

2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006 shared task on multi-lingual dependency parsing. The languages covered in this release are: Bulgarian, Danish, Dutch, German, Japanese, Portuguese, Slovene, Spanish, Swedish and...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
0.00 € submit
0.00 € submit
 Collins Multilingual database (MLD) - PhraseBank    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-T0377

ISLRN: 452-383-219-228-0

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, distributed separately under reference ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank). The PhraseBank consists of 2,000 p...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1680.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2240.00 € submit
 Collins Multilingual database (MLD) – PhraseBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3360.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
4480.00 € submit
 Collins Multilingual database (MLD) - WordBank    
  • Arabic
  • Bengali
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Hindi
  • Italian
  • Japanese
  • Korean
  • Malayalam
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Romanian; Moldavian; Moldovan
  • Russian
  • Spanish; Castilian
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-T0376

ISLRN: 990-814-402-335-7

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank) and a multilingual set of sentences in 28 languages (the PhraseBank, distributed separately under reference ELRA-T0377). The WordBank contains 10,000 words...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
2400.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3600.00 € submit
 Collins Multilingual database (MLD) – WordBank with audio files    
  • Arabic
  • Chinese
  • Croatian
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Finnish
  • French
  • German
  • Italian
  • Japanese
  • Korean
  • Modern Greek (1453-)
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swedish
  • Thai
  • Turkish
  • Vietnamese

ID: ELRA-S0382

ISLRN: 309-438-781-042-2

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the corresponding audio files c...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
3640.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
5200.00 € submit
 ECI/MCI (European Corpus Initiative/Multilingual Corpus I)    
  • Albanian
  • Bulgarian
  • Chinese
  • Czech
  • Danish
  • Dutch; Flemish
  • English
  • Estonian
  • French
  • German
  • Italian
  • Japanese
  • Latin
  • Lithuanian
  • Malay (macrolanguage)
  • Modern Greek (1453-)
  • Norwegian
  • Portuguese
  • Russian
  • Scottish Gaelic; Gaelic
  • Serbian
  • Spanish; Castilian
  • Swedish
  • Turkish
  • Uzbek

ID: ELRA-W0004

ISLRN: 511-168-567-582-5

The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual corpus, and supports existing and projected national and international efforts to carefully design, collect and publish large-scale multilingual written and spoken corpora. ECI has ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50.00 € submit
50.00 € submit
 FASiL combined unimodal “fasil-all” corpus    
  • English
  • Portuguese
  • Swedish

ID: ELRA-S0174-04

ISLRN: 965-898-210-870-9

The corpus was collected in the context of the FASiL project, EU FP5 IST-2001-38685 (http://www.fasil.co.uk), as a wizard-of-oz experiment. Therefore, there are sound recordings of subject and wizard. A total of 210 subjects were recorded in the three project languages Swedish, Portuguese and Eng...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
10000.00 € submit
25000.00 € submit
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
20000.00 € submit
30000.00 € submit
Licence: Commercial Use - ELRA VAR
30000.00 € submit
30000.00 € submit
 FASiL multimodal “fasil-mm” corpus    
  • English
  • Portuguese
  • Swedish

ID: ELRA-S0174-05

ISLRN: 377-306-017-564-7

The corpus was collected in the context of the FASiL project, EU FP5 IST-2001-38685 (http://www.fasil.co.uk), as a wizard-of-oz experiment. Therefore, there are sound and interaction recordings of subject and wizard. A total of 90 subjects were recorded (30 per language: English, Portuguese and S...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
9000.00 € submit
30000.00 € submit
Licence: Commercial Use - ELRA VAR
30000.00 € submit
30000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
20000.00 € submit
30000.00 € submit
Licence: Commercial Use - ELRA VAR
30000.00 € submit
30000.00 € submit
 FASiL Swedish unimodal “fasil-sv” corpus    
  • Swedish

ID: ELRA-S0174-03

ISLRN: 082-283-470-243-3

The corpus was collected in the context of the FASiL project, EU FP5 IST-2001-38685 (http://www.fasil.co.uk), as a wizard-of-oz experiment. Therefore, there are sound recordings of subject and wizard. A total of 70 subjects were recorded. The corpus is formatted as .wav files (u-law) for audio,...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
4000.00 € submit
8000.00 € submit
Licence: Commercial Use - ELRA VAR
8000.00 € submit
8000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
8000.00 € submit
10000.00 € submit
Licence: Commercial Use - ELRA VAR
10000.00 € submit
10000.00 € submit
 Finnish-Swedish Speechdat(II) FDB-1000    
  • Swedish

ID: ELRA-S0080

ISLRN: 852-696-240-092-0

The Finnish-Swedish SpeechDat(II) FDB-1000 contains the recordings of 1,000 Finnish speakers (455 males, 545 females) recorded over the Finnish fixed telephone network. The FDB-1000 database is partitioned into 4 CDs, the first 3 CDs comprise 300 speakers sessions, the 4th comprises 100 speakers....

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
9000.00 € submit
18000.00 € submit
Licence: Commercial Use - ELRA VAR
18000.00 € submit
18000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
22000.00 € submit
25000.00 € submit
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
 GlobalPhone 2000 Speaker Package    
  • Arabic
  • Bulgarian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hausa
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swahili (macrolanguage)
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-S0400

ISLRN: 331-592-378-424-7

The GlobalPhone 2000 Speaker Package contains transcribed read speech spoken by 2000 native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Mandarin (ELRA-S0193), C...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1200.00 € submit
6000.00 € submit
Licence: Commercial Use - ELRA VAR
6000.00 € submit
6000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1400.00 € submit
7200.00 € submit
Licence: Commercial Use - ELRA VAR
7200.00 € submit
7200.00 € submit
 GlobalPhone Multilingual Model Package    
  • Arabic
  • Bulgarian
  • Chinese
  • Croatian
  • Czech
  • French
  • German
  • Hausa
  • Japanese
  • Korean
  • Polish
  • Portuguese
  • Russian
  • Spanish; Castilian
  • Swahili (macrolanguage)
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Vietnamese

ID: ELRA-S0399

ISLRN: 204-945-263-927-6

The GlobalPhone Multilingual Model Package contains about 22 hours of transcribed read speech spoken by native speakers in 22 languages. The data are sampled from the GlobalPhone Speech and Text Data available in the ELRA Catalogue, i.e.: Arabic (ELRA-S0192), Bulgarian (ELRA-S0319), Chinese-Manda...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1200.00 € submit
6000.00 € submit
Licence: Commercial Use - ELRA VAR
6000.00 € submit
6000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1400.00 € submit
7200.00 € submit
Licence: Commercial Use - ELRA VAR
7200.00 € submit
7200.00 € submit
 GlobalPhone Swedish    
  • Swedish

ID: ELRA-S0204

ISLRN: 250-851-338-892-4

The GlobalPhone corpus developed in collaboration with the Karlsruhe Institute of Technology (KIT) was designed to provide read speech data for the development and evaluation of large continuous speech recognition systems in the most widespread languages of the world, and to provide a uniform, mu...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
600.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
700.00 € submit
3600.00 € submit
Licence: Commercial Use - ELRA VAR
3600.00 € submit
3600.00 € submit

Special offers are also available. Check here for details.

 GlobalPhone Swedish Pronunciation Dictionary      
  • Swedish

ID: ELRA-S0356

ISLRN: 806-342-462-116-9

The GlobalPhone pronunciation dictionaries, created within the framework of the multilingual speech and language corpus GlobalPhone, were developed in collaboration with the Karlsruhe Institute of Technology (KIT). The GlobalPhone pronunciation dictionaries contain the pronunciations of all wo...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
600.00 € submit
3000.00 € submit
Licence: Commercial Use - ELRA VAR
3000.00 € submit
3000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
700.00 € submit
3600.00 € submit
Licence: Commercial Use - ELRA VAR
3600.00 € submit
3600.00 € submit

Special offers are also available. Check here for details.

 PRESS 65    
  • Swedish

ID: ELRA-W0010

ISLRN: 860-303-374-818-4

Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words taken from Swedish newspapers from the year 1965. It has been categorised according to text type and is annotated down to the sentence level.

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
12000.00 € submit
12000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
20000.00 € submit
20000.00 € submit
 Swedish EUROM1    
  • Swedish

ID: ELRA-S0276

ISLRN: 427-909-145-558-8

EUROM1 is the first really multilingual speech database produced in Europe. Equivalent corpora for each of the European languages were collected with the same number of speakers selected in the same way, and recorded in the same conditions with common file formats. Initially eight European countr...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
800.00 € submit
800.00 € submit
Licence: Commercial Use - ELRA VAR
800.00 € submit
800.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
1600.00 € submit
1600.00 € submit
Licence: Commercial Use - ELRA VAR
1600.00 € submit
1600.00 € submit
 Swedish SpeechDat(II) FDB-1000    
  • Swedish

ID: ELRA-S0070

ISLRN: 802-846-874-066-8

The Swedish SpeechDat(II) FDB-1000 contains the recordings of 1,000 Swedish speakers recorded over the Swedish fixed telephone network. This database is partitioned into 4 CDs, which comprise 250 speakers sessions each. It was validated by SPEX (the Netherlands) to assess its compliance with the ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
9000.00 € submit
18000.00 € submit
Licence: Commercial Use - ELRA VAR
18000.00 € submit
18000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
22000.00 € submit
25000.00 € submit
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
 Swedish SpeechDat(II) FDB-5000    
  • Swedish

ID: ELRA-S0069

ISLRN: 450-791-507-492-1

The Swedish SpeechDat(II) FDB-5000 database contains the recordings of 5,000 Swedish speakers (2470 males, 2530 females), recorded over the Swedish fixed telephone network. This database is partitioned into 25 CDs, which comprise 200 speakers sessions each. This speech database was validated by ...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
35000.00 € submit
50000.00 € submit
Licence: Commercial Use - ELRA VAR
50000.00 € submit
50000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
60000.00 € submit
70000.00 € submit
Licence: Commercial Use - ELRA VAR
70000.00 € submit
70000.00 € submit
 Swedish SpeechDat(II) MDB-1000    
  • Swedish

ID: ELRA-S0071

ISLRN: 085-122-595-310-4

The Swedish SpeechDat(II) MDB-1000 database contains the recordings of 1,000 Swedish speakers recorded over the Swedish mobile telephone network. This database is partitioned into 5 CDs, which comprise 200 speakers sessions each. This speech database was validated by SPEX (the Netherlands) to as...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
22000.00 € submit
25000.00 € submit
Licence: Commercial Use - ELRA VAR
25000.00 € submit
25000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
25000.00 € submit
35000.00 € submit
Licence: Commercial Use - ELRA VAR
35000.00 € submit
35000.00 € submit
 Swedish Speecon database    
  • Swedish

ID: ELRA-S0214

ISLRN: 219-653-018-216-3

The Swedish Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 550 adult Swedish speakers (270 males, 280 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises the reco...

MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
50000.00 € submit
67000.00 € submit
Licence: Commercial Use - ELRA VAR
67000.00 € submit
67000.00 € submit
NON MEMBERacademiccommercial
Licence: Non Commercial Use - ELRA END USER
60000.00 € submit
75000.00 € submit
Licence: Commercial Use - ELRA VAR
75000.00 € submit
75000.00 € submit