Search and Browse – ELRA Catalogue

Collins Multilingual database (MLD) - PhraseBank text

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-T0377

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, distributed separately under reference ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank). The PhraseBank consists of 2,000 p...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1680.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2240.00 €

Collins Multilingual database (MLD) - PhraseBank with audio files audio

Arabic
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish
Vietnamese

ID: ELRA-S0383

ISLRN: 398-655-047-044-5

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank, see ELRA-T0376) and a multilingual set of sentences in 28 languages (the PhraseBank, see ELRA-T0377). This version includes the audio files corresponding t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3360.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4480.00 €

Collins Multilingual database (MLD) - WordBank text

Arabic
Bengali
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Finnish
French
German
Hindi
Italian
Japanese
Korean
Malayalam
Modern Greek (1453-)
Norwegian
Polish
Portuguese
Romanian; Moldavian; Moldovan
Russian
Spanish; Castilian
Swedish
Tamil
Thai
Turkish
Ukrainian
Vietnamese

ID: ELRA-T0376

ISLRN: 990-814-402-335-7

The Collins Multilingual database covers Real Life Daily vocabulary. It is composed of a multilingual lexicon in 32 languages (the WordBank) and a multilingual set of sentences in 28 languages (the PhraseBank, distributed separately under reference ELRA-T0377). The WordBank contains 10,000 words...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2400.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3600.00 €

GLOBAL Multilingual Lexical Data - Bilingual - Level 1 text

Arabic
Chinese
Czech
Danish
Dutch; Flemish
English
French
German
Hebrew
Hindi
Italian
Japanese
Korean
Latin
Modern Greek (1453-)
Norwegian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish

ID: ELRA-M0111-04

ISLRN: 255-971-767-096-3

The GLOBAL Multilingual Lexical Data (references ELRA-M0111-01 to ELRA-M0111-06 in the ELRA Catalogue) consists of a network of lexicographic cores for major world languages, comprising diverse monolingual, bilingual and multilingual combinations, in different sizes, originally built for language...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	6800.00 €	6800.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	7140.00 €	7140.00 €

Special offers are also available. Check here for details.

GLOBAL Multilingual Lexical Data - Monolingual - Level 1 text

Arabic
Chinese
Czech
Danish
Dutch; Flemish
English
French
German
Hebrew
Hindi
Italian
Japanese
Korean
Latin
Modern Greek (1453-)
Norwegian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Thai
Turkish

ID: ELRA-M0111-01

ISLRN: 604-974-454-390-3

The GLOBAL Multilingual Lexical Data (references ELRA-M0111-01 to ELRA-M0111-06 in the ELRA Catalogue) consists of a network of lexicographic cores for major world languages, comprising diverse monolingual, bilingual and multilingual combinations, in different sizes, originally built for language...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	4250.00 €	4250.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	4462.50 €	4462.50 €

Special offers are also available. Check here for details.

Gram Vaani data set audio

Hindi

ID: ELRA-S0405

ISLRN: 045-205-425-611-4

The Gram Vaani data set consists of 130 hours (21,000 different audio recordings) recorded by 4,000 unique Hindi speakers from the states of Bihar, Jharkhand, and Madhya Pradesh in India (20-25% female, 60% people under 30 years of age, mostly rural). The data set was collected via a voice-bas...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	50000.00 €
Licence: Commercial Use - ELRA VAR	50000.00 €	50000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	50000.00 €
Licence: Commercial Use - ELRA VAR	50000.00 €	50000.00 €

Hindi Speech Data by Mobile Phone - 759 Hours audio

Hindi

ID: ELRA-S0452

ISLRN: 942-490-066-841-8

The data set comprises 759 hours of speech recorded by 1,425 native speakers from India, with authentic accents. The recording scripts were designed by language experts and cover general, interactive, car, home and other categories. The text is manually proofread to a high level of accuracy. Recording devices are...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	115368.00 €	115368.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	115368.00 €	115368.00 €

Special offers are also available. Check here for details.

Hindi Speech Data by Mobile Phone_R - 240 Hours audio

Hindi

ID: ELRA-S0463

ISLRN: 037-729-898-638-1

The data set comprises 240 hours of speech recorded by 401 Indian speakers. It was recorded in both quiet and noisy environments, which better matches real application scenarios. The recorded content is varied, covering economics, entertainment, news, spoken language, etc. All texts are manually transcribed, with...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	34200.00 €	34200.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	34200.00 €	34200.00 €

Special offers are also available. Check here for details.

Hindi Speech Recognition Corpus (Desktop) audio

Hindi

ID: ELRA-S0228-114

ISLRN: 198-341-627-529-5

This corpus was recorded in a quiet office environment over 4 channels and collected from a total of 196 speakers, including 95 males and 101 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news and da...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	18000.00 €	18000.00 €
Licence: Commercial Use - ELRA VAR	18000.00 €	18000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	18000.00 €	18000.00 €
Licence: Commercial Use - ELRA VAR	18000.00 €	18000.00 €

Hindi Speech Recognition Corpus (Mobile) audio

Hindi

ID: ELRA-S0228-125

ISLRN: 078-014-181-343-9

This corpus was recorded in both quiet and noisy environments over 3 channels and collected from a total of 180 speakers, including 99 males and 81 females, all of whom have been carefully screened to ensure their standard and clear pronunciation. The audio scripts cover information such as news....

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	16200.00 €	16200.00 €
Licence: Commercial Use - ELRA VAR	16200.00 €	16200.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	16200.00 €	16200.00 €
Licence: Commercial Use - ELRA VAR	16200.00 €	16200.00 €

MULTIGLOSS Multilingual Glossaries - L1-English pair text

Afrikaans
Arabic
Azerbaijani
Bulgarian
Catalan; Valencian
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latin
Latvian
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Romanian; Moldavian; Moldovan
Russian
Serbian
Slovak
Slovenian
Spanish; Castilian
Swedish
Thai
Turkish
Ukrainian
Urdu
Vietnamese
Western Frisian

ID: ELRA-M0112-01

ISLRN: 098-079-939-987-5

A series of innovative multilingual word-to-sense glossaries, based on a human-edited word-to-sense bilingual index of each language to English, which is linked automatically to the translation equivalents in 45 target languages. Each word and expression in every language is translated via its...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	2625.00 €	2625.00 €

Special offers are also available. Check here for details.

MULTIGLOSS Multilingual Glossaries - L1-English pair + 1 language text

Afrikaans
Arabic
Azerbaijani
Bulgarian
Catalan; Valencian
Chinese
Croatian
Czech
Danish
Dutch; Flemish
English
Estonian
Finnish
French
German
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Korean
Latin
Latvian
Lithuanian
Malay (macrolanguage)
Modern Greek (1453-)
Norwegian
Persian
Polish
Portuguese
Romanian; Moldavian; Moldovan
Russian
Serbian
Slovak
Slovenian
Spanish; Castilian
Swedish
Thai
Turkish
Ukrainian
Urdu
Vietnamese
Western Frisian

ID: ELRA-M0112-02

ISLRN: 610-290-284-705-6

A series of innovative multilingual word-to-sense glossaries, based on a human-edited word-to-sense bilingual index of each language to English, which is linked automatically to the translation equivalents in 45 target languages. Each word and expression in every language is translated via its...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	3750.00 €	3750.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	3937.50 €	3937.50 €

Special offers are also available. Check here for details.

Parallel Corpora for 6 Indian Languages text

Bengali
English
Hindi
Malayalam
Tamil
Telugu
Urdu

ID: ELRA-W0320

ISLRN: 657-350-757-058-6

The Parallel Corpora for 6 Indian Languages contains data sets for Bengali (540,000 words – 20,000 parallel sentences), Hindi (1,200,000 words – 37,000 parallel sentences), Malayalam (660,000 words – 29,000 parallel sentences), Tamil (747,000 words – 35,000 parallel sentences), Telugu (951,000 wo...

MEMBER	academic	commercial
Licence: Attribution, Share Alike - CC-BY-SA-3.0	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution, Share Alike - CC-BY-SA-3.0	0.00 €	0.00 €

The EMILLE/CIIL Corpus text

Assamese
Bengali
English
Gujarati
Hindi
Kannada
Kashmiri
Malayalam
Marathi
Oriya (macrolanguage)
Panjabi; Punjabi
Sinhala; Sinhalese
Tamil
Telugu
Urdu

ID: ELRA-W0037

ISLRN: 039-846-040-604-0

The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen monolingual corpora, including both written and (for some languages) spoken data for fourteen South Asian languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Kashmiri, Malayala...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €

The EMILLE Lancaster Corpus text

Bengali
English
Gujarati
Hindi
Panjabi; Punjabi
Sinhala; Sinhalese
Tamil
Urdu

ID: ELRA-W0038

ISLRN: 438-045-014-925-0

The EMILLE Lancaster Corpus consists of three components: monolingual, parallel and annotated corpora. There are monolingual corpora for seven South Asian languages: Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil, Urdu. The EMILLE monolingual corpora contain approximately 58,880,000 words (i...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR		7500.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR		12000.00 €

15 Language Resources