4 Language Resources

 Gram Vaani data set    
  • Hindi

ID: ELRA-S0405

ISLRN: 045-205-425-611-4

The Gram Vaani data set consists of 130 hours (21,000 different audio recordings) recorded by 4,000 unique Hindi speakers from the states of Bihar, Jharkhand, and Madhya Pradesh in India (20-25% female, 60% people under 30 years of age, mostly rural). The data set was collected via a voice-bas...

Member (academic / commercial)
  • Non Commercial Use - ELRA END USER licence: 0.00 € / 50000.00 €
  • Commercial Use - ELRA VAR licence: 50000.00 € / 50000.00 €
Non-member (academic / commercial)
  • Non Commercial Use - ELRA END USER licence: 0.00 € / 50000.00 €
  • Commercial Use - ELRA VAR licence: 50000.00 € / 50000.00 €

 Hindi Speech Data by Mobile Phone - 759 Hours    
  • Hindi

ID: ELRA-S0452

ISLRN: 942-490-066-841-8

The data set contains 759 hours of speech recorded by 1,425 native speakers from India. The accent is authentic. The recording text was designed by language experts and covers general, interactive, car, home and other categories. The text is manually proofread, and the accuracy is high. Recording devices are...

Member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 115368.00 € / 115368.00 €
Non-member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 115368.00 € / 115368.00 €

Special offers are also available. Check here for details.

 Hindi Speech Data by Mobile Phone_R - 240 Hours    
  • Hindi

ID: ELRA-S0463

ISLRN: 037-729-898-638-1

The data set contains 240 hours of speech recorded by 401 Indian speakers. It was recorded in both quiet and noisy environments, which better matches real application scenarios. The recording content is varied, covering economics, entertainment, news, spoken language, etc. All texts are manually transcribed, with...

Member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 34200.00 € / 34200.00 €
Non-member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 34200.00 € / 34200.00 €

Special offers are also available. Check here for details.

 The EMILLE Lancaster Corpus    
  • Bengali
  • English
  • Gujarati
  • Hindi
  • Panjabi; Punjabi
  • Sinhala; Sinhalese
  • Tamil
  • Urdu

ID: ELRA-W0038

ISLRN: 438-045-014-925-0

The EMILLE Lancaster Corpus consists of three components: monolingual, parallel and annotated corpora. There are monolingual corpora for seven South Asian languages: Bengali, Gujarati, Hindi, Punjabi, Sinhala, Tamil, Urdu. The EMILLE monolingual corpora contain approximately 58,880,000 words (i...

Member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 7500.00 €
Non-member (academic / commercial)
  • Commercial Use - ELRA VAR licence: 12000.00 €