1,492 language resources at your disposal
An increasing number of LRs in the various fields of Human Language Technology (see image on the left-hand side) are distributed on behalf of ELRA via its operational body ELDA, thanks to the contribution of various players of the HLT community.
Our aim is to provide Language Resources through this repository, so that researchers and developers do not have to invest effort in rebuilding resources that already exist, and to help them identify and access those resources.
Latest Resources
Learner Corpus of Portuguese L2 – COPLE2
The Learner Corpus of Portuguese as Second/Foreign Language (COPLE2) is a corpus of written and oral texts produced by students of Portuguese as Foreign/Second Language courses in the Instituto de Cultura e Língua Portuguesa (the Institute of Portuguese Language and Culture) (ICLP – FLUL) and by applicants for examinations in ...
German Political Speeches Corpus
This corpus consists of a collection of political speeches in German crawled from the online archive of the German presidency (Bundespräsident) and the Chancellery (Bundesregierung). For the German Presidency the speeches are available from July 1, 1984 to February 17, 2012 and the corpus contains a total of 1,442 texts ...
ATCO2 Project Data
The ATCO2 project aims to develop a unique platform for collecting, organizing and pre-processing air-traffic control (voice communication) data from air space. This project has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702. The JU receives support from the European Union’s Horizon 2020 ...
Russian Speech Data by Mobile Phone - 1,002 Hours
1,960 Russian native speakers with authentic accents participated in the recording. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, in-vehicle and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. Format: 16kHz, 16bit, ...
Russian Speaking English Speech Data by Mobile Phone - 230 Hours
This dataset is recorded by 498 native Russian speakers with a balanced gender ratio. It is rich in content and covers the generic command and control, human-machine interaction, smart home command and control, and in-car command and control categories. The transcription corpus has been manually proofread to ensure high accuracy. Format: 16kHz, 16bit, ...
Cantonese Dialect Speech Data by Mobile Phone - 1,652 Hours
It collects speech from 4,888 speakers from Guangdong Province, recorded in a quiet indoor environment. The recorded content covers 500,000 commonly used spoken sentences, including high-frequency words in weico and everyday expressions. The average number of repetitions is 1.5 and the average sentence length is 12.5 words. Recording devices are ...
German Speech Data by Mobile Phone_Reading - 211 Hours
The data set contains speech data from 327 German native speakers. The recording contents include economics, entertainment, news, oral, figure, letter, etc. Each sentence contains 10.3 words on average and is repeated 1.4 times on average. All texts are manually transcribed to ensure high accuracy. Format: 16kHz, 16bit, uncompressed wav, ...
Japanese Speech Data by Mobile Phone_R - 234 Hours
It collects speech from 799 Japanese locals, recorded in quiet indoor places, streets and restaurants. The recording includes 210,000 commonly used written and spoken Japanese sentences. The sentence error rate of the transcription is less than 5%. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed wav, mono channel ...
Brazilian Portuguese Speech Data by Mobile Phone - 1,044 Hours
The data volume is 1,044 hours, recorded by 2,038 Brazilian native speakers. The recording text is designed by linguistic experts and covers the general, interactive, in-car and home categories. The texts are manually proofread with high accuracy. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed wav, ...
American Children Speech Data by Microphone - 50 Hours
It is recorded by 219 American children who are native speakers. The recording texts are mainly storybooks, children's songs, spoken expressions, etc., with 350 sentences for each speaker. Each sentence contains 4.5 words on average and is repeated 2.1 times on average. The recording device is a hi-fi Blue Yeti microphone. The texts are ...
Singaporean Speaking English Speech Data by Mobile Phone - 201 Hours
This dataset is recorded by 452 native Singaporean speakers with a balanced gender ratio. It is rich in content and covers the generic command and control, human-machine interaction, smart home command and control, and in-car command and control categories. The transcription corpus has been manually proofread to ensure high accuracy. Format: 16kHz, 16bit, ...
Korean Speech Data by Mobile Phone_Reading - 197 Hours
It collects speech from 291 Korean locals, recorded in a quiet indoor environment. The recordings include economics, entertainment, news, oral, figure and letter content, with 400 sentences for each speaker. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed wav, mono channel. Recording environment: quiet indoor environment, without echo. Recording content: economy, entertainment, news, ...
Mandarin Heavy Accent Speech Data by Mobile Phone - 662 Hours
It collects speech from 2,034 local Chinese speakers from 26 provinces such as Henan, Shanxi, Sichuan, Hunan and Fujian. It is Mandarin speech data with heavy accents. The recording contents are finance and economics, entertainment, policy, news, TV, and movies. Format: 16kHz, 16bit, uncompressed wav, mono channel. Recording environment: 1,288 people complete the recording in relatively ...
Indonesian Speech Data by Mobile Phone - 639 Hours
1,285 Indonesian native speakers with authentic accents participated in the recording. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. The data ...
Chinese Digital Speech Data by Mobile Phone - 11,010 People
11,010 Chinese native speakers participated in the recording, with a balanced gender ratio. Each speaker reads 30 sentences, each a 4- to 8-digit number. Format: 16kHz, 16bit, uncompressed wav, mono channel. Recording environment: quiet indoor environment, without echo. Recording content (read speech): four- to eight-digit strings. Speaker: 11,010 Chinese, 58% of whom are female. Device: Android mobile ...
Mixed Speech with Chinese and English Data by Mobile Phone - 1,535 Hours
The data is recorded by 3,972 Chinese native speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurate in transcription. It can be used for improving the ...
Mandarin Strong Accent Speech Data by Mobile Phone - 1,025 Hours
More than 2,000 Chinese native speakers participated in the recording, with a balanced gender ratio. Speakers are mainly from southern China, and some are from the provinces of northern China, with strong accents. The recording content is rich, covering mobile phone voice assistant interaction, smart home command and control, ...
Wuhan Dialect Speech Data by Mobile Phone - 997 Hours
Mobile phone captured audio data of Wuhan dialect, 997 hours in total, recorded by more than 2,000 Wuhan dialect native speakers. The recorded text covers generic, interactive, on-board, home and other categories, with rich contents. Wuhan locals participate in quality check and proofreading. Sentence accuracy rate reaches 95%; this ...
Latin American Speaking English Speech Data by Mobile Phone - 117 Hours
281 Latin American speakers recorded authentic English in a relatively quiet environment. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. ...
Indonesian Speech Data by Mobile Phone_R - 359 Hours
Indonesian speech data (reading) is collected from 496 Indonesian native speakers and is recorded in a quiet environment. The recording is rich in content, covering multiple categories such as economics, entertainment, news, figure, letter, and oral, with around 400 sentences for each speaker. The valid data volume is 360 hours. All texts ...
Mandarin Mobile Telephony Conversational Speech Collection Data - 2,657 Hours
4,491 speakers participated in the recording and conducted face-to-face communication in a natural way. No topics are specified, so the conversations span a wide range of fields; the speech is natural and fluent, in line with the actual dialogue scene. Text is transcribed manually, with high accuracy. Format: 16kHz, 16bit, uncompressed wav, mono channel ...
Japanese Speech Data by Mobile Phone - 261 Hours
1,006 Japanese native speakers from the eastern, western, and Kyushu regions participated in the recording, with the eastern region accounting for the largest proportion. The recording content is rich and all texts have been manually transcribed with high accuracy. Format: 16kHz, 16bit, uncompressed wav, mono channel. Recording environment: quiet indoor environment, without ...
Japanese Speaking English Speech Data by Mobile Phone - 207 Hours
400 native Japanese speakers were involved, balanced for gender. The recording corpus is rich in content and covers a wide range of domains, such as the generic command and control, human-machine interaction, smart home, and in-car categories. The transcription corpus has been manually proofread to ensure high accuracy. Format: 16kHz, 16bit, uncompressed ...
Chinese Children Speech data by Mobile phone - 3,255 Hours
Mobile phone captured audio data of Chinese children, with a total duration of 3,255 hours. The 9,780 speakers are children aged 6 to 12, with accents covering seven dialect areas; the recorded text contains common children's language such as essay stories, numbers, and their interactions in cars, at home, and with voice ...
Shanghai Dialect Speech Data by Mobile Phone - 1,030 Hours
It collects speech from 2,956 speakers from Shanghai, recorded in a quiet indoor environment. The recorded content includes multi-domain customer consultation, short messages, numbers, Shanghai POI, etc. The corpus has no repetition and the average sentence length is 12.68 words. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed ...
Chinese Speaking English Speech Data by Mobile phone - 593 Hours
This dataset consists of 100,000 colloquial English sentences recorded by 3,691 Chinese speakers from many domestic dialect zones such as Jiangsu, Shandong, Beijing and Henan, and reflects the specific accent of Chinese speakers of English. The recording texts contain commonly used sentences with rich content, broad fields, and balanced phonemes. It can be used in ...
Hindi Speech Data by Mobile Phone - 759 Hours
The data is 759 hours long and was recorded by 1,425 Indian native speakers. The accent is authentic. The recording text is designed by language experts and covers general, interactive, car, home and other categories. The text is manually proofread, and the accuracy is high. Recording devices are mainstream Android ...
Sichuan Dialect Conversational Speech Data by Mobile Phone - 800 Hours
1,730 Sichuan native speakers participated in the recording, talking freely face to face in a natural way across a wide range of fields, with no topic specified. The speech is natural and fluent, and in line with the actual dialogue scene. The speech was transcribed into text manually to ensure high accuracy. Format: 16kHz, ...
Chinese Speaking English Speech Data by Mobile Phone - 502 Hours
1,279 Chinese speakers from major dialect regions participated in the recording. It is in line with the specific accent of Chinese English speakers. The recorded script covers many categories such as spoken English, speech, and human-computer interaction; it is rich in content, extensive in fields, and balanced in phonemes. It can be ...
French Speech Data by Mobile Phone_Reading - 231 Hours
The data volume is 231 hours, recorded by 406 speakers (from France, Canada, and Africa). The recording is made in a quiet environment and is rich in content. It contains various fields like economics, entertainment, news, and spoken language. All texts are manually transcribed. The sentence accuracy rate is 95%. Format: 16kHz, ...
Japanese Speech Data By Mobile Phone - 474 Hours
The data were recorded by 1,245 native Japanese speakers. The recorded content covers a wide range of categories such as general purpose, interactive, in-car commands, home commands, etc. The recorded text is designed by a language expert, and the text is manually proofread with high accuracy. It matches mainstream Android, ...
Malay Speech Data by Mobile Phone - 370 Hours
675 Malaysian native speakers with authentic accents participated in the recording. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. The data ...
Hindi Speech Data by Mobile Phone_R - 240 Hours
The data is 240 hours long and is recorded by 401 Indian speakers. It is recorded in both quiet and noisy environments, which makes it more suitable for actual application scenarios. The recording content is rich, covering economic, entertainment, news, spoken language, etc. All texts are manually transcribed, with high accuracy. It ...
Spanish Speaking English Speech Data by Mobile Phone - 388 Hours
891 Spanish native speakers with authentic accents participated in the recording. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. The data ...
Changsha Dialect Speech Data by Mobile Phone - 997 Hours
2,000 Changsha natives participated in the recording, covering multiple age groups, with a balanced gender distribution and authentic accents. The recorded text is rich in content, covering general, interactive, car, home and other categories. Local people in Changsha check and proofread the text. The sentence accuracy is 95%. It is mainly ...
Indian English Speech Data by Mobile Phone - 1,012 Hours
Indian English audio data captured by mobile phones, 1,012 hours in total, recorded by 2,100 Indian native speakers. The recorded text is designed by linguistic experts, covering generic, interactive, on-board, home and other categories. The text has been proofread manually with high accuracy; this data set can be used for ...
Italian Speech Data by Mobile Phone - 1,441 Hours
The data were recorded by 3,109 native Italian speakers with authentic Italian accents. The recorded content covers a wide range of categories such as general purpose, interactive, in-car commands, home commands, etc. The recorded text is designed by a language expert, and the text is manually proofread with high ...
Spanish Speech Data by Mobile Phone_R - 227 Hours
The data volume is 227 hours. It is recorded by Spanish native speakers from Spain, Mexico and Venezuela in a quiet environment. The recording contents cover various fields like economy, entertainment, news and spoken language. All texts are manually transcribed. The sentence accuracy is 95%. Format: 16kHz, 16bit, uncompressed ...
Uyghur Speech Data by Mobile Phone - 738 Hours
It collects speech from 2,058 people from the Uighur community, with a balanced ratio of men and women. The recording contents are 300,000 Uighur spoken sentences, recorded in a quiet indoor environment. All sentences were manually and accurately transcribed and annotated with noise signs. Format: 16kHz, 16bit, uncompressed wav, mono channel. Recording ...
French Speech Data by Mobile Phone - 769 Hours
The data volume is 769 hours, recorded by 1,623 French native speakers. The recording text is designed by linguistic experts and covers the general, interactive, in-car and home categories. The texts are manually proofread with high accuracy. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed wav, ...
British English Speech Data by Mobile Phone_Reading - 199 Hours
The data set contains speech data from 346 British English speakers, all of whom are English locals. There are around 392 sentences for each speaker. The valid data is 199 hours. The recording environment is quiet. Recording contents contain various categories like economics, news, entertainment, commonly used spoken language, letter, figure, etc. Format: 16kHz, 16bit, ...
Mandarin Speech Data by Mobile Phone - 2,028 Hours
4,787 Chinese native speakers participated in the recording, with a balanced gender ratio. Speakers are from various provinces of China. The recording content is rich, covering mobile phone voice assistant interaction, smart home command and control, in-car command and control, numbers, and other fields, which accurately matches the smart home, intelligent ...
Spanish Speech Data by Mobile Phone - 435 Hours
The data volume is 435 hours, recorded by 989 Spanish native speakers. The recording text is designed by linguistic experts and covers the general, interactive, in-car and home categories. The texts are manually proofread with high accuracy. Recording devices are mainstream Android phones and iPhones. Format: 16kHz, 16bit, uncompressed wav, ...
German Speech Data by Mobile Phone - 1,796 Hours
German audio data captured by mobile phone, consisting of 1,796 hours in total, recorded by 3,442 German native speakers. The recorded text is designed by linguistic experts, covering generic, interactive, on-board, home and other categories. The text has been proofread manually with high accuracy; this data can be used for ...
Mandarin Conversational Speech Data by Mobile Phone and Voice Recorder - 1,351 Hours
1,950 speakers participated in the recording, and conducted face-to-face communication in a natural way. They had free discussion on a number of given topics, with a wide range of fields. The voice was natural and fluent, in line with the actual dialogue scene. Text is transcribed manually, with high accuracy. ...
American English Speech Data by Mobile Phone - 800 Hours
1,842 American native speakers with authentic accents participated in the recording. The recorded script is designed by linguists, based on scenes, and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system ...
Spanish Speech Data by Mobile Phone - 338 Hours
The 338-hour Spanish speech data is recorded by 800 native Spanish speakers from Spain, Mexico and Argentina. The recording environment is quiet. All texts are manually transcribed. The sentence accuracy rate is 95%. It can be applied to speech recognition, machine translation, voiceprint recognition and so on. Format: 16kHz, 16bit, uncompressed wav, ...
Malay Speech Data by Mobile Phone_Reading - 134 Hours
Mobile Telephony Malay Speech Data_Reading is recorded by 156 native Malay speakers in a quiet environment. The recording is rich in content, covering multiple categories such as economy, entertainment, news, oral language, numbers, and letters, with around 450 sentences for each speaker. The effective time is 134 hours. All ...
Italian Speech Data by Mobile Phone_Reading - 215 Hours
Italian speech data (reading) is collected from 325 Italian native speakers and is recorded in a quiet environment. The recording is rich in content, covering multiple categories such as economics, entertainment, news, and oral. Each sentence contains 9.2 words on average and is repeated 2.7 times on average. All texts ...
Wake-up Words Speech Data by Microphone - 1,027 People
More than 1,000 speakers read the specified wake-up words at three speeds: slow, normal, and fast. The audio is recorded in a professional recording studio using a microphone. Format: 48kHz, 16bit, uncompressed wav, mono channel. Recording environment: professional recording studio. Recording content (read speech): wake-up words. Speaker: 1,027 Chinese, 52% of whom are female. Device: microphone. Language: Mandarin, ...
Chinese Children Speaking English Speech Data by Mobile Phone - 464 Hours
Audio data of children reading English, covering ages from preschool (3-5 years old) to post-school (6-12 years old), with children's speech features. Content accurately matches children's actual scenes of speaking English. It provides data support for children's smart home, automatic speech recognition and oral assessment in intelligent education scenes. Format: 16kHz, 16bit, ...
Kunming Dialect Speech Data by Mobile Phone - 1,002 Hours
2,284 native speakers of Kunming dialect participated in the recording, with authentic accents and from multiple age groups. The recorded script covers a wide range of topics such as generic, interactive, on-board, and home. Local people in Kunming participated in quality check and proofreading, and the text was transcribed accurately. ...
Non-Hispanic Spanish Speech Data by Mobile Phone - 762 Hours
1,630 native Spanish speakers of non-Spanish nationality, such as Mexicans and Colombians, participated in the recording with authentic accents. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, in-vehicle and home. The text is manually proofread with high accuracy. It matches mainstream ...
French Speaking English Speech Data by Mobile Phone - 520 Hours
1,089 French native speakers with authentic accents participated in the recording. The recorded script is designed by linguists and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple system phones. The data ...