Search and Browse – ELRA Catalogue

American Children Speech Data by Microphone - 50 Hours audio

English

ID: ELRA-S0468

The corpus was recorded by 219 American children who are native speakers. The recording texts are mainly storybooks, children's songs, spoken expressions, etc., with 350 sentences per speaker. Each sentence contains 4.5 words on average and is repeated 2.1 times on average. The recording device is hi-fi Bl...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	28785.00 €	28785.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	28785.00 €	28785.00 €

Special offers are also available. Check here for details.

American English Conversational Speech Recognition Corpus (Multi-Channel) audio

English

ID: ELRA-S0228-93

ISLRN: 576-996-121-023-5

This corpus was recorded by 20 speakers (10 males and 10 females) over 7 channels (multi-channel in quiet office/home). Speech samples are stored as 16-bit samples at 16 kHz, for a total of 10 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5600.00 €	5600.00 €
Licence: Commercial Use - ELRA VAR	5600.00 €	5600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5600.00 €	5600.00 €
Licence: Commercial Use - ELRA VAR	5600.00 €	5600.00 €

American English Speech Data by Mobile Phone - 800 Hours audio

English

ID: ELRA-S0437

ISLRN: 629-877-109-625-1

1,842 native speakers of American English, with authentic accents, participated in the recording. The recording script was designed by linguists, is based on scenes, and covers a wide range of topics, including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches with ...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	136800.00 €	136800.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	136800.00 €	136800.00 €

Special offers are also available. Check here for details.

American English Speech Data by Mobile Phone_Reading - 215 Hours audio

English

ID: ELRA-S0467

ISLRN: 921-365-371-849-5

The data set contains speech data from 349 American English speakers, all of whom are American locals. It was recorded in a quiet environment. The recorded content covers various categories such as economics, entertainment, news and spoken language. It is manually transcribed and annotated with the starti...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	34722.50 €	34722.50 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	34722.50 €	34722.50 €

Special offers are also available. Check here for details.

American English Speech Recognition Corpus (Desktop) audio

English

ID: ELRA-S0228-79

ISLRN: 254-019-000-249-3

This corpus comprises 49,990 entries uttered by 50 speakers (25 males and 25 females), recorded over 2 channels (desktop in quiet office). Speech samples are stored as 16-bit samples at 16 kHz, for a total of 24.9 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

American English Speech Recognition Corpus (Mobile) - 14.67 hours audio

English

ID: ELRA-S0228-73

ISLRN: 817-988-141-738-4

This corpus comprises 14,988 entries uttered by 50 speakers (23 males and 27 females), recorded over the mobile telephone network. Speech samples are stored as 16-bit samples at 16 kHz, for a total of 14.67 hours of speech.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

American English Speech Recognition Corpus (Mobile) - 19.4 hours audio

English

ID: ELRA-S0228-58

ISLRN: 968-856-860-742-9

This corpus comprises 39,243 entries uttered by 151 speakers (74 males and 77 females), recorded over the mobile telephone network. Speech samples are stored as 16-bit samples at 16 kHz, for a total of 19.4 hours of speech.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2700.00 €	2700.00 €
Licence: Commercial Use - ELRA VAR	2700.00 €	2700.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2700.00 €	2700.00 €
Licence: Commercial Use - ELRA VAR	2700.00 €	2700.00 €

American Spanish Recognition Corpus (Desktop+Mobile) audio

English

ID: ELRA-S0228-68

ISLRN: 100-009-143-020-4

This corpus comprises 33,527 entries uttered by 40 speakers (21 males and 19 females), recorded over 2 channels (desktop in quiet office and mobile in noisy restaurant). Speech samples are stored as 16-bit samples at 16 kHz, for a total of 14.7 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4800.00 €	4800.00 €
Licence: Commercial Use - ELRA VAR	4800.00 €	4800.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4800.00 €	4800.00 €
Licence: Commercial Use - ELRA VAR	4800.00 €	4800.00 €

Amharic-English bilingual corpus text

Amharic
English

ID: ELRA-W0074

ISLRN: 590-255-335-719-0

The Amharic-English bilingual corpus contains parallel text from the legal and news domains in Amharic script, in transliterated form and in English. The corpus contains 232,653 words in Amharic and 291,701 in English. This parallel corpus contains documents from two domains, namely legal...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	4000.00 €
Licence: Commercial Use - ELRA VAR	4000.00 €	4000.00 €

AnCora Catalan 2.0.0 text

Catalan; Valencian

ID: ELRA-W0327

ISLRN: 186-654-762-852-8

The AnCora Catalan Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...

MEMBER	academic	commercial
Licence: Attribution, Commercial Use - GPL	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution, Commercial Use - GPL	0.00 €	0.00 €

AnCora Spanish 2.0.0 text

Spanish; Castilian

ID: ELRA-W0326

ISLRN: 252-495-813-736-1

The AnCora Spanish Corpus 2.0.0 is a corpus of 500,000 words annotated at different levels: - Lemma and Part of Speech, - Syntactic constituents and functions, - Argument structure and thematic roles, - Semantic classes of the verb, - Denotative type of deverbal nouns, - Nouns related to W...

MEMBER	academic	commercial
Licence: Attribution, Commercial Use - GPL	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution, Commercial Use - GPL	0.00 €	0.00 €

ANITA (Audio eNhancement In Telecom Applications) audio

English
French
German
Spanish; Castilian

ID: ELRA-S0156

ISLRN: 537-894-870-719-4

ANITA (Audio eNhancement In secured Telecommunication Applications) is a European project launched on the initiative of EADS TELECOM with the objective of reducing acoustic noise in secured communications in adverse environments (sirens, alarms, engines, water pumps, stress situations, etc...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1000.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1500.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

An-Nahar Newspaper Text Corpus text

Arabic

ID: ELRA-W0027

ISLRN: 083-457-618-309-8

The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years), stored as HTML files on CD-ROM. Each year contains 45,000 articles and 24 million words. Each article includes information such as title, newspaper's name, date, country, type, page, ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2016.00 €	3192.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3024.00 €	4788.00 €

Special offers are also available. Check here for details.

Annotated tweet corpus in Arabizi, French and English text

Arabic
English
French

ID: ELRA-W0323

ISLRN: 482-848-308-105-6

The annotated tweet corpus in Arabizi, French and English was built by ELDA on behalf of INSA Rouen Normandie (Normandie Université, LITIS team), in the framework of the SAPhIRS project (System for the Analysis of Information Propagation in Social Networks), funded by the DGE (Direction Générale ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	7000.00 €
Licence: Commercial Use - ELRA VAR	7000.00 €	7000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

APASCI

Italian

ID: ELRA-S0039

ISLRN: 501-292-014-931-9

APASCI is an Italian speech database recorded in an insulated room with a Sennheiser MKH 416 T microphone. It includes 5,290 phonetically rich sentences and 10,800 isolated digits, for a total of 58,924 word occurrences (2,191 different words) and 641 minutes of speech. The speech material was re...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	800.00 €	20000.00 €
Licence: Commercial Use - ELRA VAR	20000.00 €	20000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1600.00 €	25000.00 €
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

Arabic dictionary of inflected words text

Arabic

ID: ELRA-L0098

ISLRN: 049-623-948-389-2

The Arabic dictionary of inflected words consists of a list of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3000.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	4500.00 €	15000.00 €
Licence: Commercial Use - ELRA VAR	15000.00 €	15000.00 €

Arabic dictionary of inflected words with recognition of agglutinated clitics and inflection system text

Arabic

ID: ELRA-L0099

ISLRN: 963-860-792-289-9

This dictionary consists of 6 million inflected forms, fully vowelized, generated in compliance with the grammatical rules of Arabic and tagged with grammatical information which includes POS and grammatical features, including number, gender, case, definiteness, tense, mood and compatibility wit...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	25000.00 €	25000.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	37000.00 €	37000.00 €

Arabic Morphological Dictionary text

Arabic

ID: ELRA-L0088

ISLRN: 472-591-121-577-5

The Arabic Morphological Dictionary contains 4,912,749 entries, including: - 3,374,852 nouns, - 1,537,699 verbs, - 198 grammatical words. The dictionary is stored on 1 CD. All files are provided as plain text in UTF-8 character encoding, which represents about 154 Mb of data. The dictionary form...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	250.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	450.00 €	12000.00 €
Licence: Commercial Use - ELRA VAR	12000.00 €	12000.00 €

Arabic Speech Corpus audio

Arabic

ID: ELRA-S0384

ISLRN: 866-568-447-697-8

This speech corpus has been developed as part of a PhD work carried out by Nawar Halabi at the University of Southampton. The corpus was recorded through a Neumann TLM 103 Studio Microphone by one male speaker in South Levantine Arabic (Damascian accent) in a professional studio. The transcript w...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR		9000.00 €
Licence: Attribution - CC-BY	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR		11200.00 €
Licence: Attribution - CC-BY	0.00 €	0.00 €


1649 Language Resources (Page 2 of 83)