Search and Browse – ELRA Catalogue

Italian Kids Speech Recognition Corpus (Desktop) audio

Italian

ID: ELRA-S0228-98

This corpus comprises 19,788 entries uttered by 31 speakers (15 males and 16 females), recorded over 2 channels (desktop in quiet office). Speech samples are stored as 16-bit, 44.1 kHz audio, for a total of 4.9 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

Italian Speech Corpus 1 (Appen) audio

Italian

ID: ELRA-S0147

ISLRN: 458-657-455-735-5

The Italian Speech Corpus 1 contains the recordings of 202 native Italian speakers (112 males, 90 females) recorded in an office and a closed public place, over 4 channels, in a range of low to medium background noise environments (Plantronics Audio 10 (computer/desk mic), Shure SM58 (desk mounte...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1200.00 €	9500.00 €
Licence: Commercial Use - ELRA VAR	9500.00 €	9500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	1500.00 €	15000.00 €
Licence: Commercial Use - ELRA VAR	15000.00 €	15000.00 €

Italian Speech Data by Mobile Phone - 1,441 Hours audio

Italian

ID: ELRA-S0450

ISLRN: 217-750-727-467-7

The data were recorded by 3,109 native Italian speakers with authentic Italian accents. The recorded content covers a wide range of categories such as general purpose, interactive, in-car commands, home commands, etc. The recorded text is designed by a language expert, and the text is manually pr...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	342237.50 €	342237.50 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	342237.50 €	342237.50 €

Special offers are also available. Check here for details.

Italian Speech Data by Mobile Phone_Reading - 215 Hours audio

Italian

ID: ELRA-S0472

ISLRN: 341-812-724-006-1

Italian speech data (reading) was collected from 325 native Italian speakers and recorded in a quiet environment. The recording is rich in content, covering multiple categories such as economics, entertainment, news, and oral. Each sentence contains 9.2 words on average. Each sentence is repeated...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	38807.50 €	38807.50 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	38807.50 €	38807.50 €

Special offers are also available. Check here for details.

Italian Speech Data Collected by Mobile Phone - 347 Hours audio

Italian

ID: ELRA-S0461

ISLRN: 382-599-484-763-7

Italian language audio data captured by mobile phone, with a total duration of 347 hours. It was recorded by 800 native Italian speakers, balanced in gender; the recording environment is quiet; all texts are manually transcribed with high accuracy. This data set can be applied on automat...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	62633.50 €	62633.50 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	62633.50 €	62633.50 €

Special offers are also available. Check here for details.

Italian SpeechDat-Car database audio

Italian

ID: ELRA-S0144

ISLRN: 513-325-829-468-0

The Italian SpeechDat-Car database contains the recordings of 300 Italian speakers (149 females, 151 males) recorded over the GSM telephone network, in a car. This database is partitioned into 14 DVDs. The speech data files are in two formats. Four of the 5 microphones were recorded on the comput...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	90000.00 €	90000.00 €
Licence: Commercial Use - ELRA VAR	90000.00 €	90000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	120000.00 €	120000.00 €
Licence: Commercial Use - ELRA VAR	120000.00 €	120000.00 €

Italian Speech Recognition Corpus (Desktop) audio

Italian

ID: ELRA-S0228-80

ISLRN: 789-295-563-911-2

This corpus comprises 49,994 entries uttered by 50 speakers (23 males and 27 females), recorded over 2 channels (desktop in quiet office). Speech samples are stored as 16-bit, 48 kHz audio, for a total of 24.21 hours of speech per channel.

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	6000.00 €	6000.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

Italian Speecon database audio

Italian

ID: ELRA-S0213

ISLRN: 239-555-046-548-2

The Italian Speecon database is divided into 2 sets: 1) The first set comprises the recordings of 550 adult Italian speakers (273 males, 277 females), recorded over 4 microphone channels in 4 recording environments (office, entertainment, car, public place). 2) The second set comprises the reco...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	50000.00 €	67000.00 €
Licence: Commercial Use - ELRA VAR	67000.00 €	67000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	60000.00 €	75000.00 €
Licence: Commercial Use - ELRA VAR	75000.00 €	75000.00 €

Italian Syntactic-Semantic Treebank (ISST) text

Italian

ID: ELRA-W0044

ISLRN: 927-246-660-947-9

ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over t...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	1500.00 €
Licence: Commercial Use - ELRA VAR	1500.00 €	1500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	2500.00 €
Licence: Commercial Use - ELRA VAR	2500.00 €	2500.00 €

Italian TTS Speech Corpus (Appen) audio

Italian

ID: ELRA-S0148

ISLRN: 976-246-706-503-6

The Italian TTS Speech Corpus contains the recordings of 1 native Italian speaker (male, 50 years old) recorded in a studio over 1 channel (Shure SM15 unidirectional professional head-worn condenser microphone). The data collection and transcription were performed by Appen (Australia). Speech sam...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	2000.00 €	9000.00 €
Licence: Commercial Use - ELRA VAR	9000.00 €	9000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	3500.00 €	11000.00 €
Licence: Commercial Use - ELRA VAR	11000.00 €	11000.00 €

Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed) text

Bulgarian
Dutch; Flemish
English
French
German
Italian
Latvian
Modern Greek (1453-)
Polish
Romanian; Moldavian; Moldovan

ID: ELRA-W0301

ISLRN: 175-028-844-014-3

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Letter of rights for persons arrested on the basis of a ...

MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Attribution - CC-BY-4.0	0.00 €	0.00 €

MLCC Multilingual and Parallel Corpora text

Danish
Dutch; Flemish
English
French
German
Italian
Modern Greek (1453-)
Portuguese
Spanish; Castilian

ID: ELRA-W0023

ISLRN: 963-635-729-341-8

The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different languages and one set as the basis for translation studies. The first set is referred to as the Polylingual Document Collection, a collection of newspaper articles from financial new...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	1600.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	3600.00 €

MULTEXT JOC Corpus text

English
French
German
Italian
Spanish; Castilian

ID: ELRA-W0017

ISLRN: 900-482-746-635-0

This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains approx. 5 mill...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

MULTEXT Prosodic database audio

English
French
German
Italian
Spanish; Castilian

ID: ELRA-S0060

ISLRN: 098-719-242-965-4

This database comprises one CD-ROM for each of five languages (French, English, Italian, German and Spanish), totalling 4 hours and 20 minutes of speech and involving 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based consist of passages of about f...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	45.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

PANACEA Environment Italian monolingual corpus text

Italian

ID: ELRA-W0069

ISLRN: 843-358-936-298-5

The PANACEA Environment Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies), under the European Commission's Seventh Framework Programme...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

PANACEA Labour Italian monolingual corpus text

Italian

ID: ELRA-W0070

ISLRN: 393-864-255-110-7

The PANACEA Labour Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies), under the European Commission's Seventh Framework Programme. ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €

Parallel Corpora & Domains (bilingual and multilingual) text

Arabic
Chinese
Danish
Dutch; Flemish
English
Finnish
French
German
Hebrew
Italian
Japanese
Korean
Modern Greek (1453-)
Northern Sami
Norwegian
Polish
Portuguese
Russian
Spanish; Castilian
Swedish
Turkish

ID: ELRA-W0336

ISLRN: 471-919-856-164-1

Parallel corpora for nearly 400 language pairs and numerous multilingual combinations, including 10 million bilingual segments and 90 million tokens in 20 languages: Arabic, Chinese (Simplified), Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Italian, Japanese, Korean, North Sami...

MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	0.10 €	0.10 €

NON MEMBER	academic	commercial
Licence: Commercial Use - ELRA VAR	0.11 €	0.11 €

Special offers are also available. Check here for details.

Parallel texts from Swedish Social Security Authority (Processed) text

Croatian
English
Finnish
French
German
Italian
Polish
Romanian; Moldavian; Moldovan
Spanish; Castilian
Swedish

ID: ELRA-W0303

ISLRN: 002-471-002-734-6

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts, email templates and forms in pdf file fo...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

Parallel texts from Swedish Work environment Authority (Processed) text

Bulgarian
Czech
English
Estonian
Finnish
French
German
Hungarian
Italian
Latvian
Lithuanian
Modern Greek (1453-)
Polish
Romanian; Moldavian; Moldovan
Spanish; Castilian
Swedish

ID: ELRA-W0304

ISLRN: 448-438-055-941-1

This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Europe Facility - Automated Translation (CEF.AT) action. For further information on the project: http://lr-coordination.eu. Parallel texts from the Swedish Work Environment authori...

MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Other - Public Domain	0.00 €	0.00 €

PAROLE Italian Corpus text

Italian

ID: ELRA-W0043

ISLRN: 608-362-291-385-1

The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: • newspapers: 2,179,800 words from La Stampa, La Repubblica, Il Corriere della Sera, L’Unione Sarda, Il Sole 24ore, between 1992 and 1996, • periodicals: 143,810 words from Casaviva, 100cose, Epoca, Espan...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	100.00 €	100.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	150.00 €


46 Language Resources (Page 2 of 3)