1,575 language resources at your disposal
A growing number of language resources (LRs) across the various fields of Human Language Technology (HLT) (see image on the left-hand side) are distributed on behalf of ELRA by its operational body, ELDA, thanks to contributions from players across the HLT community.
Through this repository, we aim to help researchers and developers identify and access existing Language Resources, so that they do not spend effort rebuilding resources that already exist.
Latest Resources
Corpus of Spontaneous Japanese (CSJ)
The "Corpus of Spontaneous Japanese" (or CSJ) is a database containing a large collection of Japanese spoken language data and information for use in linguistic research; jointly developed by NINJAL, NICT and the Tokyo Institute of Technology, the CSJ is world-class in both the quantity and quality of the available ...
Bitext Synonym Data - General Language
The Bitext Synonym Data - General Language includes 31,723 entries and more than 100,000 synonyms for the English language. This dataset is a set of synonyms developed to augment the English version of WordNet, a powerful open-source lexical database, released in 2005. All synonyms can be linked to Bitext Lexical Data ...
Bitext Synthetic Data - Automotive (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Manufacturing (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Real estate and construction (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Restaurant/ bar chains (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Wealth management (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Telecommunication (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Education (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Insurance (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Manufacturing (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Retail banking (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Travel (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Hospitality (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Field Service (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Legal (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Restaurant/ bar chains (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Legal (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Retail Ecomm (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Education (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Moving and storage (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Real estate and construction (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Retail banking (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Utilities (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Event and ticketing (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Utilities (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Event and ticketing (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Media Streaming (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Healthcare (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Automotive (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Healthcare (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Travel (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Wealth management (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Insurance (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Media Streaming (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Mortgage and loans (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Moving and storage (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Retail Ecomm (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Field Service (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Hospitality (English language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for English language (see ELRA-L0162 to ELRA-L0181). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Mortgage and loans (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Synthetic Data - Telecommunication (Spanish language)
The Bitext Synthetic Data consist of pre-built training data for intent detection and are provided for 20 verticals for Spanish language (see ELRA-L0182 to ELRA-L0201). They cover the most common intents for each vertical and include a large number of example utterances for each intent, with optional entity/slot annotations for ...
Bitext Lexical Dataset - Language Variants - Finnish
As a complement to the generic vocabulary provided in ELRA-L0141, language variants of Finnish are provided with the following features: Voice, Tense, Mood, Person, Number, Case, Degree, Pronominal Clitics, Formality. Variants are distributed as follows: - Finnish Standard: 74,000 lemmas / 74,000 forms - Finnish Colloquial: 71,000 lemmas / 22,600,000 ...
Bitext Lexical Dataset - Language Variants - English
As a complement to the generic vocabulary provided in ELRA-L0140, language variants of English are provided with the following features: Tense, Person, Number, Gender, Degree, Contraction. Variants are distributed as follows: - English US: 63,000 lemmas / 188,000 forms - English UK: 63,000 lemmas / 190,000 forms - English India: ...
Bitext Lexical Dataset - Chinese (Traditional)
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Chinese (Traditional) consists of 75,000 lemmas (forms).
Bitext Lexical Dataset - Spanish
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Spanish consists of 60,000 lemmas (2,500,000 forms) as well as the following extra features: Tense, ...
Bitext Lexical Dataset - Dutch
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Dutch consists of 90,000 lemmas (500,000 forms) as well as the following extra features: Tense, ...
Bitext Lexical Dataset - Italian
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Italian consists of 65,000 lemmas (1,400,000 forms) as well as the following extra features: Tense, ...
Bitext Lexical Dataset - Malay
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Malay consists of 45,000 lemmas (120,000 forms) as well as the following extra features: Voice, Number, ...
Bitext Lexical Dataset - German
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - German consists of 100,000 lemmas (2,500,000 forms) as well as the following extra features: Tense, ...
Bitext Lexical Dataset - Language Variants - Dutch
As a complement to the generic vocabulary provided in ELRA-L0139, language variants of Dutch are provided with the following features: Tense, Mood, Person, Number, Gender, Contraction. Variants are distributed as follows: - Dutch Netherlands: 106,000 lemmas / 586,000 forms - Dutch Belgium: 97,000 lemmas / 591,000 forms
Bitext Lexical Dataset - Finnish
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Finnish consists of 70,000 lemmas (80,000,000 forms) as well as the following extra features: Voice, ...
Bitext Lexical Dataset - Portuguese
The series of Bitext Lexical Datasets includes Lemmas, POS tagging, Frequency, Named Entities and Offensive features. Depending on the dataset and language, other syntactic and morphological features are also provided. The Bitext Lexical Dataset - Portuguese consists of 40,000 lemmas (3,500,000 forms) as well as the following extra features: Tense, ...
Bitext Lexical Dataset - Language Variants - Norwegian
As a complement to the generic vocabulary provided in ELRA-L0147, language variants of Norwegian are provided with the following features: Tense, Person, Number, Gender, Case, Degree, Definiteness. Variants are distributed as follows: - Norwegian (Bokmål): 45,000 lemmas / 500,000 forms - Norwegian (Nynorsk): 75,000 lemmas / 400,000 forms