58 Language Resources (Page 1 of 3)


 Amaryllis Corpus - Evaluation Package    
  • French

ID: ELRA-W0029

ISLRN: 786-395-313-491-8

Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text corpora in order to provide a methodology for the evaluation of other similar tools. AMARYLLIS was organised by the Institut de l'Information Scientifique et Technique (INIST) wit...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 45.00 €; commercial: 100.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 45.00 €; commercial: 100.00 €
 Arabic Morphological Dictionary    
  • Arabic

ID: ELRA-L0088

ISLRN: 472-591-121-577-5

The Arabic Morphological Dictionary contains 4,912,749 entries, including 3,374,852 nouns, 1,537,699 verbs, and 198 grammatical words. The dictionary is stored on 1 CD. All files are provided as plain text in UTF-8 character encoding, which represents about 154 Mb of data. The dictionary form...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 250.00 €; commercial: 6000.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €; commercial: 6000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 450.00 €; commercial: 12000.00 €
Licence: Commercial Use - ELRA VAR
academic: 12000.00 €; commercial: 12000.00 €
 Bulgarian Morphological Dictionary    
  • Bulgarian

ID: ELRA-L0030

ISLRN: 611-552-122-892-7

This dictionary contains 67,500 entries divided into 242 inflectional types (including proper nouns), morphosyntactic information for each entry, and a morphological engine (MS-DOS and Windows 95/NT) for morphological analysis and generation. The data may be used for morphological analysis and syn...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 45.00 €; commercial: 6000.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €; commercial: 6000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 100.00 €; commercial: 12000.00 €
Licence: Commercial Use - ELRA VAR
academic: 12000.00 €; commercial: 12000.00 €
 CRATER corpus    
  • English
  • French
  • Spanish; Castilian

ID: ELRA-W0003

ISLRN: 645-721-607-031-5

The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-French International Telecommunications Union corpus to include Spanish, and has also debugged the existing corpus. The offer consists of a multi-lingual aligned corpus of 1,000,000 t...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 20.00 €
Licence: Commercial Use - ELRA VAR
academic: 20.00 €; commercial: 20.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 100.00 €
Licence: Commercial Use - ELRA VAR
academic: 100.00 €; commercial: 100.00 €
 Dutch PAROLE lexicon    
  • Dutch; Flemish

ID: ELRA-L0031

ISLRN: 283-192-505-981-6

The entry list of the lexicon consists of about 20,200 entries distributed over 13 parts of speech (POS). The entries have been described along the dimensions of morphosyntax and syntax. Morphosyntactic information consists of various lexical properties, like gender, number, case, person, inflect...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 300.00 €; commercial: 1600.00 €
Licence: Commercial Use - ELRA VAR
academic: 8000.00 €; commercial: 8000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 400.00 €; commercial: 3000.00 €
Licence: Commercial Use - ELRA VAR
academic: 10000.00 €; commercial: 10000.00 €

Special offers are also available. Check here for details.

 Ema-lon Manipuri Corpus (including word embedding and language model)    
  • English
  • Manipuri

ID: ELRA-W0316

ISLRN: 588-170-827-016-7

The Ema-lon Manipuri Corpus consists of a set of resources for the Manipuri language (locally known as Meiteilon) for the purpose of machine translation. The main source for these resources is the Sangai Express news website. The resources that constitute the present corpus are listed below: 1. EM C...

MEMBER
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0
academic: 0.00 €; commercial: 0.00 €
NON MEMBER
Licence: Attribution, Non Commercial Use - CC-BY-NC-4.0
academic: 0.00 €; commercial: 0.00 €
 English-Nepali Parallel Corpus    
  • English
  • Nepali (macrolanguage)

ID: ELRA-W0077

ISLRN: 853-487-663-161-6

The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali National Corpus was produced in 2006 in the framework of the project Bhasha Sanchar (“language communication”), also known as Nelralec, for Nepali Language Resources and Localiza...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 0.00 €
 English-Punjabi Code-Mixed Social Media Content    
  • English
  • Panjabi; Punjabi

ID: ELRA-W0319

ISLRN: 695-759-706-170-8

The English-Punjabi Code-Mixed Social Media Content corpus is composed of 893,615 parallel sentences of English-Punjabi distributed over the following domains: - 82,341 parallel sentences of English-Punjabi code-mixed Agriculture Domain Data, - 59,158 parallel sentences of English-P...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 0.00 €
Licence: Commercial Use - ELRA VAR
academic: 0.00 €; commercial: 0.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 0.00 €
Licence: Commercial Use - ELRA VAR
academic: 0.00 €; commercial: 0.00 €
 English-Vietnamese Parallel Corpus    
  • English
  • Vietnamese

ID: ELRA-W0124

ISLRN: 838-483-738-912-8

This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) systems. The parallel corpus contains English documents translated by professional translators into Vietnamese. The source texts include books, dictionaries, newspapers, online ne...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 600.00 €; commercial: 1200.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €; commercial: 6000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 1000.00 €; commercial: 2000.00 €
Licence: Commercial Use - ELRA VAR
academic: 8000.00 €; commercial: 8000.00 €
 euLEX (Lexical Database for Basque)    
  • Basque

ID: ELRA-L0085

ISLRN: 593-049-611-011-8

euLEX is a general lexicon which contains 115,000 entries, divided into 94,000 dictionary entries or lemmas, 12,000 allomorphs, 7,500 verb forms and about 1,200 dependent morphemes. All entries include linguistic information such as morphology and usage. The lexicon includes general purpose entr...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 3000.00 €; commercial: 6000.00 €
Licence: Commercial Use - ELRA VAR
academic: 15000.00 €; commercial: 15000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 6000.00 €; commercial: 10000.00 €
Licence: Commercial Use - ELRA VAR
academic: 20000.00 €; commercial: 20000.00 €
 ILC Italian Morphological Lexicon    
  • Italian

ID: ELRA-L0006

ISLRN: 965-829-467-456-4

The ILC Italian Morphological Lexicon consists of a set of lemmas/lexical entries (about 60,000) with the corresponding inflected word-forms, and a morphological engine for morphological analysis and generation. Lemmas and word-forms are encoded with grammatical codes compatible with the EAGLES r...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 4000.00 €; commercial: 12000.00 €
Licence: Commercial Use - ELRA VAR
academic: 12000.00 €; commercial: 12000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 8000.00 €; commercial: 20000.00 €
Licence: Commercial Use - ELRA VAR
academic: 20000.00 €; commercial: 20000.00 €
 ILE: Italian LExicon      
  • Italian

ID: ELRA-S0059

ISLRN: 052-156-999-928-3

ILE is a 588,000-entry Italian lexicon transcribed in SAMPA notation. It was generated, mainly for speech recognition purposes, by means of a morphological analyzer handling more than 100,000 morphemes, each of them transcribed and manually checked. Each stem was combined with all its possibl...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 3000.00 €; commercial: 12000.00 €
Licence: Commercial Use - ELRA VAR
academic: 12000.00 €; commercial: 12000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 6000.00 €; commercial: 18000.00 €
Licence: Commercial Use - ELRA VAR
academic: 18000.00 €; commercial: 18000.00 €
 Italian Syntactic-Semantic Treebank (ISST)    
  • Italian

ID: ELRA-W0044

ISLRN: 927-246-660-947-9

ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in XML. ISST has a five-level structure covering orthographic, morpho-syntactic, syntactic and semantic levels of linguistic description. Syntactic annotation is distributed over t...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 100.00 €; commercial: 1500.00 €
Licence: Commercial Use - ELRA VAR
academic: 1500.00 €; commercial: 1500.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 150.00 €; commercial: 2500.00 €
Licence: Commercial Use - ELRA VAR
academic: 2500.00 €; commercial: 2500.00 €
 ItalWordNet (Italian WordNet)    
  • English
  • Italian

ID: ELRA-M0042

ISLRN: 532-206-426-067-2

ItalWordNet (Italian WordNet) is an updated version of the EuroWordNet Italian database. The ItalWordNet database was produced within a national Italian programme called SI-TAL. It contains a total of 49,360 synsets. Unlike the EuroWordNet database, the ItalWordNet is provided in XML format. Howe...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 400.00 €; commercial: 4000.00 €
Licence: Commercial Use - ELRA VAR
academic: 6000.00 €; commercial: 6000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 800.00 €; commercial: 8000.00 €
Licence: Commercial Use - ELRA VAR
academic: 12000.00 €; commercial: 12000.00 €
 KORLEX – Croatian Lexicon    
  • Croatian

ID: ELRA-L0065

ISLRN: 457-664-833-687-1

This lexical resource was developed as part of the bilingual lexicon for English-Croatian built for the following project: http://www.rjecnik.com. The lexicon data is compiled with the objective of covering the majority of text circulating in everyday use, such as in the news (e.g., newswire art...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 1000.00 €; commercial: 2000.00 €
Licence: Commercial Use - ELRA VAR
academic: 2000.00 €; commercial: 2000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 2000.00 €; commercial: 5000.00 €
Licence: Commercial Use - ELRA VAR
academic: 5000.00 €; commercial: 5000.00 €
 LABEL-LEX (MW)    
  • Portuguese

ID: ELRA-L0054

ISLRN: 502-837-497-805-9

LABEL-LEX (MW) is a Portuguese formalized lexicon, containing 88,619 inflected multiword lexical units (formally, sequences of simple words). The units are distributed as follows: - 85,881 nouns, with information about type, gender, number, inflected forms, irregular inflected forms and subcatego...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 3000.00 €; commercial: 10000.00 €
Licence: Commercial Use - ELRA VAR
academic: 10000.00 €; commercial: 10000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 5000.00 €; commercial: 15000.00 €
Licence: Commercial Use - ELRA VAR
academic: 15000.00 €; commercial: 15000.00 €
 LABEL-LEX (SW)    
  • Portuguese

ID: ELRA-L0055

ISLRN: 154-511-437-811-6

LABEL-LEX (SW) is a Portuguese formalized lexicon, containing 1,545,481 simple inflected words. The words are distributed as follows: - 142,236 nouns, with information about type, gender, number, inflected forms, and irregular inflected forms - 3,155 adverbs, with information about degree, polari...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 2500.00 €; commercial: 10000.00 €
Licence: Commercial Use - ELRA VAR
academic: 10000.00 €; commercial: 10000.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 5000.00 €; commercial: 15000.00 €
Licence: Commercial Use - ELRA VAR
academic: 15000.00 €; commercial: 15000.00 €
 "La Dépêche de Kabylie" Corpus    
  • Berber languages

ID: ELRA-W0322

ISLRN: 176-700-464-150-5

"La Dépêche de Kabylie" Corpus consists of about 1,570,000 words in Amazigh language collected from the Algerian newspaper entitled “La Dépêche de Kabylie”. It was collected thanks to HTTrack Website Copier and contains about 90% of all entries of the Amazigh language. All articles are gathered u...

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 100.00 €
Licence: Commercial Use - ELRA VAR
academic: 100.00 €; commercial: 100.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 0.00 €; commercial: 150.00 €
Licence: Commercial Use - ELRA VAR
academic: 150.00 €; commercial: 150.00 €
 Macedonian lexicon of compound words (MACPLEX_COMP)    
  • Macedonian

ID: ELRA-L0093

ISLRN: 094-300-240-291-0

MACPLEX_COMP contains 784 lemmas and 6,289 word forms (576 nouns, 25 adjectives, 73 adverbs, 66 interjections, 17 numerals, 15 pronouns and 12 residuals). The lexicon is available in Unicode.

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 20.00 €; commercial: 30.00 €
Licence: Commercial Use - ELRA VAR
academic: 70.00 €; commercial: 70.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 25.00 €; commercial: 35.00 €
Licence: Commercial Use - ELRA VAR
academic: 90.00 €; commercial: 90.00 €

This resource is also available in a bundle. Check here for bundled pricing.

 Macedonian lexicon of derived adjectives (MACPLEX_ADJDERV)    
  • Macedonian

ID: ELRA-L0091

ISLRN: 249-051-045-468-2

MACPLEX_ADJDERV contains 12,073 lemmas and 281,488 word forms (10,233 with suffix –чки, 1,840 with suffix –билен). The lexicon is available in Unicode.

MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 270.00 €; commercial: 400.00 €
Licence: Commercial Use - ELRA VAR
academic: 1100.00 €; commercial: 1100.00 €
NON MEMBER
Licence: Non Commercial Use - ELRA END USER
academic: 340.00 €; commercial: 550.00 €
Licence: Commercial Use - ELRA VAR
academic: 1400.00 €; commercial: 1400.00 €

This resource is also available in a bundle. Check here for bundled pricing.
