Search and Browse – ELRA Catalogue

LC-STAR Standard Arabic Phonetic lexicon text

Arabic

ID: ELRA-S0247

The LC-STAR Standard Arabic Phonetic lexicon was created within the scope of the LC-STAR project (IST 2001-32216), which was sponsored by the European Commission. The lexicon comprises 110,271 entries, distributed over three categories: - a set of 52,981 common word entries. This set is extracte...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	21250.00 €	28000.00 €
Licence: Commercial Use - ELRA VAR	28000.00 €	28000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	27625.00 €	36400.00 €
Licence: Commercial Use - ELRA VAR	36400.00 €	36400.00 €

"Le Monde Diplomatique" Arabic tagged corpus text

Arabic

ID: ELRA-W0049

ISLRN: 124-139-628-259-2

This corpus contains 102,960 vowelized, lemmatized and tagged words (58 texts from Le Monde Diplomatique Arabic, see also ELRA-W0036-04). Each text is associated with 3 files: - raw text in Arabic, - vowelized text in Arabic, - one XML file containing the morphological annotation of the text. ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	185.00 €	975.00 €
Licence: Commercial Use - ELRA VAR	975.00 €	975.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	400.00 €	2000.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

"Le Monde Diplomatique" Text corpus in Arabic text

Arabic

ID: ELRA-W0036-04

ISLRN: 231-368-326-920-2

Electronic archive of "Le Monde Diplomatique" articles in Arabic from 2000. The corpus is available in HTML. Each HTML file contains one article. Number of articles available per year: • 2000: 61 articles (November and December only) (75,305 words) • 2001: 346 articles (479,435 ...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	46.00 €	46.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	69.00 €	69.00 €

MADED (Moroccan Arabic Dialect Electronic Dictionary) text

Arabic

ID: ELRA-L0134

ISLRN: 977-057-254-691-5

Moroccan Arabic Dialect Electronic Dictionary (MADED) is an electronic lexicon containing almost 11,500 entries. Entries are written in Arabic script, and each Modern Standard Arabic (MSA) lemma is provided with its corresponding Moroccan Arabic equivalent. In addition, MADED entries are annotate...

MEMBER	academic	commercial
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

MORV (Moroccan Morphological vocabulary) text

Arabic

ID: ELRA-L0135

ISLRN: 064-194-729-767-0

The Moroccan Morphological vocabulary is a lexicon containing more than 4.6 million entries, each describing a Moroccan Arabic word with fourteen (14) morphological and semantic features: the word orthographic form, the segmentation (prefix and suffix), part-of-speech (POS), gender, number, tense and t...

MEMBER	academic	commercial
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	6000.00 €	6000.00 €

NON MEMBER	academic	commercial
Licence: Attribution, Non Commercial Use, No Derivatives - CC-BY-NC-ND	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	12000.00 €	12000.00 €

NE3L named entities Arabic corpus text

Arabic

ID: ELRA-W0078

ISLRN: 398-979-151-557-0

The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with named entities. The text data were extracted from newspapers and cover various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	5000.00 €	5000.00 €
Licence: Commercial Use - ELRA VAR	5000.00 €	5000.00 €

NEMLAR Written Corpus text

Arabic

ID: ELRA-W0042

ISLRN: 050-693-158-326-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the same project, are also available: NEMLAR Broadcast News Speech Corpus (ELRA-S0219) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Written Corpus consists of about...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	150.00 €	250.00 €
Licence: Commercial Use - ELRA VAR	1000.00 €	1000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	300.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	2000.00 €	2000.00 €

Special offers are also available.

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) text

Arabic

ID: ELRA-W0127

ISLRN: 305-450-745-774-1

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed of a collection of sentences, selected to be representative of Arabic stemming tasks and manually annotated. Indeed, NAFIS is: Comprehensive: The content of NAFIS can be generalized...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	0.00 €
Licence: Commercial Use - ELRA VAR	0.00 €	0.00 €

Training and test data for Arabizi detection and transliteration text

Arabic
English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	500.00 €
Licence: Commercial Use - ELRA VAR	500.00 €	500.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	650.00 €
Licence: Commercial Use - ELRA VAR	650.00 €	650.00 €

Wojood - A corpus for nested Arabic Named Entity Recognition text

Arabic

ID: ELRA-W0325

ISLRN: 688-718-284-176-0

Wojood consists of about 550,000 tokens (Modern Standard Arabic and dialect) that are manually annotated with 21 entity types (person, group of people, occupation, organization, geopolitical entity, location, facility, event, date, time, language, website, law, product, cardinal number, ordinal n...

MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	8000.00 €
Licence: Commercial Use - ELRA VAR	8000.00 €	8000.00 €

NON MEMBER	academic	commercial
Licence: Non Commercial Use - ELRA END USER	0.00 €	10000.00 €
Licence: Commercial Use - ELRA VAR	10000.00 €	10000.00 €

30 Language Resources (Page 2 of 2)