9 Language Resources

 Al-Hayat Arabic Corpus    
  • Arabic

ID: ELRA-W0030

ISLRN: 365-777-769-398-7

The corpus was developed during a research project at the University of Essex, in collaboration with the Open University. It contains Al-Hayat newspaper articles enriched for the development of Language Engineering and Information Retrieval applications. The data have ...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 480.00 € / 960.00 €
  Licence: Commercial Use - ELRA VAR: 960.00 € / 960.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 720.00 € / 1440.00 €
  Licence: Commercial Use - ELRA VAR: 1440.00 € / 1440.00 €

 An-Nahar Newspaper Text Corpus    
  • Arabic

ID: ELRA-W0027

ISLRN: 083-457-618-309-8

The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years), stored as HTML files on CD-ROM. Each year contains 45,000 articles and 24 million words. Each article includes information such as title, newspaper's name, date, country, type, page, ...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 2016.00 € / 3192.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 3024.00 € / 4788.00 €

Special offers are also available.

 "Le Monde Diplomatique" Arabic tagged corpus    
  • Arabic

ID: ELRA-W0049

ISLRN: 124-139-628-259-2

This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic; see also ELRA-W0036-04). Three files are associated with each text: the raw text in Arabic, the vowelised text in Arabic, and one XML file containing the morphological annotation of the text. ...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 185.00 € / 975.00 €
  Licence: Commercial Use - ELRA VAR: 975.00 € / 975.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 400.00 € / 2000.00 €
  Licence: Commercial Use - ELRA VAR: 2000.00 € / 2000.00 €

 "Le Monde Diplomatique" Text corpus in Arabic    
  • Arabic

ID: ELRA-W0036-04

ISLRN: 231-368-326-920-2

Electronic archive of "Le Monde Diplomatique" articles in Arabic, starting from 2000. The corpus is available in HTML, with one article per HTML file. Number of articles available per year:
  • 2000: 61 articles (November and December only; 75,305 words)
  • 2001: 346 articles (479,435 ...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 46.00 € / 46.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 69.00 € / 69.00 €
 MEDAR Evaluation Package    
  • Arabic
  • English

ID: ELRA-E0040

ISLRN: 631-407-723-040-2

The MEDAR Evaluation Package was produced within the MEDAR project (MEDiterranean ARabic language and speech technology), which was supported by the European Commission's ICT programme and ran from 1 February 2008 to 31 July 2010. The project addressed International Cooperation be...

MEMBER (academic / commercial)
  Licence: Evaluation Use - ELRA EVALUATION: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
  Licence: Evaluation Use - ELRA EVALUATION: 0.00 € / 0.00 €
 NE3L named entities Arabic corpus    
  • Arabic

ID: ELRA-W0078

ISLRN: 398-979-151-557-0

The NE3L project (Named Entities 3 Languages) consisted of annotating corpora in several languages with named entities. The text data were extracted from newspapers and cover various topics. Three languages were annotated: Arabic, Chinese and Russian. For this project, 5...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 5000.00 € / 5000.00 €
  Licence: Commercial Use - ELRA VAR: 5000.00 € / 5000.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 5000.00 € / 5000.00 €
  Licence: Commercial Use - ELRA VAR: 5000.00 € / 5000.00 €

 NEMLAR Written Corpus    
  • Arabic

ID: ELRA-W0042

ISLRN: 050-693-158-326-9

This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources produced within the same project are also available: the NEMLAR Broadcast News Speech Corpus (ELRA-S0219) and the NEMLAR Speech Synthesis Corpus (ELRA-S0220). The NEMLAR Written Corpus consists of about...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 150.00 € / 250.00 €
  Licence: Commercial Use - ELRA VAR: 1000.00 € / 1000.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 300.00 € / 500.00 €
  Licence: Commercial Use - ELRA VAR: 2000.00 € / 2000.00 €

Special offers are also available.

 Normalized Arabic Fragments for Inestimable Stemming (NAFIS)    
  • Arabic

ID: ELRA-W0127

ISLRN: 305-450-745-774-1

Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold-standard corpus composed of a collection of sentences selected to be representative of Arabic stemming tasks and manually annotated. Indeed, NAFIS is: Comprehensive: The content of NAFIS can be generalized...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 0.00 € / 0.00 €
  Licence: Commercial Use - ELRA VAR: 0.00 € / 0.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 0.00 € / 0.00 €
  Licence: Commercial Use - ELRA VAR: 0.00 € / 0.00 €

 Training and test data for Arabizi detection and transliteration    
  • Arabic
  • English

ID: ELRA-W0126

ISLRN: 986-364-744-303-9

The dataset is composed of two distinct resources: 1) a collection of mixed English and Arabizi text intended to train and test a system for the automatic detection of code-switching in mixed English and Arabizi texts. The training part of the corpus contains: 522 tweets composed of 5,207 token...

MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 0.00 € / 500.00 €
  Licence: Commercial Use - ELRA VAR: 500.00 € / 500.00 €
NON MEMBER (academic / commercial)
  Licence: Non Commercial Use - ELRA END USER: 0.00 € / 650.00 €
  Licence: Commercial Use - ELRA VAR: 650.00 € / 650.00 €