Nayiri Developers: Armenian Language Data for Software Developers


Overview

The Nayiri Institute aims to serve as a catalyst for the development of meaningful software in the Armenian language. To support this mission, we are committed to providing high-quality, open linguistic resources that can serve as the foundation for modern Armenian-language applications.

Nayiri Armenian Lexicon

The Nayiri Armenian Lexicon is a large-scale, structured lexical dataset designed for computational use. It models Armenian vocabulary across multiple orthographies and grammatical systems, capturing lexemes, lemmas, word forms, and a shared set of inflection objects in a consistent, machine-readable format.

The Lexicon is intended to support a wide range of applications, including search, spell-checking, natural language processing, and linguistic research.
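
To make the relationship between lexemes, lemmas, word forms, and shared inflection objects more concrete, here is a minimal Python sketch of how such a model could be represented and indexed for lookup. Every class name, field name, identifier, and orthography label below is a hypothetical illustration, not the Lexicon's actual schema; consult the Lexicon documentation for the real format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Inflection:
    """A grammatical description shared by many word forms (hypothetical fields)."""
    inflection_id: str
    features: tuple[str, ...]


@dataclass
class WordForm:
    text: str
    inflection_id: str  # reference into the shared inflection table


@dataclass
class Lemma:
    text: str
    orthography: str  # illustrative label only
    forms: list[WordForm] = field(default_factory=list)


@dataclass
class Lexeme:
    lexeme_id: str
    lemmas: list[Lemma] = field(default_factory=list)


def build_form_index(lexemes):
    """Map each surface form to the (lexeme id, lemma, inflection id) entries that produce it."""
    index = {}
    for lexeme in lexemes:
        for lemma in lexeme.lemmas:
            for form in lemma.forms:
                entry = (lexeme.lexeme_id, lemma.text, form.inflection_id)
                index.setdefault(form.text, []).append(entry)
    return index


# Shared inflection objects: defined once, referenced by id from word forms.
INFLECTIONS = {
    "noun-sg-nom": Inflection("noun-sg-nom", ("noun", "singular", "nominative")),
    "noun-pl-nom": Inflection("noun-pl-nom", ("noun", "plural", "nominative")),
}

# Made-up sample entry for the noun "գիրք" (book) and its plural "գրքեր".
book = Lexeme(
    lexeme_id="lex-1",
    lemmas=[
        Lemma(
            text="գիրք",
            orthography="classical",
            forms=[
                WordForm("գիրք", "noun-sg-nom"),
                WordForm("գրքեր", "noun-pl-nom"),
            ],
        )
    ],
)

index = build_form_index([book])
```

With an index like this, a search or spell-checking tool could resolve an inflected form such as `index["գրքեր"]` back to its lemma and then look up the grammatical features in `INFLECTIONS`. Storing inflections once and referencing them by id is one plausible reading of "a shared set of inflection objects"; the actual data may be organized differently.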

By publishing the data in an open, well-documented format, we aim to lower the barrier to entry for developers building Armenian-language tools and services.

Get Started with the Nayiri Armenian Lexicon

Nayiri Armenian Corpus Data

The Nayiri Armenian Text Corpus complements the Lexicon by providing a curated collection of Armenian texts suitable for language modeling, linguistic analysis, and machine-learning workflows.

The corpus is designed to help developers train, evaluate, and improve Armenian-language systems, from tokenization and morphological analysis to modern AI and language-model applications.

Together, the Lexicon and Corpus form a foundational dataset intended to accelerate the growth of a robust Armenian-language software ecosystem.

Get Started with the Nayiri Armenian Corpus Data

Sponsorship

The design, creation, and open-source release of the Nayiri Armenian Lexicon and the Corpus of Western Armenian Dataset have been supported by the Calouste Gulbenkian Foundation.
