Armenian Language Data for Software Developers
Overview
The Nayiri Institute aims to serve as a catalyst for the development of meaningful software in the Armenian language. To support this mission, we are committed to providing high-quality, open linguistic resources that can serve as the foundation for modern Armenian-language applications.
Nayiri Armenian Lexicon
The Nayiri Armenian Lexicon is a large-scale, structured lexical dataset designed for computational use. It models Armenian vocabulary across multiple orthographies and grammatical systems, capturing lexemes, lemmas, word forms, and a shared set of inflection objects in a consistent, machine-readable format.
The Lexicon is intended to support a wide range of applications, including search, spell-checking, natural language processing, and linguistic research.
By publishing the data in an open, well-documented format, we aim to lower the barrier to entry for developers building Armenian-language tools and services.
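To make the relationship between lexemes, lemmas, word forms, and shared inflections more concrete, here is a minimal Python sketch of one way an application might model these layers and build a form-to-lemma index for search or spell-checking. The class names, fields, and inflection IDs (`Lexeme`, `Lemma`, `WordForm`, `Inflection`, `NOM_SG`, `GEN_SG`, `NOM_PL`) and the function `build_form_index` are hypothetical illustrations, not the Lexicon's published schema or API; the actual file format is described in the Lexicon's own documentation.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL, simplified data model for illustration only.
# It is not the Nayiri Lexicon's published schema.

@dataclass(frozen=True)
class Inflection:
    inflection_id: str
    description: str  # human-readable label, e.g. "genitive singular"

@dataclass
class WordForm:
    spelling: str
    inflection_id: str  # reference into the shared inflection table

@dataclass
class Lemma:
    spelling: str
    word_forms: list[WordForm] = field(default_factory=list)

@dataclass
class Lexeme:
    lexeme_id: str
    # Assumption: a lexeme groups one or more lemmas (e.g. spellings
    # in different orthographies).
    lemmas: list[Lemma] = field(default_factory=list)

def build_form_index(
    lexemes: list[Lexeme], inflections: dict[str, Inflection]
) -> dict[str, list[tuple[str, str]]]:
    """Map each word-form spelling to every (lemma, inflection) it can realize."""
    index: dict[str, list[tuple[str, str]]] = {}
    for lexeme in lexemes:
        for lemma in lexeme.lemmas:
            for form in lemma.word_forms:
                inflection = inflections[form.inflection_id]
                index.setdefault(form.spelling, []).append(
                    (lemma.spelling, inflection.description)
                )
    return index

# Shared inflection objects: defined once, referenced by ID from every word form.
INFLECTIONS = {
    "NOM_SG": Inflection("NOM_SG", "nominative singular"),
    "GEN_SG": Inflection("GEN_SG", "genitive singular"),
    "NOM_PL": Inflection("NOM_PL", "nominative plural"),
}

LEXEMES = [
    Lexeme("L1", [Lemma("մարդ", [
        WordForm("մարդ", "NOM_SG"),
        WordForm("մարդու", "GEN_SG"),
        WordForm("մարդիկ", "NOM_PL"),
    ])]),
    Lexeme("L2", [Lemma("քաղաք", [
        WordForm("քաղաք", "NOM_SG"),
        WordForm("քաղաքի", "GEN_SG"),
        WordForm("քաղաքներ", "NOM_PL"),
    ])]),
]

index = build_form_index(LEXEMES, INFLECTIONS)
print(index["մարդու"])    # [('մարդ', 'genitive singular')]
print("քաղաքի" in index)  # True: a spell-checker could accept this form
```

Because each word form stores only an inflection ID, the inflection descriptions are kept once and shared across all lexemes. The index maps each spelling to a list because, in real data, the same surface form can realize more than one lemma or inflection.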
Nayiri Armenian Text Corpus
The Nayiri Armenian Text Corpus complements the Lexicon by providing a curated collection of Armenian texts suitable for language modeling, linguistic analysis, and machine-learning workflows.
The corpus is designed to help developers train, evaluate, and improve Armenian-language systems—from tokenization and morphological analysis to modern AI and language-model applications.
Together, the Lexicon and Corpus form a foundational dataset intended to accelerate the growth of a robust Armenian-language software ecosystem.
Sponsorship
The design, creation, and open-source release of the Nayiri Armenian Lexicon and the Corpus of Western Armenian have been supported by the Calouste Gulbenkian Foundation.
