Nayiri Armenian Lexicon — File Format
The lexicon is provided as a single UTF-8 encoded file. The file contains the complete JSON object on one line, unindented and with no extra whitespace.
To illustrate the structure, consider the following simplified example. This sample contains one Lexeme (զսպումն), which includes a single Lemma (զսպումն). That lemma, in turn, contains two Word Forms (զսպումէն and զսպումով). Two Inflections are defined globally.
In the actual dataset, the scale is much larger: thousands of Lexemes, several Lemmas per Lexeme, dozens or even hundreds of Word Forms per Lemma, and a large number of associated Inflection objects.
For readability, we provide the sample JSON data formatted across multiple lines and indented. To minimize file size, the actual full data is on one line and is not indented.
Note that Inflection objects are defined only once at the global level and re-used across the Word Forms of all Lemmas.
{
  "lexemes": [
    {
      "lexemeId": "4ZXN",
      "description": "զսպումն (repression, coercion, restraint, control, repression, suppression)",
      "lemmaType": "NOMINAL",
      "lemmas": [
        {
          "lemmaId": "4ZXN5",
          "lemmaString": "զսպումն",
          "partOfSpeech": "NOUN",
          "numWordForms": 2,
          "wordForms": [
            {
              "s": "զսպումէն",
              "i": "AASA"
            },
            {
              "s": "զսպումով",
              "i": "AACg"
            }
          ]
        }
      ]
    }
  ],
  "inflections": [
    {
      "inflectionId": "AASA",
      "lemmaType": "NOMINAL",
      "displayName": {
        "hy": "Եզակի • Բացառական հոլով • Որոշիչ յօդ «ն»",
        "en": "Singular • Ablative case • Definite Article ն"
      },
      "grammaticalNumber": "SINGULAR",
      "grammaticalCase": "ABLATIVE",
      "grammaticalArticle": "DEFINITE_ARTICLE_NOO"
    },
    {
      "inflectionId": "AACg",
      "lemmaType": "NOMINAL",
      "displayName": {
        "hy": "Եզակի • Գործիական հոլով",
        "en": "Singular • Instrumental case"
      },
      "grammaticalNumber": "SINGULAR",
      "grammaticalCase": "INSTRUMENTAL",
      "grammaticalArticle": "NONE"
    }
  ],
  "metadata": {
    "version": "2026-02-07-v1",
    "attribution": "Nayiri Armenian Lexicon © Serouj Ourishian, licensed under CC BY 4.0.",
    "license": "Creative Commons Attribution 4.0 (CC BY 4.0)",
    "publisher": "Nayiri Institute for Armenian Language Computing",
    "sponsorship": "with the sponsorship of the Calouste Gulbenkian Foundation",
    "numLexemes": 1,
    "numLemmas": 1,
    "numWordForms": 2,
    "numInflections": 2
  }
}
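Because each Word Form stores only an Inflection ID in its "i" field, an application typically has to join Word Forms back to the global "inflections" list. The following Python sketch shows one possible way to do that with a dictionary keyed by "inflectionId". The function name resolve_word_forms and the trimmed SAMPLE string are illustrative assumptions, not part of the lexicon or any official tooling.

```python
import json

# Trimmed copy of the sample above, keeping only the fields this sketch reads.
SAMPLE = """
{
  "lexemes": [{
    "lexemeId": "4ZXN",
    "lemmas": [{
      "lemmaId": "4ZXN5",
      "lemmaString": "զսպումն",
      "wordForms": [
        {"s": "զսպումէն", "i": "AASA"},
        {"s": "զսպումով", "i": "AACg"}
      ]
    }]
  }],
  "inflections": [
    {"inflectionId": "AASA",
     "displayName": {"en": "Singular • Ablative case • Definite Article ն"}},
    {"inflectionId": "AACg",
     "displayName": {"en": "Singular • Instrumental case"}}
  ]
}
"""


def resolve_word_forms(lexicon):
    """Yield (lemma string, word form, English inflection name) triples.

    Hypothetical helper: it assumes every "i" value in a Word Form matches
    the "inflectionId" of exactly one entry in the global list.
    """
    # Index the global Inflection objects once, then reuse for every Word Form.
    by_id = {inf["inflectionId"]: inf for inf in lexicon["inflections"]}
    for lexeme in lexicon["lexemes"]:
        for lemma in lexeme["lemmas"]:
            for form in lemma["wordForms"]:
                inflection = by_id[form["i"]]
                yield (lemma["lemmaString"], form["s"],
                       inflection["displayName"]["en"])


lexicon = json.loads(SAMPLE)
rows = list(resolve_word_forms(lexicon))
for row in rows:
    print(" | ".join(row))
```

Building the index once keeps each lookup constant-time, which matters more in the full dataset, where many Word Forms share the same Inflection objects.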
Now that you’re familiar with the overall JSON structure, you can explore the larger sample Lexicon file, which includes 20 Lexemes and all 700+ Inflection objects. The file is still small enough to open comfortably in a text editor.
We recommend downloading the indented sample file from the Download section for exploration and learning, and using the minified (non-indented) version during application development.
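As a rough sketch of loading the minified file during development, the Python snippet below reads a one-line UTF-8 JSON file and compares the counters in "metadata" against the data actually present. It assumes those counters describe the contents of the same file, as they do in the sample above. The file name lexicon-sample.json and the helper names are hypothetical, and the demo writes its own tiny stand-in file rather than using a real download.

```python
import json
import os
import tempfile


def load_lexicon(path):
    """Read the lexicon file; it is a single UTF-8 encoded JSON object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_contents(lexicon):
    """Count the objects actually present in the parsed lexicon."""
    lemmas = [lemma for lexeme in lexicon["lexemes"]
              for lemma in lexeme["lemmas"]]
    return {
        "numLexemes": len(lexicon["lexemes"]),
        "numLemmas": len(lemmas),
        "numWordForms": sum(len(lemma["wordForms"]) for lemma in lemmas),
        "numInflections": len(lexicon["inflections"]),
    }


def count_mismatches(lexicon):
    """Return {counter: (declared, actual)} for counters that disagree."""
    declared = lexicon["metadata"]
    actual = count_contents(lexicon)
    return {key: (declared.get(key), value)
            for key, value in actual.items()
            if declared.get(key) != value}


# Demo with a tiny stand-in file (assumption: not the real download).
tiny = {
    "lexemes": [{"lexemeId": "4ZXN", "lemmas": [{
        "lemmaId": "4ZXN5",
        "wordForms": [{"s": "զսպումէն", "i": "AASA"},
                      {"s": "զսպումով", "i": "AACg"}]}]}],
    "inflections": [{"inflectionId": "AASA"}, {"inflectionId": "AACg"}],
    "metadata": {"numLexemes": 1, "numLemmas": 1,
                 "numWordForms": 2, "numInflections": 2},
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lexicon-sample.json")
    with open(path, "w", encoding="utf-8") as f:
        # Compact separators mimic the minified, single-line layout.
        json.dump(tiny, f, ensure_ascii=False, separators=(",", ":"))
    lexicon = load_lexicon(path)

mismatches = count_mismatches(lexicon)
print(mismatches or "metadata counts match")
```

A check like this is optional, but it can catch a truncated or partially written file early.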