Nayiri for Developers

Nayiri Armenian Lexicon — Data Schema Reference

The lexicon data is provided in JSON format. The following sections describe the attributes of each JSON object.

Root-level

The root-level JSON Object has the following attributes:
Attribute Key Type Description
lexemes JSON Array An Array of Lexeme objects (described below)
inflections JSON Array An Array of globally-defined Inflection objects (described below) shared by Word Forms
metadata JSON Object Provides an overview of the data set, including versioning, licensing, and some basic statistics.
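
As a minimal sketch of how the root-level object might be consumed, the following Python snippet parses lexicon JSON text and checks that the three root-level attributes are present. The `load_lexicon` helper and the tiny `SAMPLE` document are hypothetical and exist only for illustration; a real data file would be read from disk and contain many more entries.

```python
import json

# Hypothetical, heavily truncated sample; not taken from the real data set.
SAMPLE = """
{
  "lexemes": [],
  "inflections": [],
  "metadata": {"version": "2024-01-01-v1"}
}
"""

ROOT_KEYS = {"lexemes", "inflections", "metadata"}


def load_lexicon(text: str) -> dict:
    """Parse lexicon JSON text and verify the three root-level attributes exist."""
    data = json.loads(text)
    missing = ROOT_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing root-level attributes: {sorted(missing)}")
    return data


lexicon = load_lexicon(SAMPLE)
print(len(lexicon["lexemes"]), len(lexicon["inflections"]), lexicon["metadata"]["version"])
```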

Lexeme

Attribute Key Type Description
lexemeId String The 4-character identifier that uniquely identifies this Lexeme in the Nayiri Lexicon.

It is the base64url encoding of the underlying 24-bit unique identifier.
description String A human-readable description of this Lexeme. By convention, it is a comma-separated list of lemmas followed by a short English definition in parentheses, meant to provide a way to disambiguate distinct Lexemes with the same Lemma.

For example, the Lexeme representing the postposition համար has the description “համար (for, on account of)”, whereas the Lexeme representing the noun համար has the description “համար (account, number, count, calculation, enumeration)”.
lemmaType String The type of Lemmas this Lexeme can contain.

One of: { NOMINAL, VERBAL, UNINFLECTED }

Lexemes with lemmaType of NOMINAL can store Lemmas that represent Nouns, Adjectives, Adverbs, and Adpositions.

Lexemes with lemmaType of VERBAL are meant to store Lemmas of Verbs only.

Lexemes with lemmaType of UNINFLECTED are meant to store Lemmas that are exclusively uninflected, such as adverbs (e.g. անմիջապէս), adpositions, conjunctions, interjections, articles, and determiners.
lemmas JSON Array The Lemma objects belonging to this Lexeme.
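
Because a 24-bit value occupies exactly three bytes, a lexemeId can be converted to and from an integer with Python's standard `base64` module. The sketch below assumes big-endian byte order, which the schema does not state; the helper names `decode_id24` and `encode_id24` and the sample IDs are hypothetical.

```python
import base64


def decode_id24(identifier: str) -> int:
    """Turn a 4-character base64url ID into its 24-bit integer value."""
    if len(identifier) != 4:
        raise ValueError("expected exactly 4 base64url characters")
    raw = base64.urlsafe_b64decode(identifier)  # 4 characters decode to 3 bytes
    return int.from_bytes(raw, "big")  # assumption: most significant byte first


def encode_id24(value: int) -> str:
    """Turn a 24-bit integer into its 4-character base64url form."""
    if not 0 <= value < (1 << 24):
        raise ValueError("value does not fit in 24 bits")
    return base64.urlsafe_b64encode(value.to_bytes(3, "big")).decode("ascii")


print(decode_id24("AAAB"))  # hypothetical ID, not taken from the real data
print(encode_id24(1))
```

The same helpers would apply to inflectionId, which is described as using the same 24-bit encoding.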

Lemma

Attribute Key Type Description
lemmaId String The 5-character identifier that uniquely identifies this Lemma in the Nayiri Lexicon.

It is the base64url encoding of the underlying 30-bit unique identifier.
lemmaString String The canonical word form of this Lemma. (For example, ճշդել)

There may be more than one Lemma with the same lemmaString in a given Lexeme.

For example, in the Uninflected Lexeme with the description “որ (that; when, whenever; if; so that, in order to)”, the two contained Uninflected Lemmas for the conjunction and adverb both have the same lemmaString (որ).
partOfSpeech String The part of speech of this lemma, which is one of:

{ NOUN, PRONOUN, VERB, ADJECTIVE, ADVERB, CONJUNCTION, INTERJECTION, ARTICLE, DETERMINER, ADPOSITION }
lemmaDisplayString String A human-readable description of this Lemma. By convention, it is the lemmaString followed by an English definition in parentheses. It is meant to provide a way to disambiguate Lemmas with the same lemmaString within the same Lexeme.

In the preceding example, the Uninflected Lemma for the conjunction որ has the lemmaDisplayString “որ (that; if; so that, In order to)”, whereas the adverb has “որ (when, whenever)”.
numWordForms Integer A convenience attribute showing the total number of Word Forms in this Lemma.
wordForms JSON Array The WordForm objects attributed to this Lemma.
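
A 30-bit identifier is not a whole number of bytes, so a 5-character lemmaId cannot be passed to a byte-oriented base64 decoder as-is. One way to handle it, sketched below, is to map each character to its 6-bit value in the base64url alphabet (RFC 4648) and concatenate the results. The assumption that the first character holds the most significant bits is not confirmed by the schema, and the helper names and sample IDs are hypothetical.

```python
import string

# The base64url alphabet from RFC 4648: A-Z, a-z, 0-9, '-', '_'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"
SEXTET = {ch: index for index, ch in enumerate(ALPHABET)}


def decode_sextets(identifier: str) -> int:
    """Concatenate the 6-bit value of each character, first character first."""
    value = 0
    for ch in identifier:
        if ch not in SEXTET:
            raise ValueError(f"not a base64url character: {ch!r}")
        value = (value << 6) | SEXTET[ch]
    return value


def encode_sextets(value: int, length: int) -> str:
    """Inverse of decode_sextets for an ID of the given character length."""
    if value < 0:
        raise ValueError("value must be non-negative")
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 0x3F])  # lowest 6 bits
        value >>= 6
    if value:
        raise ValueError("value does not fit in the requested length")
    return "".join(reversed(chars))


print(decode_sextets("AAAAB"))  # hypothetical 5-character lemmaId
print(encode_sextets(1, 5))
```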

Word Form

Attribute Key Type Description
s String An inflected word form (e.g. ճշդեմ, ճշդես, ճշդէ) of the containing Lemma (e.g. ճշդել)
i String The unique identifier of the Inflection object representing the morphological analysis of this Word Form.
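
Since each Word Form stores only the identifier of its Inflection, a consumer typically builds a lookup table from the root-level inflections Array first. The sketch below walks Lexemes, Lemmas, and Word Forms and resolves each `i` value. The miniature `lexicon` dictionary is hypothetical: its IDs, description, and display names are placeholders, and only the word forms themselves come from the examples in this reference.

```python
# Hypothetical miniature lexicon with placeholder IDs and display names.
lexicon = {
    "lexemes": [
        {
            "lexemeId": "AAAB",
            "description": "ճշդել (placeholder definition)",
            "lemmaType": "VERBAL",
            "lemmas": [
                {
                    "lemmaId": "AAAAB",
                    "lemmaString": "ճշդել",
                    "partOfSpeech": "VERB",
                    "numWordForms": 2,
                    "wordForms": [
                        {"s": "ճշդեմ", "i": "AAAC"},
                        {"s": "ճշդես", "i": "AAAD"},
                    ],
                }
            ],
        }
    ],
    "inflections": [
        {"inflectionId": "AAAC", "lemmaType": "VERBAL",
         "displayName": {"en": "placeholder inflection 1"}},
        {"inflectionId": "AAAD", "lemmaType": "VERBAL",
         "displayName": {"en": "placeholder inflection 2"}},
    ],
}


def index_inflections(data: dict) -> dict:
    """Map each inflectionId to its Inflection object."""
    return {inf["inflectionId"]: inf for inf in data["inflections"]}


def iter_word_forms(data: dict):
    """Yield (word form, lemmaString, Inflection object) for every Word Form."""
    by_id = index_inflections(data)
    for lexeme in data["lexemes"]:
        for lemma in lexeme["lemmas"]:
            for word_form in lemma["wordForms"]:
                yield word_form["s"], lemma["lemmaString"], by_id[word_form["i"]]


for surface, lemma_string, inflection in iter_word_forms(lexicon):
    print(surface, lemma_string, inflection["displayName"]["en"])
```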

Inflection

Attribute Key Type Description
inflectionId String The 4-character unique identifier of this Inflection object.

It is the base64url encoding of the underlying 24-bit unique identifier of this Inflection object.
lemmaType String One of: { NOMINAL, VERBAL, UNINFLECTED }

Note that no attributes besides inflectionId and displayName apply to the special Inflection object with lemmaType == UNINFLECTED
displayName JSON Object Provides an internationalized human-readable display name for this Inflection.

The keys are the locale, and the values are the localized display names. Both the keys and values are Strings.

At present, only the hy (Armenian) and en (English) locale Strings are supported.
verbalInflectionClass String Signifies the broad category of Verbal Inflections represented by this Inflection object.

Applicable only when lemmaType == VERBAL

One of: { REGULAR_VERB, INFINITIVE, PRESENT_PARTICIPLE, PAST_PARTICIPLE, FUTURE_PARTICIPLE, PRESENT_PARTICIPLE_SUBSTANTIVE, PAST_PARTICIPLE_SUBSTANTIVE, FUTURE_PARTICIPLE_SUBSTANTIVE }
verbPolarity String Signifies the polarity of the verb for this Inflection.

Applicable only when lemmaType == VERBAL

One of: { POSITIVE, NEGATIVE }
verbTense String Signifies the grammatical tense of the verb for this Inflection.

Applicable only when verbalInflectionClass == REGULAR_VERB

One of: { SIMPLE_PRESENT, PRESENT_CONTINUOUS, PRESENT_PERFECT, SIMPLE_PAST, PAST_PERFECT, PAST_IMPERFECT, PAST_CONTINUOUS, SIMPLE_FUTURE, FUTURE_PERFECT, NONE }
verbMood String Signifies the grammatical mood of the verb for this Inflection.

Applicable only when verbalInflectionClass == REGULAR_VERB

One of: { INDICATIVE, IMPERATIVE, PROHIBITIVE, SUBJUNCTIVE, CONDITIONAL }
grammaticalPerson String Signifies the grammatical person of the verb for this Inflection.

Applicable only when verbalInflectionClass == REGULAR_VERB

One of: { FIRST, SECOND, THIRD, NONE }
grammaticalNumber String Signifies the grammatical number of the noun, verb, or substantive participle for this Inflection.

Applicable only when (lemmaType == NOMINAL) || (verbalInflectionClass == (REGULAR_VERB || PRESENT_PARTICIPLE_SUBSTANTIVE || PAST_PARTICIPLE_SUBSTANTIVE || FUTURE_PARTICIPLE_SUBSTANTIVE))

One of: { SINGULAR, PLURAL }
grammaticalCase String Signifies the grammatical case of the noun, infinitive, or substantive participle for this Inflection.

Applicable only when (lemmaType == NOMINAL) || (verbalInflectionClass == (INFINITIVE || PRESENT_PARTICIPLE_SUBSTANTIVE || PAST_PARTICIPLE_SUBSTANTIVE || FUTURE_PARTICIPLE_SUBSTANTIVE))

One of: { NOMINATIVE, ACCUSATIVE, GENITIVE, DATIVE, ABLATIVE, INSTRUMENTAL, LOCATIVE }
grammaticalArticle String Signifies any grammatical article appended to the noun, infinitive, or substantive participle for this Inflection.

Applicable only when (lemmaType == NOMINAL) || (verbalInflectionClass == (INFINITIVE || PRESENT_PARTICIPLE_SUBSTANTIVE || PAST_PARTICIPLE_SUBSTANTIVE || FUTURE_PARTICIPLE_SUBSTANTIVE))

One of: { NONE, DEFINITE_ARTICLE_UHT, DEFINITE_ARTICLE_NOO, POSSESSIVE_ARTICLE_SINGULAR_FIRST_PERSON, POSSESSIVE_ARTICLE_SINGULAR_SECOND_PERSON, POSSESSIVE_ARTICLE_UHT, POSSESSIVE_ARTICLE_NOO, DEFINITE_ARTICLE_NOO_WITH_FIRST_PERSON_POSSESSIVE_ARTICLE, DEFINITE_ARTICLE_NOO_WITH_SECOND_PERSON_POSSESSIVE_ARTICLE, DEFINITE_ARTICLE_NOO_WITH_THIRD_PERSON_POSSESSIVE_ARTICLE_UHT, DEFINITE_ARTICLE_NOO_WITH_THIRD_PERSON_POSSESSIVE_ARTICLE_NOO }
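
The "Applicable only when" conditions above can be collected into a single function, which may help when validating or displaying Inflection objects. The following is a sketch of one reading of those rules, not code from the Nayiri project. The function name `applicable_attributes` is hypothetical, and it returns only the grammatical attributes, leaving out inflectionId, lemmaType, and displayName.

```python
SUBSTANTIVES = {
    "PRESENT_PARTICIPLE_SUBSTANTIVE",
    "PAST_PARTICIPLE_SUBSTANTIVE",
    "FUTURE_PARTICIPLE_SUBSTANTIVE",
}


def applicable_attributes(lemma_type: str, verbal_class: str = None) -> set:
    """Return the grammatical attribute keys that apply to an Inflection.

    verbal_class is the verbalInflectionClass value and is only consulted
    when lemma_type is VERBAL.
    """
    attrs = set()
    nominal = lemma_type == "NOMINAL"
    verbal = lemma_type == "VERBAL"

    if verbal:
        attrs |= {"verbalInflectionClass", "verbPolarity"}
        if verbal_class == "REGULAR_VERB":
            attrs |= {"verbTense", "verbMood", "grammaticalPerson"}

    # grammaticalNumber: nominals, regular verbs, and substantive participles.
    if nominal or (verbal and verbal_class in {"REGULAR_VERB"} | SUBSTANTIVES):
        attrs.add("grammaticalNumber")

    # grammaticalCase and grammaticalArticle: nominals, infinitives,
    # and substantive participles.
    if nominal or (verbal and verbal_class in {"INFINITIVE"} | SUBSTANTIVES):
        attrs |= {"grammaticalCase", "grammaticalArticle"}

    return attrs


print(sorted(applicable_attributes("NOMINAL")))
print(sorted(applicable_attributes("VERBAL", "INFINITIVE")))
```

For the special UNINFLECTED Inflection object the function returns an empty set, matching the note that only inflectionId and displayName apply to it.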

Metadata

The Metadata object provides version information of the lexicon data, some statistics about the data, and human-readable descriptions of its authorship, licensing, and attribution requirements.
Attribute Key Type Description
version String A version String that uniquely identifies this release.

It is formatted as YYYY-MM-DD-vN, where YYYY is the year, MM is the month (01-12), DD is the day of the month (01-31), and N is the revision number for that day.
license String The license under which the data is released
attribution String The attribution text that consumers of the data should display in their application or derivative work when using the data
publisher String
sponsorship String
author String
contactEmail String A contact email address for support
website String URL to the Nayiri website
numLexemes Integer The number of Lexemes in the data set
numLemmas Integer The total number of Lemmas across all Lexemes in the data set
numWordForms Integer The total number of Word Forms across all Lemmas of all Lexemes in the data set
numInflections Integer The number of Inflection objects defined globally
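
The version format and the four count attributes lend themselves to simple sanity checks after loading a release. The sketch below parses a version String into a date and revision number, and recomputes the counts from the data so they can be compared against the metadata. It assumes the "v" in the version is a literal lowercase letter and that N is a plain decimal integer; the function names and the sample version String are hypothetical.

```python
import re
from datetime import date

# Assumption: literal "v" followed by a decimal revision number.
VERSION_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})-v([0-9]+)")


def parse_version(version: str):
    """Split a YYYY-MM-DD-vN String into (release date, revision number)."""
    match = VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"unexpected version format: {version!r}")
    year, month, day, revision = (int(group) for group in match.groups())
    return date(year, month, day), revision  # date() rejects impossible dates


def recount(lexicon: dict) -> dict:
    """Recompute the four count attributes directly from the data."""
    lemmas = [lemma for lexeme in lexicon["lexemes"] for lemma in lexeme["lemmas"]]
    return {
        "numLexemes": len(lexicon["lexemes"]),
        "numLemmas": len(lemmas),
        "numWordForms": sum(len(lemma["wordForms"]) for lemma in lemmas),
        "numInflections": len(lexicon["inflections"]),
    }


print(parse_version("2024-01-31-v2"))  # hypothetical version String
```

Because the parsed result is a tuple of date and revision, two releases can be ordered by comparing the tuples directly.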
