Pre-Reform Dictionary: Recipe / ABBYY Blog / Sudo Null IT News
As most Habr users probably know, today, May 24, is the Day of Slavic Writing, a holiday for those for whom the word ОРЕХ ("nut") does not signify "operational expense" (OPEX). Today I will tell you how to make a dictionary of the Russian language with pre-reform spelling from a modern Russian morphological dictionary. First things first.
As many of us know, the 1917 revolution in Russia canceled not only debt obligations but also some letters of the Russian alphabet. But the pre-reform rules were not forgotten, texts published before the reform have also survived reasonably well (even my modest home library has a couple of volumes), and the task of creating a morphological dictionary for that old grammar is interesting in itself. The reform removed some letters (і, ѣ, ѵ and ѳ) from circulation and also changed some rules that were not directly related to the use of these letters. More on Wikipedia.
Today we will talk about how to produce a grammatical dictionary for the pre-reform language from the morphological dictionary of our ordinary modern Russian language.
What is a morphological dictionary, or a dictionary with morphology support? By this term I mean not a dictionary that simply lists all possible word forms of each word, but one that knows how to generate those word forms for each word. This, of course, not only saves space but also gives hope that, having entered the word "search", we have not forgotten to enter a form such as "of the search" (genitive case). The grammatical category of a word is responsible for the way its word forms are generated; each word belongs to a certain grammatical category.
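To make the idea concrete, here is a minimal sketch of a dictionary that stores a stem plus a grammatical category and generates the word forms on demand. The category name, the tag names and the table layout are assumptions made for illustration; a real morphological dictionary is far richer.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified paradigm table: each grammatical
# category maps a grammatical tag to the ending appended to the stem.
PARADIGMS = {
    "noun_masc_hard_sg": {
        "nom": "",
        "gen": "а",
        "dat": "у",
        "ins": "ом",
        "prep": "е",
    },
}


@dataclass(frozen=True)
class Entry:
    stem: str
    category: str  # key into PARADIGMS


def word_forms(entry):
    """Generate every word form of an entry from its category's endings."""
    endings = PARADIGMS[entry.category]
    return {tag: entry.stem + ending for tag, ending in endings.items()}


# The word for "search" is stored once; its forms are generated.
forms = word_forms(Entry(stem="поиск", category="noun_masc_hard_sg"))
```

Because the endings live in the category, changing one category updates every word that refers to it, which is what makes the conversion described in this post practical.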
In addition, to avoid a combinatorial explosion from words such as "gray-brown-raspberry", so-called composite rules are added to the dictionary. They are needed to generate such constructions. Each composite rule is responsible for generating words according to certain laws. A composite can have an explicit separator (such as the hyphen in the example above) or an implicit one, when the parts of the composite are simply joined to each other. For example, a particular case of a composite rule may describe a way of forming verbs with the prefix "re-": to copy, interpolate, move, outswing ... For Russian, composites without an explicit separator may seem unnecessary, but those who know German will probably agree that they are needed.
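A composite rule can be sketched roughly as follows. This is an illustrative two-part model only: the class name, its fields and the sample word lists are assumptions, not the format of any real dictionary.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CompositeRule:
    """Hypothetical two-part composite: first part + separator + last part."""
    first_parts: tuple
    last_parts: tuple
    separator: str = ""  # "" means the parts are simply joined

    def generate(self):
        """List every word this rule can produce."""
        return [first + self.separator + last
                for first, last in product(self.first_parts, self.last_parts)]

    def accepts(self, word):
        """Check whether a word can be split according to this rule."""
        for first in self.first_parts:
            head = first + self.separator
            if word.startswith(head) and word[len(head):] in self.last_parts:
                return True
        return False


# Explicit separator: colour compounds joined with a hyphen.
colours = CompositeRule(("серо", "буро"), ("малиновый", "зелёный"), "-")
# Implicit separator: a verb prefix glued directly to the verb.
prefixed = CompositeRule(("пере",), ("писать", "делать", "читать"))
```

The rule stores `len(first_parts) + len(last_parts)` items instead of their product, which is how composites keep the dictionary from exploding.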
So, we are making a dictionary for pre-reform Russian from a Russian dictionary with morphology. We will look at the differences and introduce them gradually into the new dictionary. For starters, let's consider the simplest points:
The reform abolished ъ at the end of words ending in a consonant (except for й). There are no problems returning it to its place.
The letters ѵ and ѳ were living out their last days by the time of the reform, and the list of words containing them is very short. They are fairly easy to restore.
The letter і is used in words derived from міръ (the one that means the universe, not the antonym of war), as well as in ordinary words before vowels and й, except for words formed by composite rules (so химія takes і, but an и that ends the first part of a compound stays и). This is not difficult to fix in the dictionary of stems and grammatical categories: search and replace with a regular expression is a simple operation.
The changes to з/с at the end of the prefixes из-, воз-, раз-, роз-, низ- (изслѣдованіе, разсказъ) are also easy to introduce, as is undoing the modification of the prefixes без-, чрез-, через- (the words for "useless" and "latticelike", for example безполезный).
Note that if our modern Russian dictionary is already supplied with composite rules, then these changes, as well as keeping -ъ at the end of the first part of a composite, will have to be provided for manually.
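The simple points above could be sketched like this. It assumes, purely for illustration, that each dictionary entry stores its prefix separately from the rest of the word, because matching prefixes on raw strings would also hit words like Россия. It handles lowercase input only, ignores composites, and does not restore ѣ, ѵ or ѳ.

```python
import re

CONSONANTS = "бвгджзклмнпрстфхцчшщ"  # every consonant except й

# Prefixes written with з in every position before the reform.
ALWAYS_Z = {"бес": "без", "чрес": "чрез", "черес": "через"}
# Prefixes whose modern с goes back to з only before a following с.
Z_BEFORE_S = {"ис": "из", "вос": "воз", "рас": "раз", "рос": "роз", "нис": "низ"}


def restore_prefix(prefix, rest):
    """Return the pre-reform spelling of a modern prefix."""
    if prefix in ALWAYS_Z:
        return ALWAYS_Z[prefix]
    if prefix in Z_BEFORE_S and rest.startswith("с"):
        return Z_BEFORE_S[prefix]
    return prefix


def restore_decimal_i(word):
    """Replace и with і before a vowel or й."""
    return re.sub(r"и(?=[аеёиоуыэюяй])", "і", word)


def restore_final_hard_sign(word):
    """Append ъ to a word that ends in a consonant other than й."""
    return re.sub(rf"([{CONSONANTS}])$", r"\1ъ", word)


def to_pre_reform(prefix, rest):
    """Apply the three simple rules to one (prefix, rest) entry."""
    word = restore_prefix(prefix, rest) + rest
    return restore_final_hard_sign(restore_decimal_i(word))
```

For example, `to_pre_reform("рас", "сказ")` gives `разсказъ`, while `to_pre_reform("ис", "следование")` gives `изследованіе`, which still lacks its ѣ.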
Next, let's deal with the endings. Adjectives in the plural have, in addition to the endings -ые and -іе, also -ыя and -ія, and in the genitive case, depending on stress, we replace -ого and -его with -аго and -яго. This is not difficult, nor is adding the not very sophisticated changes to the endings of nouns. We add the pre-reform words ея ("her"), онѣ ("they", feminine) and однѣ ("one", feminine plural) with its case forms (at least as indeclinable words, if you are reluctant to tinker with grammatical categories for them).
And after these simple manipulations, we come to the most interesting part. How do we restore ѣ?
The topic is not simple; Wikipedia has a separate article on it. First, let's deal with the simple parts. For the dative and prepositional case endings, the comparative and superlative forms of adjectives, and verbs in -ѣть, the grammatical category is responsible. Numerals are changed manually, as are the reflexive pronouns. Adverbs and prepositions are somewhat more numerous, but replacing them is also quite a manageable task. But what do we do with the whole herd of dictionary words?
Here, coming to our rescue is ... Ukraine! Unexpected, isn't it?
The thing was that ... oh, sorry, got carried away. The Ukrainian and Russian languages are really similar (are they, really?); in particular, many words are cognate. The rule is this: in many cases where Russian used ѣ, Ukrainian has a very similar word with і in that position. Don't know what the second letter was in the word for "turnip"? OK, look it up in a Ukrainian dictionary: ріпа, so рѣпа. The same goes for, say, the word "repair". Of course, the meaning of the word sometimes changes (for instance, what does the Ukrainian word nedily mean?), but for our purposes this is not very important. It is worse for us when there is no analogue in Ukrainian, as with the word "father", for instance. Well, you won't be able to get rid of manual work completely, so let's be glad that its volume can be greatly reduced. With such simple knowledge and a Ukrainian morphological dictionary, automating the markup becomes easier.
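The Ukrainian trick could be sketched as follows. The word set is a tiny hand-picked stand-in for a real Ukrainian dictionary, and the function name is an assumption. The sketch only tries the direct е to і substitution and stops at the first hit, so it ignores other spelling differences between the two languages and should be read as a source of candidates for review rather than a final answer.

```python
# Toy stand-in for a Ukrainian dictionary (assumption for illustration).
UKRAINIAN_WORDS = {"ріпа", "ліс", "хліб", "сіно", "небо"}


def restore_yat(word, ukrainian_words):
    """Suggest a spelling with ѣ for a modern Russian word.

    Each е is swapped for і in turn; if the result is a known Ukrainian
    word, that position is taken to have been ѣ. Returns None when the
    Ukrainian dictionary gives no evidence.
    """
    for i, letter in enumerate(word):
        if letter == "е":
            candidate = word[:i] + "і" + word[i + 1:]
            if candidate in ukrainian_words:
                return word[:i] + "ѣ" + word[i + 1:]
    return None


suggestions = {w: restore_yat(w, UKRAINIAN_WORDS) for w in ("репа", "лес", "небо")}
```

Words for which the function returns `None` are the ones left for manual work.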
A small digression: philology
The reason for this phenomenon is evidently that once, before the common proto-language split into Russian and Ukrainian, ѣ and е were pronounced differently; Russian and Ukrainian then diverged, and in Russian ѣ came to be pronounced the same as е, while in Ukrainian it came to be pronounced like і.
There is, by the way, one more verification rule: if the letter ё is used in the root under stress, then without stress the letter е (not ѣ) is used. Here too, however, there were some exceptions, as in the word for "village".
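This check can also be automated when the forms of a word are available. The sketch below is an assumption-laden illustration: it compares forms letter by letter at the same position, and its exception set contains only two well-known cases (звѣзда and гнѣздо, which keep ѣ despite звёзды and гнёзда), not a complete list.

```python
# Known cases where ё alternates with ѣ rather than е (not exhaustive).
YO_EXCEPTIONS = {"звезда", "гнездо"}


def yo_suggests_plain_e(lemma, forms):
    """True if some form has ё where the lemma has е, so ѣ is unlikely."""
    if lemma in YO_EXCEPTIONS:
        return False
    for form in forms:
        # zip stops at the shorter word; positions are compared one to one.
        for lemma_letter, form_letter in zip(lemma, form):
            if lemma_letter == "е" and form_letter == "ё":
                return True
    return False


plain_e = yo_suggests_plain_e("сестра", ["сестры", "сёстры"])
```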
And what if a Ukrainian dictionary is not at hand? Well, "shkoda" (Ukrainian for "a pity") :) You will have to rely on your own accuracy and be glad at least that there are still fewer than 9000 roots with ѣ.
After all these manipulations, you should deal with the pre-reform hyphenation rules, which are stricter than the modern ones, if you plan to support them in your dictionary.
As a result, we get a morphological dictionary of the Russian language with pre-reform orthography.
Thank you for your attention, and happy Day of Slavic Writing!
UPD: At the request of paulousky (as well as the blog editor), examples have been added.
Posted by: wilsonconfor45.blogspot.com