\_sh v3.0 385 Readme Notes
\id ADAPT1B
\s Prinderella with Parse and Generate processes
\p This sample project expands on the previous one by adding word parsing and generation. The first Prinderella sample did not parse words. It had separate lexical entries for the singular and plural forms of words, and for the past and present forms. This sample breaks off morphemes like "Plural" and "Past" so that each root only needs to be entered once in the lexicon. This sample also handles the difference between "a" and "an" in a general way.
\p As in the first sample, words and phrases are looked up in a simple cross-lexicon that contains only words that change. Words that do not change are allowed to pass through adaptation unchanged. Words and phrases that are found in the lexicon are replaced with the target equivalent from the lexicon (in the \e field, in this example).
\p This adaptation sample does not parse every word, only those whose roots must be found in the lexicon in order to be changed. It is not necessary to parse words that do not change, or whose changes are handled by regular sound changes via a CC table.
\p Read down the file of adapted text and notice how the words are parsed, e.g. "uglers" is parsed to "ugler -s". (The reason for the "a" to "a-" change is explained below.) This example ignores the ambiguity of the -s suffix (plural and third person singular) since the same ambiguity exists in the target language.
\p
\s Interlinear Processes
\p Click on the Prinder.txt window and then look at the interlinear setup (under Database, Properties). You will see that it contains three processes. The first is Parse from \t to \m. This divides the source words into morphemes and handles morphophonemic changes. The second is Lookup from the Prinder morpheme breaks \m to English \em. This changes morphemes from source to target. The third is Generate from \em to \e. This puts the target morphemes back together into words and handles morphophonemic changes.
\s2 Parse Process
\p Look more closely at the Parse process (double-click it). By clicking the Lexicons button you will see that it looks up the fields \lx and \a and that it outputs the field \u. That is similar to the typical parsing setup for interlinearizing, except that if the parse fails, the word is output without a failure mark. In normal interlinearizing, every word must parse or be found in the lexicon, because all of its morphemes must be given a gloss and part of speech. But in a simple adaptation setup like this, between closely related languages with many shared words, that is not necessary. Words that do not change do not need to be entered in the lexicon and so do not need to parse.
\p In contrast to the previous project, the Parse and Lookup processes do not need to keep capitalization and punctuation, because the Generate process can pick those up from the top line and put them back on as it produces the final output.
\s2 Lookup Process
\p Looking at the Lookup process (lines \m to \em), you will see that it looks up the \lx field in the lexicon and outputs the \e field. This converts all morphemes from source to target. Morphemes that are not found are output unchanged. (It could apply a CC table for regular sound changes.) Notice that the Lookup process is marked as an adaptation process.
\s2 Generate Process
\p Looking at the Generate process (lines \em to \e), you will see that it gets punctuation and capitalization from the \t line. You will also see that it applies rules from a rule file named ENGPHON.RUL. This file contains phonological rules that perform the morphophonemic changes needed for English. (Most of these are actually orthographic, but they illustrate how morphophonemic rules work.)
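\p For example, after the exercise at the end of these notes has been done (and "cow" has been chosen at the ambiguity), the word "bows" passes through the four lines roughly like this. The line contents below are only a sketch; the actual display may differ slightly.
\t bows
\m bow -s
\em cow -s
\e cows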
\p Close the Database Properties dialog box and look at the window containing the file ENGPHON.RUL.
\s Phonological Rule File
\p The ENGPHON.RUL file gives the phonological rules to be applied during the Generate process. The field markers used in such a file are defined by Shoebox and cannot be changed, but the fonts and field names can be changed by the user. You are encouraged to use the comment (\co) and example (\ex) fields extensively to make the file more readable.
\s2 Definitions of Symbols
\p At the top of the file you will see various definitions of phonological symbols. For example, vowels in the phonological rules are marked by the symbol V. The comment line (\co Vowel) simply clarifies the purpose of the first definition. The definition line (\def) gives the actual symbol (V) to be used, and the following line gives the list of letters that the symbol will represent.
\p In the first definition, a vowel (V) is defined as "a", "e", "i", "o", or "u". Individual letters must be separated by a space. Multigraph characters are entered as combinations of two or more letters.
\p A symbol is placed in square brackets when used in a rule. For example, the second phonological rule uses [V].
\nt Note: All definitions must come before the first rule.
\s2 Phonological Rules
\p Phonological rules come next and are marked by the \ru field. The first line of a rule field is the pattern to be matched, and the second line is the replacement, that is, the result of the rule. For example, the first rule says that "y" followed by "-e" becomes "ie". The hyphen in the rule match is required to match the morpheme boundary hyphen. The hyphen is usually left out of the replacement so that the output does not look like a morpheme boundary to later rules. The space in the match is significant: a hyphen with no space on either side matches any morpheme boundary, but a hyphen with a space before it matches only at a suffix boundary, and a hyphen with a space after it matches only at a prefix boundary.
\p Each rule is preceded by a comment field that describes what the rule does. It is also helpful to include an example field (\ex) giving specific examples from the language that show the need or purpose of the rule.
\s3 Rule Ordering
\p The order in which the rules are listed in the rule file is critical. Shoebox applies the rules in the order they occur, which allows for bleeding and feeding of rules. For example, Rule 4 bleeds Rule 5 by changing "a-" to "an" (before vowels) before it ever gets to Rule 5.
\p Notice that the rules in this file are numbered in the comment field; this is only for readability and is ignored by Shoebox. It is probably best to leave off these numbers until all the rules have been created and ordered, since the numbers would need changing as new rules are added.
\s3 Other Explanations
\p In this example, Rule 3, which doubles consonants, shows that a defined symbol ([Cs]) that occurs only once in the match can be used more than once in the replacement. If it is, the same matched letter is output twice.
\p Rules 4 and 5 (as mentioned above) implement the difference between "a" and "an". The input to these rules is prepared by having both "a" and "an" generate "a-" from the lexicon Lookup. The hyphen on "a-" causes the Generate process to treat it as a prefix attached to the following word. This allows rules to apply across a word boundary. Note that an entry like "a" becoming "a-" is best put in a special lexicon reserved for oddities like this.
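\p For example, the lexicon entries that prepare the input for these rules might look something like the following sketch; the actual entries in the sample lexicon may contain additional fields.
\lx a
\e a-

\lx an
\e a-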
\p Rule 4 says that word-initial "a-" becomes "an#" when the following word starts with a vowel. The hash mark (#) on the front of "a-" (#a-) stands for a word boundary, so the rule matches only an "a-" at the beginning of a word. The word boundary after "an" in the replacement line causes it to be output as a separate word instead of a prefix.
\p Rule 5 says that all instances of "a-" become "a" followed by a word boundary, that is, the separate word "a". But if Rule 4 has applied, the word no longer contains "a-", so Rule 5 cannot apply in those cases. (This is why the order is critical.) Also notice, as mentioned in the comment field, that the replacement line puts two word boundaries (##), that is, two spaces, after the "a" so that the following word will be aligned with the word above it. This is not necessary, but it makes the output a little more readable.
\p To see where these rules have worked, look at the phrase "twinkling of an eye". This uses two rules. One is that the "e" of "twinkle" goes away before "-ing" (Rule 2). The other is that "a" becomes "an" before "eye" (Rule 4).
\p
\s Exercise
\p There is an exercise at the bottom of the PRINDER.TXT file.
\nt Try to do it first, and then read the answer below.
\p
\s Answer to Exercise
\p Adapt the sentence.
\p Select the root "bow" from "bows" and insert it into the lexicon, entering "cow" in the \e line and adding another \e line with "bow" to make it ambiguous. (A sketch of the resulting entry is given in the note at the end of this file.)
\p Adapt the word "bows" again to see the result. It should parse as "bow -s".
\p Use the same process to adapt "cees" to "bees", "hilk" to "milk", and "money" to "honey".
\p
\s To go to the next Adaptation tutorial, open the project in the ADAPT2A folder.
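\nt For reference, the lexicon entry created in the answer above might look something like the following sketch; the actual entry may contain additional fields.
\lx bow
\e cow
\e bow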