\_sh v3.0 385 Readme Notes
\id ADAPT1B
\s Prinderella with Parse and Generate processes
\p This sample project expands on the previous one by adding word parsing and generation. The first Prinderella sample did not parse words. It had separate lexical entries for the singular and plural forms of words, and for the past and present forms. This sample breaks off morphemes like "Plural" and "Past" so that each root only needs to be entered once in the lexicon. This sample also handles the difference between "a" and "an" in a general way.
\p As in the first sample, words and phrases are looked up in a simple cross-lexicon that contains only words that change. Words that do not change are allowed to pass through adaptation unchanged. Words and phrases that are found in the lexicon are replaced with the target equivalent from the lexicon (in the \e field, in this example).
\p This adaptation sample does not parse every word, only those whose roots must be found in the lexicon in order to be changed. It is not necessary to parse words that do not change, or whose changes are handled by regular sound changes via a CC table.
\p Read down the file of adapted text and notice how the words are parsed, e.g. "uglers" is parsed to "ugler -s". (The reason for the "a" to "a-" change is explained below.) This example ignores the ambiguity of the -s suffix (plural and third person singular) since the same ambiguity exists in the target language.
\p
\s Interlinear Processes
\p Click on the Prinder.txt window and then look at the interlinear setup (under Database, Properties). You will see that it contains three processes. The first is Parse from \t to \m. This divides the source words into morphemes and handles morphophonemic changes. The second is Lookup from the Prinder morpheme breaks \m to English \em. This changes morphemes from source to target. The third is Generate from \em to \e. This puts the target morphemes back together into words and handles morphophonemic changes.
\s2 Parse Process
\p Look more closely at the Parse process (double-click it). By clicking the Lexicons button you will see that it looks up the fields \lx and \a and that it outputs the field \u. That is similar to the typical parsing setup for interlinearizing, except that if the parse fails, the word is output without a failure mark. In normal interlinearizing, every word must parse or be found in the lexicon, because all of its morphemes must be given a gloss and part of speech. But in a simple adaptation setup like this, between closely related languages with many shared words, that is not necessary. Words that do not change do not need to be entered in the lexicon and so do not need to parse.
\p In contrast to the previous project, the Parse and Lookup processes do not need to keep capitalization and punctuation, because the Generate process can pick those up from the top line and put them back on as it produces the final output.
\s2 Lookup Process
\p Looking at the Lookup process (lines \m to \em), you will see that it looks up the \lx field in the lexicon and outputs the \e field. This converts all morphemes from source to target. Morphemes that are not found are output unchanged. (It could apply a CC table for regular sound changes.) Notice that the Lookup process is marked as an adaptation process.
\s2 Generate Process
\p Looking at the Generate process (lines \em to \e), you will see that it gets punctuation and capitalization from the \t line. You will also see that it applies rules from a rule file named ENGPHON.RUL. This file contains phonological rules that perform the morphophonemic changes needed for English. (Most of these are actually orthographic, but they illustrate how morphophonemic rules work.)
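\p For example, after the exercise at the end of these notes has been done (and "cow" has been chosen at the ambiguity), the word "bows" passes through the four lines roughly like this. The line contents below are only a sketch; the actual display may differ slightly.
\t bows
\m bow -s
\em cow -s
\e cows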
\p Close the Database Properties dialog box and look at the window containing the file ENGPHON.RUL.
\s Phonological Rule File
\p The ENGPHON.RUL file gives the phonological rules to be applied during the Generate process. The field markers used in such a file are defined by Shoebox and cannot be changed, but the fonts and field names can be changed by the user. You are encouraged to use the comment (\co) and example (\ex) fields extensively to make the file more readable.
\s2 Definitions of Symbols
\p At the top of the file you will see various definitions of phonological symbols. For example, vowels in the phonological rules are marked by the symbol V. The comment line (\co Vowel) simply clarifies the purpose of the first definition. The definition line (\def) gives the actual symbol (V) to be used, and the following line gives the list of letters that the symbol will represent.
\p In the first definition, a vowel (V) is defined as "a", "e", "i", "o", or "u". Individual letters must be separated by a space. Multigraph characters are entered as combinations of two or more letters.
\p A symbol is placed in square brackets when used in a rule. For example, the second phonological rule uses [V].
\nt Note: All definitions must come before the first rule.
\s2 Phonological Rules
\p Phonological rules come next and are marked by the \ru field. The first line of a rule field is the pattern to be matched, and the second line is the replacement, that is, the result of the rule. For example, the first rule says that "y" followed by "-e" becomes "ie". The hyphen in the rule match is required to match the morpheme boundary hyphen. The hyphen is usually left out of the replacement so that the output does not look like a morpheme boundary to later rules. The space in the match is significant: a hyphen with no space on either side matches any morpheme boundary, but a hyphen with a space before it matches only at a suffix boundary, and a hyphen with a space after it matches only at a prefix boundary.
\p Each rule is preceded by a comment field that describes what the rule does. It is also helpful to include an example field (\ex) giving specific examples from the language that show the need or purpose of the rule.
\s3 Rule Ordering
\p The order in which the rules are listed in the rule file is critical. Shoebox applies the rules in the order they occur, which allows for bleeding and feeding of rules. For example, Rule 4 bleeds Rule 5 by changing "a-" to "an" (before vowels) before it ever gets to Rule 5.
\p Notice that the rules in this file are numbered in the comment field; this is only for readability and is ignored by Shoebox. It is probably best to leave off these numbers until all the rules have been created and ordered, since the numbers would need changing as new rules are added.
\s3 Other Explanations
\p In this example, Rule 3, which doubles consonants, shows that a defined symbol ([Cs]) that occurs only once in the match can be used more than once in the replacement. If it is, the same matched letter is output twice.
\p Rules 4 and 5 (as mentioned above) implement the difference between "a" and "an". The input to these rules is prepared by having both "a" and "an" generate "a-" from the lexicon Lookup. The hyphen on "a-" causes the Generate process to treat it as a prefix attached to the following word. This allows rules to apply across a word boundary. Note that an entry like "a" becoming "a-" is best put in a special lexicon reserved for oddities like this.
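\p For example, the lexicon entries that prepare the input for these rules might look something like the following sketch; the actual entries in the sample lexicon may contain additional fields.
\lx a
\e a-

\lx an
\e a-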
\p Rule 4 says that word-initial "a-" becomes "an#" when the following word starts with a vowel. The hash mark (#) on the front of "a-" (#a-) stands for a word boundary, so the rule matches only an "a-" at the beginning of a word. The word boundary after "an" in the replacement line causes it to be output as a separate word instead of a prefix.
\p Rule 5 says that all instances of "a-" become "a" followed by a word boundary, that is, the separate word "a". But if Rule 4 has applied, the word no longer contains "a-", so Rule 5 cannot apply in those cases. (This is why the order is critical.) Also notice, as mentioned in the comment field, that the replacement line puts two word boundaries (##), that is, two spaces, after the "a" so that the following word will be aligned with the word above it. This is not necessary, but it makes the output a little more readable.
\p To see where these rules have worked, look at the phrase "twinkling of an eye". This uses two rules. One is that the "e" of "twinkle" goes away before "-ing" (Rule 2). The other is that "a" becomes "an" before "eye" (Rule 4).
\p
\s Exercise
\p There is an exercise at the bottom of the PRINDER.TXT file.
\nt Try to do it first, and then read the answer below.
\p
\s Answer to Exercise
\p Adapt the sentence.
\p Select the root "bow" from "bows" and insert it into the lexicon, entering "cow" in the \e line and adding another \e line with "bow" to make it ambiguous. (A sketch of the resulting entry is given in the note at the end of this file.)
\p Adapt the word "bows" again to see the result. It should parse as "bow -s".
\p Use the same process to adapt "cees" to "bees", "hilk" to "milk", and "money" to "honey".
\p
\s To go to the next Adaptation tutorial, open the project in the ADAPT2A folder.
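\nt For reference, the lexicon entry created in the answer above might look something like the following sketch; the actual entry may contain additional fields.
\lx bow
\e cow
\e bow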