\_sh v3.0  374  Readme Notes

\id ADAPT2A 
\s Norwegian sample without Parse and Generate
\s2 Overview
\p This sample shows a simple adaptation approach being used to
adapt Norwegian to English. Norwegian is fairly close to
English in sentence structure, but has some morphological
differences. For example, the equivalent of "the" is a suffix.
\p Adapting Norwegian to English is a good way to get used to the
concepts of adaptation. Since you are a speaker of the target
language, you can judge the quality of the adapted output. You
can easily see whether the adapted text makes sense. You can
see whether it reads smoothly. You can see the places where
source expressions and idioms still show in the target. You can
see the kinds of final adjustments that have to be made by hand
in any adaptation.
\p You can also see the effectiveness of adaptation. Without
adaptation, you would not be able to read the Norwegian text.
But with adaptation, you can easily understand the text. For
example, try to read the first verse or two without looking at the
English. Then read the English and see how clear the meaning
is. (It may be slightly "unnatural" English, but it is certainly clear
enough to catch the meaning.)
\p Norwegian adaptation is different from the Prinderella samples
in that almost all of the words change significantly. To see this,
look at the first verse or two and try to imagine regular sound
changes to convert words from Norwegian to English. Almost
all of the words change too much for regular sound changes,
and the few that are clear cognates require rules that look like
they probably would not apply very generally.
\p Thus it does not appear that a regular sound change approach
would work very well. So this adaptation setup makes a lexical
entry for every word.
\p If every word has a lexical entry, then by including a part of
speech field in the lexicon, we can give every word a part of
speech, and we can make use of the part of speech line for
generalized rearrangement rules.
\p Looking at the text you can see that the \ps field contains a part
of speech for each Norwegian word or phrase, and that the \e1
line contains an English equivalent for each word or phrase.
The \e line is formed by Rearranging and adjusting the \e1 line
and adding the punctuation and capitalization. This step uses the
NORENG.RUL Rearrange rule file.
\p This sample does not Parse or Generate. It just substitutes
whole words and phrases from the lexicon and applies
Rearrange rules. (A later example shows this same adaptation
with parse and generate included.)
\s2 Interlinear Setup
\p Look at the interlinear setup of the text. You can see that it
consists of two Lookup processes and a Rearrange process.
\p Looking at the details of the first Lookup you can see that it
looks up the \lx field in the lexicon and outputs the \ps field. If
the lookup fails, it outputs a failure mark, since the word has no
resemblance to the part of speech. The process is marked as
an adaptation process because there is no separate
interlinearization phase to a setup like this.
\p The second Lookup process is similar. It outputs the word if the
lookup fails. If we were applying regular sound changes, we
would output the word with changes (using a CC table). This
would allow us to generate target words without lexical entries.
(But notice that any word that was generated in that way would
not have a part of speech for general rearrangement rules.)
\p The Rearrange process applies a rule file named
NORENG.RUL to the \e1 line to produce the \e line. Notice
that it is also given the \ps marker so that general rules can
refer to part of speech. It is given the \t marker so that it can
pick up capitalization and punctuation from the original text line.
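\p The two Lookup processes described above can be sketched in
Python as follows. This is only an illustration of the idea, not
the program's own implementation. The three lexicon entries, the
"gloss" key and the "***" failure mark are assumptions made for
the example.

```python
# Hypothetical miniature lexicon keyed by the \lx form.
# The entries and the "gloss" key are illustrative assumptions.
LEXICON = {
    "han": {"ps": "Pron", "gloss": "he"},
    "og": {"ps": "Conj", "gloss": "and"},
    "mann": {"ps": "N", "gloss": "man"},
}

FAILURE_MARK = "***"  # assumed marker for a failed lookup


def lookup_ps(word):
    """First Lookup: part of speech, or a failure mark if not found."""
    entry = LEXICON.get(word.lower())
    return entry["ps"] if entry else FAILURE_MARK


def lookup_e1(word):
    """Second Lookup: English equivalent, or the word itself if not found."""
    entry = LEXICON.get(word.lower())
    return entry["gloss"] if entry else word


def interlinearize(words):
    """Return the \\ps line and the \\e1 line for a list of source words."""
    ps_line = [lookup_ps(w) for w in words]
    e1_line = [lookup_e1(w) for w in words]
    return ps_line, e1_line
```

\p A word missing from the lexicon gets the failure mark on the
\ps line but passes through unchanged on the \e1 line, which is
the difference between the two Lookups described above.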
\s2 Rearrangement Rule File
\p Close the interlinear setup and look at the file NORENG.RUL,
which is open in a window. The layout of the file is similar to
the Phonological rule file. It starts with a series of definitions
marked \def, followed by a series of rules marked \ru. 
\s3 Definitions
\p The definitions define symbols that can be used in the rules. In
this file the symbols are Syntactic Structures such as NP and
PP that can be used to make general rules that rearrange such
structures. When used in a rule, these symbols are NOT placed
in square brackets (like they are in the Phonological rule file).
\p Looking at the top of the rule file you can see that the first
definition defines the symbol "NP". The words "Noun Phrase"
just above the symbol are given in a comment field to clarify the
meaning of the symbol.
\p The second and following lines of the definition are each a
separate definition of the symbol. So "NP" is defined to be
"(Det) (Adj) N (PP)" or "PN" or "Pron". Parentheses
surrounding a symbol mean Optional, so the first definition says
that "Det", "Adj" and "PP" are optional in an NP.
\p The parentheses are the only reserved characters in definitions
or rules. For example, the slash in "PN/Pron" is simply a part of
the symbol itself.
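\p To see what the optional parentheses mean in practice, the
following Python sketch lists every concrete sequence that one
definition line allows. The function name "expand" is
hypothetical, and the code only illustrates the notation. It is
not how the program reads the rule file.

```python
from itertools import product


def expand(pattern):
    """Return every symbol sequence that one definition line allows."""
    choices = []
    for symbol in pattern.split():
        if symbol.startswith("(") and symbol.endswith(")"):
            # Optional symbol: either present or absent.
            choices.append([symbol[1:-1], None])
        else:
            # Required symbol: always present.
            choices.append([symbol])
    return [
        [s for s in combo if s is not None]
        for combo in product(*choices)
    ]


# The three definition lines given for NP in the text above.
NP_DEFINITIONS = ["(Det) (Adj) N (PP)", "PN", "Pron"]

for line in NP_DEFINITIONS:
    for sequence in expand(line):
        print(" ".join(sequence))
```

\p The first definition line has three optional symbols, so it
stands for eight different sequences, from the bare "N" up to
"Det Adj N PP".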
\s3 Rearrangement Rules
\p Look down the file at the Rearrangement rules. Like the
Phonological rules, the first line of the rule is the pattern to
match, and the second line is the replacement. Included with
many of the rule fields are comment fields and example fields,
which give the example text and reference for the rule.
\p For example, the second rule changes "whole the" into "the
whole". The example is "the whole Sanhedrin (Mrk 14:55)".
Search in the text for verse 55 and look at the situation where
this rule applies. You can see in the \e1 line that the Norwegian
word for Sanhedrin includes a suffix that means "the". The
result in the \e1 line is that "the" appears after the preceding
adjective "whole". This rule inverts the order and fixes the
problem so that the \e line shows the correct order.
\s3 Tips
\p Look at other rules and their effects. The rules tend to be in
reverse order because new rules are put at the top of the list.
This makes sure that the new rule will not be overridden by
some existing rule.
\p Unlike Phonological rules, Rearrangement rules cannot feed
each other. Rules are attempted in order, and once a rule has
applied, no other rule can apply to the output of the first rule.
This can be used to block the application of a rule in a particular
context. If an earlier rule is applied, it prevents the application
of a later rule.
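\p The ordering behaviour described above can be modelled with a
short Python sketch. It is one possible reading of "rules cannot
feed each other", not the program's actual algorithm. The first
rule is the "whole the" rule from the rule file. The second rule
is hypothetical and is included only to show blocking.

```python
def rearrange(tokens, rules):
    """Apply rules in order; output of a rule is closed to later rules."""
    # Each item is (token, frozen). Frozen tokens were produced by a rule.
    line = [(t, False) for t in tokens]
    for match, replacement in rules:
        n = len(match)
        if n == 0:
            continue  # an empty pattern can never apply
        out, i = [], 0
        while i < len(line):
            window = line[i:i + n]
            if (len(window) == n
                    and all(not frozen for _, frozen in window)
                    and [t for t, _ in window] == match):
                # Rule applies: emit the replacement and freeze it.
                out.extend((t, True) for t in replacement)
                i += n
            else:
                out.append(line[i])
                i += 1
        line = out
    return [t for t, _ in line]


RULES = [
    (["whole", "the"], ["the", "whole"]),  # rule described in the text
    (["the", "whole"], ["all", "the"]),    # hypothetical later rule
]
```

\p With these rules, "whole the Sanhedrin" becomes "the whole
Sanhedrin" and stops there, because the later rule cannot apply
to the output of the earlier one. An input that already reads
"the whole day" is untouched by the first rule, so the second
rule is free to apply to it.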
\s2 The Lexicon
\p The lexicon contains many phrases. A good example is the long
phrase "den ene etter den andre" translated "one by one" in
verse 19. Look through the text to see other examples. 
\nt Note: The easiest way to tell that a phrase in the lexicon has been
used in the adaptation process is that the entire phrase has only one
part of speech under it.
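\p Phrase lookup can be pictured with the Python sketch below,
which assumes that the longest matching phrase in the lexicon
wins. That assumption, the "Adv" and "Prep" labels, the "etter"
entry and the "***" failure mark are all illustrative. Only the
phrase "den ene etter den andre" and its equivalent "one by one"
come from the sample.

```python
# Hypothetical lexicon keyed by tuples of words, so phrases fit too.
PHRASE_LEXICON = {
    ("den", "ene", "etter", "den", "andre"): ("Adv", "one by one"),
    ("etter",): ("Prep", "after"),
}


def lookup_phrases(words, lexicon, max_len=5):
    """Greedy longest-match lookup; returns (source, ps, english) triples."""
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in lexicon:
                ps, english = lexicon[key]
                out.append((" ".join(words[i:i + n]), ps, english))
                i += n
                break
        else:
            # Nothing matched: failure mark for ps, word passes through.
            out.append((words[i], "***", words[i]))
            i += 1
    return out
```

\p When the whole phrase matches, the result is a single triple
with a single part of speech, which is the sign described in the
note above.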

\s Summary
\p It was possible to create this adaptation with almost no
ambiguous lexical entries. It relies mostly on phrases in the
lexicon and on rules that change the most frequent equivalent of
a word to another word. The two words that are ambiguous are "så",
which is "so", "as" or "then", and "da", which is "then" or
"when". Each of these occurs fewer than ten times in the whole
chapter, with about equal numbers of each equivalent. There
appears to be no good, general way to make rules to choose
between them, so they are left ambiguous and the user must
choose the appropriate equivalent by hand.
\p These are the only hand choices in the entire chapter, so you
can see that this adaptation moves along very rapidly. This is
the first chapter adapted, and by the time we reached the end
of it, we were adapting about 10 verses per hour, including
adding new words and phrases to the lexicon and making new
rules. This rate will continue to increase as more and more
words and phrases are already in the lexicon. Less time will be
spent making new rules as the most common rearrangements
are already handled and as we get used to the types of
situations we must fix by hand instead of by rule.
\p If this were not an exercise and we were actually adapting this
chapter, we would have immediately fixed by hand many of the
places that were obviously worded wrongly. Because this is an
example, no manual fixes were done. All the problems were left
in place so that you could see them. But it is recommended, in
actual adaptation work, that you immediately fix manually the
obvious problem places. As you do that, you may notice
patterns that can be handled by rules. You will also get familiar
with the kinds of constructions that need manual repair, so that
you will be able to fix them very quickly. Since you have to
proof the output anyway, it will save time if you fix things
immediately. It will also save time in the revision process if
you fix the obvious problems before you show the text to others.
\s2 Note
\p The creators of this sample do not know Norwegian, and only a
short grammar summary and a bilingual dictionary were
available, but we used them very little. We wanted to
demonstrate that it is possible to adapt starting with very little
knowledge of the source language. Mostly English equivalents
were chosen based on the meaning we knew had to be
translated in each verse. Cognates gave clues to help us decide
which source words carried what parts of the meaning. To do
this you must also have some knowledge of the target language
and have good access to a native speaker of the target. But
those things are necessary for any successful adaptation.
\p
\s To go to the next Adaptation tutorial, open the project in
the ADAPT2B folder.