\_sh v3.0 374 Readme Notes
\id ADAPT2A
\s Norwegian sample without Parse and Generate
\s2 Overview
\p This sample shows a simple adaptation approach being used to adapt Norwegian to English. Norwegian is fairly close to English in sentence structure, but has some morphological differences. For example, the equivalent of "the" is a suffix.
\p Adapting Norwegian to English is a good way to get used to the concepts of adaptation. Since you are a speaker of the target language, you can judge the quality of the adapted output. You can easily see whether the adapted text makes sense. You can see whether it reads smoothly. You can see the places where source expressions and idioms still show in the target. You can see the kinds of final adjustments that have to be made by hand in any adaptation.
\p You can also see the effectiveness of adaptation. Without adaptation, you would not be able to read the Norwegian text. But with adaptation, you can easily understand the text. For example, try to read the first verse or two without looking at the English. Then read the English and see how clear the meaning is. (It may be slightly "unnatural" English, but it is certainly clear enough to catch the meaning.)
\p Norwegian adaptation is different from the Prinderella samples in that almost all of the words change significantly. To see this, look at the first verse or two and try to imagine regular sound changes to convert words from Norwegian to English. Almost all of the words change too much for regular sound changes, and the few that are clear cognates require rules that look like they probably would not apply very generally.
\p Thus it does not appear that a regular sound change approach would work very well. So this adaptation setup makes a lexical entry for every word.
\p If every word has a lexical entry, then by including a part of speech field in the lexicon, we can give every word a part of speech, and we can make use of the part of speech line for generalized rearrangement rules.
\p Looking at the text you can see that the \ps field contains a part of speech for each Norwegian word or phrase, and that the \e1 line contains an English equivalent for each word or phrase. The \e line is formed by Rearranging and adjusting the \e1 line and adding the punctuation and capitalization. This step uses the NORENG.RUL Rearrange rule file.
\p This sample does not Parse or Generate. It just substitutes whole words and phrases from the lexicon and applies Rearrange rules. (A later example shows this same adaptation with parse and generate included.)
\s2 Interlinear Setup
\p Look at the interlinear setup of the text. You can see that it consists of two Lookup processes and a Rearrange process.
\p Looking at the details of the first Lookup you can see that it looks up the \lx field in the lexicon and outputs the \ps field. If the lookup fails, it outputs a failure mark, since the word has no resemblance to the part of speech. The process is marked as an adaptation process because there is no separate interlinearization phase to a setup like this.
\p The second Lookup process is similar. It outputs the word if the lookup fails. If we were applying regular sound changes, we would output the word with changes (using a CC table). This would allow us to generate target words without lexical entries. (But notice that any word that was generated in that way would not have a part of speech for general rearrangement rules.)
\p The Rearrange process applies a rule file named NORENG.RUL to the \e1 line to produce the \e line. Notice that it is also given the \ps marker so that general rules can refer to part of speech. It is given the \t marker so that it can pick up capitalization and punctuation from the original text line.
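\p The difference between the two Lookup processes can be pictured with a small Python sketch. This is only an illustration of the behavior described above, not how the program itself is implemented. The lexicon entries, the "***" failure mark, and the function names are all assumptions made for the example, and the sketch handles single words only, not phrases.

```python
# Illustrative sketch only. The entries below, the "***" failure mark and
# all function names are assumptions for this example; they are not taken
# from the sample's lexicon or from the program.
LEXICON = {
    "og": {"ps": "Conj", "e": "and"},
    "hele": {"ps": "Adj", "e": "whole"},
}

FAILURE_MARK = "***"


def lookup_ps(word):
    """First Lookup: return the part of speech, or a failure mark.

    The source word looks nothing like a part of speech, so echoing it
    on the part of speech line would be misleading.
    """
    entry = LEXICON.get(word.lower())
    return entry["ps"] if entry else FAILURE_MARK


def lookup_e1(word):
    """Second Lookup: return the English equivalent, or the word itself."""
    entry = LEXICON.get(word.lower())
    return entry["e"] if entry else word


def annotate(text_line):
    """Build the part of speech line and the first English line."""
    words = text_line.split()
    ps_line = [lookup_ps(w) for w in words]
    e1_line = [lookup_e1(w) for w in words]
    return ps_line, e1_line
```

Calling annotate("og hele xyz") with this toy lexicon gives a failure mark on the part of speech line for "xyz", while the English line simply carries "xyz" through unchanged. A word passed through in this way has no part of speech, so general rearrangement rules cannot refer to it.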
\s2 Rearrangement Rule File
\p Close the interlinear setup and look at the file NORENG.RUL, which is open in a window. The layout of the file is similar to the Phonological rule file. It starts with a series of definitions marked \def, followed by a series of rules marked \ru.
\s3 Definitions
\p The definitions define symbols that can be used in the rules. In this file the symbols are Syntactic Structures such as NP and PP that can be used to make general rules that rearrange such structures. When used in a rule, these symbols are NOT placed in square brackets (like they are in the Phonological rule file).
\p Looking at the top of the rule file you can see that the first definition defines the symbol "NP". The words "Noun Phrase" just above the symbol are given in a comment field to clarify the meaning of the symbol.
\p The second and following lines of the definition are each a separate definition of the symbol. So "NP" is defined to be "(Det) (Adj) N (PP)" or "PN" or "Pron". Parentheses surrounding a symbol mean Optional, so the first definition says that "Det", "Adj" and "PP" are optional in an NP.
\p The parentheses are the only reserved characters in definitions or rules. For example, the slash in "PN/Pron" is simply a part of the symbol itself.
\s3 Rearrangement Rules
\p Look down the file at the Rearrangement rules. Like the Phonological rules, the first line of the rule is the pattern to match, and the second line is the replacement. Included with many of the rule fields are comment fields and example fields, which give the example text and reference for the rule.
\p For example, the second rule changes "whole the" into "the whole". The example is "the whole Sanhedrin (Mrk 14:55)". Search in the text for verse 55 and look at the situation where this rule applies. You can see in the \e1 line that the Norwegian word for Sanhedrin includes a suffix that means "the". The result in the \e1 line is that "the" appears after the preceding adjective "whole".
This rule inverts the order and fixes the problem so that the \e line shows the correct order.
\s3 Tips
\p Look at other rules and their effects. The rules tend to be in reverse order because new rules are put at the top of the list. This makes sure that the new rule will not be overridden by some existing rule.
\p Unlike Phonological rules, Rearrangement rules cannot feed each other. Rules are attempted in order, and once a rule has applied, no other rule can apply to the output of the first rule. This can be used to block the application of a rule in a particular context. If an earlier rule is applied, it prevents the application of a later rule.
\s2 The Lexicon
\p The lexicon contains many phrases. A good example is the long phrase "den ene etter den andre" translated "one by one" in verse 19. Look through the text to see other examples.
\nt Note: The easiest way to tell that a phrase in the lexicon has been used in the adaptation process is that the entire phrase has only one part of speech under it.
\s Summary
\p This adaptation was possible to create with almost no ambiguous lexical entries. It mostly uses phrases in the lexicon and rules to change the most frequent equivalent of a word to another word. The two words that are ambiguous are "så", which is "so", "as" or "then", and "da", which is "then" or "when". Each of these occurs fewer than ten times in the whole chapter, with about equal numbers of each equivalent. There appears to be no good, general way to make rules to choose between them, so they are left ambiguous and the user must choose the appropriate equivalent by hand.
\p These are the only hand choices in the entire chapter, so you can see that this adaptation moves along very rapidly. This is the first chapter adapted, and by the time we reached the end of it, we were adapting about 10 verses per hour, including adding new words and phrases to the lexicon and making new rules.
This rate will continue to increase as more and more words and phrases are already in the lexicon. Less time will be spent making new rules as the most common rearrangements are already handled and as we get used to the types of situations we must fix by hand instead of by rule.
\p If this were not an exercise but we were actually adapting this chapter, we would have immediately fixed by hand many of the places that were obviously worded wrongly. Because this is an example, no manual fixes were done. All the problems were left in place so that you could see them. But it is recommended, in actual adaptation work, that you immediately fix manually the obvious problem places. As you do that, you may notice patterns that can be handled by rules. You will also get familiar with the kinds of constructions that need manual repair, so that you will be able to fix them very quickly. Since you have to proof the output anyway, it will save time if you fix things immediately. It will save time in the revision process if you fix the obvious problems before you show it to others.
\s2 Note
\p The creators of this sample do not know Norwegian, and only a short grammar summary and a bilingual dictionary were available, but we used them very little. We wanted to demonstrate that it is possible to adapt starting with very little knowledge of the source language. Mostly English equivalents were chosen based on the meaning we knew had to be translated in each verse. Cognates gave clues to help us decide which source words carried what parts of the meaning. To do this you must also have some knowledge of the target language and have good access to a native speaker of the target. But those things are necessary for any successful adaptation.
\p
\s To go to the next Adaptation tutorial, open the project in the ADAPT2B folder.