\_sh v3.0 560 Readme Notes
\id ADAPT
\s Adaptation
\p Adaptation ("Adapt" for short) refers to the conversion of text from one language to another. It works only between related languages because it does only limited rearrangement of words and morphemes. Adaptation produces a very literal translation from source to target. This works well when the source and target are closely related, so that the word order and idiomatic expressions of the source text generally carry over to the target language.
\p Adaptation has usually been done with a set of DOS programs called CARLA. Shoebox can be used to manage the lexical data while running these programs. Shoebox also has features of its own which can be used for adaptation. Most of these samples illustrate the use of the Shoebox features. One shows how to use Shoebox with the DOS programs. (The DOS programs are documented elsewhere.)
\p
\p
\s The ADAPT Tutorial Samples
\s2 Overview
\p The ADAPT folder contains a series of adaptation sample folders named ADAPT1A, ADAPT1B, ADAPT2A, etc. The samples are numbered in the recommended order of study: they generally build on each other, and the higher-numbered samples show more advanced techniques.
\p The sample projects are tutorials that teach you how to do adaptation with Shoebox. Each sample project opens showing a README.TXT file that explains how the sample works and points out things of special interest. Use Project, Open to get from one example to the next.
\s3 ADAPT1A -- Prinderella and the Cince
\p A fanciful example that demonstrates converting text from one language to a very closely related one. This example focuses on using the lexicon as the main conversion tool and on dealing with ambiguity.
\s3 ADAPT1B -- Prinderella and the Cince
\p The same story as ADAPT1A, but this example adds a parsing process to the adaptation process. Doing this lightens the load the lexicon must bear by eliminating derivable forms.
This example also introduces the use of Phonological Rules to deal with peculiarities of English spelling, which are similar to morphological problems.
\s3 ADAPT2A -- Norwegian - English adaptation
\p A more realistic adaptation process: this example takes a Norwegian translation of Mark 14 and converts it into English. It demonstrates the use of Rearrange Rules to account for differences in word order between the two related languages.
\s3 ADAPT2B -- Norwegian - English adaptation
\p This is a full-blown adaptation process that takes the same Norwegian text as ADAPT2A and uses Parsing, Rearrange Rules and Phonological Rules to make the adaptation more predictive and accurate, thereby reducing the number of phrase-level lexical entries needed.
\s3 ADAPT3A -- Yawelmani Generative Morphology
\p This project demonstrates the power of the Generate process and of the Phonological rules files that Shoebox uses. It takes Yawelmani underlying forms and, in one step, processes them through a complex set of rules to produce the attested surface forms. This capability of the Generate process is useful both in working out the phonological analysis of a language and in adapting one language to another.
\s3 ADAPT3B -- Yawelmani Generative Morphology
\p This sample takes the same Yawelmani data and rules but splits the rules into seven different Generate process steps to demonstrate the stages that an underlying form must go through to reach the surface form.
\s3 ADAPT3C -- Yawelmani Generative Morphology
\p Again the same Yawelmani data and rules, but this one is a compromise between the one-step process of ADAPT3A and the seven-step process of ADAPT3B. This setup uses three Generate processes and rule files to derive the surface forms. This layout is more manageable as a working environment.
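\p The difference between the one-step setup of ADAPT3A and the multi-step setups of the other two Yawelmani samples can be pictured with a small Python sketch. It is only an illustration of ordered rule application: the rules and forms below are invented toy examples, not the Yawelmani rules from the samples and not Shoebox rule-file syntax, and the function names generate and generate_in_stages are hypothetical.

```python
import re

# Invented toy rules as (pattern, replacement) pairs. They are NOT the
# Yawelmani rules from the samples and NOT Shoebox rule-file syntax.
RULES = [
    (r"([sz])\+z", r"\1+ez"),   # insert "e" between a sibilant and the suffix "z"
    (r"([ptk])\+z", r"\1+s"),   # devoice the suffix "z" after a voiceless stop
    (r"\+", ""),                # remove morpheme boundaries
]


def generate(form, rules):
    """Apply every rule once, in order, and return the resulting form."""
    for pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


def generate_in_stages(form, stages):
    """Run several rule groups in sequence, recording the form after each."""
    trace = [form]
    for rules in stages:
        form = generate(form, rules)
        trace.append(form)
    return trace


if __name__ == "__main__":
    # One step: all rules in a single pass.
    print(generate("cat+z", RULES))
    # Three steps: one rule per stage, with the intermediate forms visible.
    print(generate_in_stages("bus+z", [RULES[:1], RULES[1:2], RULES[2:]]))
```

\p Both approaches end with the same surface form. The staged version simply keeps the intermediate forms, which is the kind of visibility the multi-step samples are meant to give.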
\s3 ADAPT4A -- Shoebox support for DOS CARLA programs
\p This sample is mostly for those who have been using CARLA and Shoebox for DOS but want to move to this version of Shoebox for their dictionaries.
\nt Note: ADAPT4A requires that you have the CARLA and CC programs on your DOS path.
\s2 Details
\p The Adaptation samples assume you know how to do the basic operations in Shoebox, including interlinearization. If you have not used Shoebox yet, you should go through at least the Basic Features and Interlinear chapters of the Walkthrough before you go through these samples.
\p To open a sample, choose Project, Open. Navigate to the sample folder, and open whatever project file you see there. The name of the project file varies from sample to sample, but each sample folder contains only one project file.
\p Each sample is a fully working adaptation setup that you can explore freely and use as a model for your own setup. The word "observe" is used in comments to highlight areas of special interest in the actual language sample. If you use Edit, Find to look for the word "observe" in comment (\co) fields, you will see discussions of all the places that are of special interest.
\p Some of the sample projects contain exercises that you can do to reinforce your understanding of the techniques illustrated in the sample. The README file opened by each project explains the exercises.
\p
\s2 Tip
\p Each sample project is sized to fit on a standard VGA screen. If you have a larger screen, the project will show in the upper left corner of the screen. The README file is placed in the lower right corner so that if you have a larger screen, you can drag down the lower right corner of the project window to enlarge it, and then drag down the lower right corner of the README file so you can see more of the text at once. If you want to see even more at once, you can print the README file by choosing File, Print.
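\p The search for "observe" in comment (\co) fields described under Details can also be approximated outside Shoebox. The following Python sketch assumes a plain standard-format text in which each field starts on its own line with a backslash marker; the function names and the sample text are invented for illustration and are not taken from the tutorial files.

```python
import re

# A field starts with a backslash marker at the beginning of a line.
FIELD_START = re.compile(r"^\\(\S+)[ \t]*(.*)$")


def parse_fields(sf_text):
    """Split standard-format text into (marker, value) pairs.

    Lines that do not start with a marker are treated as continuations
    of the previous field.
    """
    fields = []
    for line in sf_text.splitlines():
        match = FIELD_START.match(line)
        if match:
            fields.append([match.group(1), match.group(2)])
        elif fields:
            fields[-1][1] += "\n" + line
    return [(marker, value.strip()) for marker, value in fields]


def find_observe_comments(sf_text, marker="co", word="observe"):
    """Return the contents of comment fields that mention the given word."""
    return [
        value
        for field_marker, value in parse_fields(sf_text)
        if field_marker == marker and word in value.lower()
    ]


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    sample = (
        "\\t wunce upon a time\n"
        "\\co Observe the ambiguity here.\n"
        "\\e once upon a time\n"
        "\\co An ordinary comment.\n"
    )
    for comment in find_observe_comments(sample):
        print(comment)
```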
\p
\p
\s Details on the Interlinear and Adaptation Processes
\s2 The difference between the "Interlinear" (Alt+I) and "Adaptation" (Alt+A) shortcut keys
\p It can get confusing that the Interlinear tab contains processes for both interlinearization and adaptation. Basically, the Parse process is always an Interlinear process, and the Rearrange and Generate processes are always Adaptation processes. A Lookup process is usually an Interlinear process but can be marked as an Adapt process (useful if you are not Parsing and will only be adapting the text). If the Interlinear setup has both Interlinear and Adaptation processes, then here is how the shortcut keys work:
\p * If the text line (e.g. \t) has not been interlinearized yet:
\p     * Alt+I (command-I on the Mac) will do only the Interlinear processes and will process the entire line.
\p     * Alt+A (command-A on the Mac) will do all of the processes (both the Interlinear and Adaptation processes; it does the Interlinear steps in order to get the information it needs to do the adaptation) on the entire line.
\p * If the text line has already been Interlinearized and Adapted:
\p     * Alt+I will rerun the Interlinear processes on a word-by-word basis. This deletes any information in the Adaptation lines underneath the word being reinterlinearized.
\p     * Alt+A reruns all of the Adaptation processes for the entire line; it assumes the interlinear information is correct.
\p
\s2 The Interlinearize Toolbar Button
\p The Interlinearize Button on the Toolbar always acts just like an Alt+A. If only Interlinear processes have been defined (no Adaptation processes), the Interlinearize toolbar button runs the Interlinear processes as expected.
\p
\s2 Rearrange Process
\p The Rearrange process uses a "Rearrangement Rule File". This provides Shoebox with needed syntactic information that enables it to reorder words and phrases in the adaptation process.
This file can contain any number of rearrangement rules, but only one rule at a time can apply for any given Rearrange process; once a rule is matched, the output is produced and the adaptation moves on to the next process. Defined symbols in a rearrangement file are not put in square brackets in the rules and can be mixed freely with text, e.g. "N -the" can be rearranged to "the N". You are encouraged to use the REARRANG.TYP file to build your own Rearrangement rule files.
\p
\s2 Generate Process
\p The Generate process uses a "Phonological Rule File". This file looks very similar to the Rearrangement rule file, but it isn't the same. This file can contain any number of phonological rules, all of which are processed in order before the output is produced. Also, any defined symbols in this file must be enclosed in square brackets in the rules, e.g. [C]. You are encouraged to use the PHONRULE.TYP file to build your own Phonological rule files.
\p
\p
\s To start the First Adaptation Tutorial
\p * Choose Project-Open
\p * Go into the folder ADAPT1A
\p * Open the PRINDER.PRJ
\p
\nt Have Fun!
\p
\p
\s Making Your Own Adaptation Setup
\s2 How to copy and convert a sample project as a start.
\p After you have studied these samples, choose the one that is most like what you want to do and use it as a model to set up your own project. One way to do this is to copy the contents of the sample folder to your own project folder and modify the sample to use your own file names and your own setup. Load your own lexicon and link it into the Interlinear setup processes (rather than the one used by the original setup). You might want to keep the original tutorial around to check as an example if you are unsure of marking Alternate and Underlying forms.
\p You can also deactivate any rule or definition in a rule file (if you are using a setup with these) by changing the \ru and \def fields to \dis (for "disable"). Doing this will gray out the rule or definition, showing it is disabled.
This allows you to keep the rules and definitions in the rule file for reference, but Shoebox will ignore them when processing your data. Delete them when you know you don't need them anymore.
\p
\s2 How to import text for adaptation
\p Once you've copied the tutorial you want and are ready to try adapting your own text, you'll first need to import your text into Shoebox. To import text, use the File-Open command and select the text file you want to import.
\p If the file has never been in Shoebox (or has been in an older version of Shoebox), you will likely see the Import dialog box. Here you need to specify the database type for this file (if you are using an adaptation sample as the basis for your own, this is the "Interlinear" type).
\p If you are importing plain, raw text, you will want to specify a Consistent Changes Table. The table TEXTPREP.CCT is provided in the "SHOEBOX\STD_SET" folder for this purpose. If you need to convert the file from ASCII to ANSI, you should do that first using IBM_ANSI.CCT or your own customized table.
\p If you are importing standard format text that is organized with markers \id, \c, and \v, use the table SCRPREP.CCT instead of TEXTPREP.CCT.
\p
\s2 How to export adapted text
\p Once the file has been adapted to the target language, you might want to get rid of all of the processing lines and just keep the resulting target text line (\e in the adaptation samples). The easiest way to get this is to export the file.
\s3 Quick File-Export
\p Choose File-Export, select Standard Format and click OK. In the SF Export Properties dialog box, clear the All Fields check box. Now select only the field(s) you want exported by clicking on the Select Fields button and moving the fields you want to the Include side of the Select Fields dialog box. Click OK. If you need a Consistent Changes Table, you can specify one by using the Browse button. Once you are done setting up the export, click OK in the Properties dialog box.
Then give Shoebox a name for the export file, and Shoebox will export the file for you.
\s3 Create your own File-Export Option
\p If you think you will be exporting files such as this often, you can create a special export option just for that purpose. To do this, choose File-Export, but rather than clicking OK, click Add. Select Standard Format from the choices and click OK. Give the new export process a unique name, e.g. Adapted Text. Then set up this export option just like the one above. When you click OK here, Shoebox will save this new export option so that it will be available next time you choose File-Export.
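\p To picture what a field-limited export produces, here is a rough Python sketch that keeps only the chosen fields (for example \e) from a standard-format text. It does not reproduce what Shoebox does internally. The function name export_fields and the sample text are hypothetical, and collapsing the alignment spaces of interlinear lines is an assumption made for this sketch, not a documented Shoebox behavior.

```python
import re

MARKER = re.compile(r"\\(\S+)")


def export_fields(sf_text, include=("e",), collapse_spaces=True):
    """Return only the fields whose marker is listed in include.

    Continuation lines (lines without a marker) stay with their field.
    collapse_spaces squeezes runs of spaces left over from interlinear
    alignment; this is an assumption of the sketch.
    """
    kept = []
    keeping = False
    for line in sf_text.splitlines():
        if line.startswith("\\"):
            match = MARKER.match(line)
            keeping = bool(match) and match.group(1) in include
        if keeping:
            if collapse_spaces:
                line = re.sub(r" {2,}", " ", line).rstrip()
            kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    # Invented interlinear sample, for illustration only.
    sample = (
        "\\ref 001\n"
        "\\t wunce  upon\n"
        "\\e once   upon\n"
        "\\ref 002\n"
        "\\t a  time\n"
        "\\e a  time\n"
    )
    print(export_fields(sample))
```

\p Passing include=("ref", "e") would keep the reference fields alongside the target text, much like moving more than one field to the Include side of the Select Fields dialog box.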