Basic searching
Basic searches are
accomplished by specifying variables in each of the three columns of
the search template. Up to five rows may be specified, joined by the logical
operators "and" and "or". Thus one can search for words
that begin with cho: and end with ka through the use of
two rows joined by "and". There are several things to remember
in any search:
- the field column
lists in text the fields that are searched with any submission. Note
that often more than one field is searched. For example, when searching
for Ameyaltepec word several fields are searched:
Word searching in Nahuatl
Word searching in English
At this time, given
the immense amount of time it would involve, there are no simple glosses
or single-word definitions for Nahuatl words. A search for the Nahuatl
equivalent of any English word must be conducted in an English sense
field (/sea, /seo, /seao) with the logical operator "contains
word". This operator looks in the various sense fields for a character
string occurring as a word (that is, preceded by a space and followed by a space
or punctuation). A user could, therefore, search:
- English sense, contains word, cry
This will return
5 hits, including
yo:ltepistik
: 1 : to be tough of character; to be hard-hearted; to be tenacious;
to be able to endure adversity (e.g., a person who does not cry or
break down when scolded or beaten, or who shows little tendency to
back down when their compassion is appealed to)
Clearly yo:ltepistik
is not what most users would expect; it was listed simply because cry
is contained in the definition: person who does not cry or break
down when scolded or beaten.
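The behaviour of a whole-word search can be approximated outside the search engine. The following Python sketch is only an illustration: the function name contains_word and the use of Python's \b word boundary are assumptions made here, not the engine's actual implementation, which is described above as looking for a preceding space and a following space or punctuation.

```python
import re

def contains_word(sense: str, word: str) -> bool:
    """Return True if `word` occurs in `sense` as a whole word."""
    # \b marks a word boundary; re.escape protects any regex metacharacters.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, sense, flags=re.IGNORECASE) is not None

# Fragment of the yo:ltepistik definition quoted above.
sense = "a person who does not cry or break down when scolded or beaten"

print(contains_word(sense, "cry"))                   # True
print(contains_word("to decrypt a message", "cry"))  # False: "cry" is only part of a word
```

The first call shows why yo:ltepistik is returned: the match depends only on the word appearing somewhere in the definition, not on what the definition means.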
There are, however,
reasons for not writing a keyword search or simple English word finder
function. The first is simply that of resources. Given that many entries
have to be redefined, elaborated, and otherwise checked, the implications
of creating a word finder list at this time, with a dictionary in process,
are that other tasks, which are probably more urgent, would have to
be neglected. A second reason is that many Nahuatl words are incapable
of being summarized in English to a degree that would permit searches.
Finally, there would be a great chance of leaving basic English words
out.
The benefit of
the present system is that in searching for any word, users are presented
with a more complete semantic domain. One only needs to search for (English
sense, contains word, happy) to see these advantages. Moreover,
clever use of the multiple search functions should enable users to limit
searches. For example, if one searches for
the word "order", hundreds of hits are given, since any definition
with the phrase "in order to" or "in order that" would be
pulled up. Thus one could simply search for (English sense, contains
word, order) and (English sense, does not contain sequence, in
order). This yields 8 hits; by further specifying (Part of speech, contains, N)
only two results appear.
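The logic of combining a whole-word condition with an excluded sequence can be sketched in Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the sample senses are invented for the example and are not dictionary entries, and \b stands in for the engine's own word test.

```python
import re

def order_but_not_in_order(sense: str) -> bool:
    """Row 1: contains the word 'order'; row 2: lacks the sequence 'in order'."""
    has_word = re.search(r"\border\b", sense) is not None
    has_sequence = "in order" in sense
    return has_word and not has_sequence

# Invented sample senses, not taken from the dictionary.
samples = [
    "to give an order to someone",
    "to bend over in order to pick something up",
    "to border on a neighbor's field",
]
for sense in samples:
    print(order_but_not_in_order(sense), "-", sense)
# True - to give an order to someone
# False - to bend over in order to pick something up
# False - to border on a neighbor's field
```

One consequence of this approach is that a sense such as "to put in order" would also be excluded, since it contains the excluded sequence.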
Other ways of limiting
searches involve placing limits on the size of the sense definition.
For example, to find the word for "and" one can search (English
sense, contains word, and) and (English sense, regular
expression, ^[a-z]{1,15}$), i.e., requiring that the entire
sense field consist of between 1 and 15 letter characters.
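The effect of the length-limiting expression can be checked with a short Python sketch. The sample senses are invented for illustration. Python's re module is case-sensitive by default; whether the database comparison is case-sensitive depends on its configuration, so that detail is an assumption here.

```python
import re

# Same expression as in the text: 1 to 15 letters and nothing else.
SHORT_SENSE = re.compile(r"^[a-z]{1,15}$")

# Invented sample senses.
for sense in ["and", "and then", "to come and go repeatedly"]:
    print(bool(SHORT_SENSE.search(sense)), "-", sense)
# True - and
# False - and then
# False - to come and go repeatedly
```

Because the character class admits only letters, any sense containing a space or punctuation fails the test regardless of its length, which is what restricts the hits to one-word definitions.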
Regular expression searches
The NLE : Lexicon
search engine is based on the submission of regular expression queries
to the MySQL database. (A regular expression is a series of symbols
used to represent or describe a given string of text.) The regular
expression submitted for any query is displayed at the bottom of the
search results page. Thus if one submits (Ameyaltepec word, begins
with, cho:ka), the regexp submitted (and displayed at the foot
of the results page) is as follows:
(lxa_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'
_OR_lxaa_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'
_OR_lxap_REGEXP_'^(%?cho:ka)'_OR_lxa_REGEXP_'%cho:ka[a-zA-Z]*%?'
)__ORDER_BY_alpha
The "begins
with" part of the query is represented by the ^ symbol, which signifies
start of line. If the query is changed to (Ameyaltepec word, ends
with, cho:ka), the regexp submitted (as displayed) is as follows:
(lxa_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'
_OR_lxaa_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'
_OR_lxap_REGEXP_'(cho:ka)$'_OR_lxa_REGEXP_'%[a-zA-Z]*(cho:ka%?)$'
)__ORDER_BY_alpha
In this regexp
the $ symbol signifies end of line (though literally it means
up to a newline character).
The search template,
therefore, converts each column (e.g., Ameyaltepec word, ends with,
cho:ka) into a regexp. The expression Ameyaltepec word
is set up to prompt a search in three fields: lxa (the lexical
headword entry), lxaa (an alternate pronunciation of the headword
entry), and lxap (a practical orthography of the headword entry).
The search is actually carried out in fields that have been stripped
of diacritics (e.g., accents); however, a corresponding display field
which has the diacritics is maintained in the database for online display.
Thus the MySQL database (which is how the information is stored) has
a field (or column) named lxa, which is the Ameyaltepec headword stripped
of diacritics, as well as a field named lxa_d, which is the original
field with all the diacritics. The search is on the stripped-down field
(lxa); the display is of the original field (lxa_d).
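The conversion of a template row into a regexp condition can be sketched roughly as follows. This is a simplified illustration, not the engine's code: the names FIELDS, to_regexp, and build_condition are hypothetical, and the sketch omits the additional alternatives containing % that appear in the displayed queries, since their purpose is not explained here.

```python
# Hypothetical mapping from a first-column label to the columns it searches.
FIELDS = {"Ameyaltepec word": ["lxa", "lxaa", "lxap"]}

def to_regexp(operator: str, value: str) -> str:
    """Translate the second and third template columns into a pattern."""
    if operator == "begins with":
        return "^" + value          # ^ anchors the match at the start
    if operator == "ends with":
        return value + "$"          # $ anchors the match at the end
    if operator == "regular expression":
        return value                # user-supplied pattern, passed through
    raise ValueError(f"unsupported operator: {operator}")

def build_condition(field_label: str, operator: str, value: str) -> str:
    """Build a display string only; a real query should bind the pattern
    as a parameter rather than interpolate it into SQL text."""
    pattern = to_regexp(operator, value)
    clauses = [f"{column} REGEXP '{pattern}'" for column in FIELDS[field_label]]
    return "(" + " OR ".join(clauses) + ") ORDER BY alpha"

print(build_condition("Ameyaltepec word", "begins with", "cho:ka"))
# (lxa REGEXP '^cho:ka' OR lxaa REGEXP '^cho:ka' OR lxap REGEXP '^cho:ka') ORDER BY alpha
```

The point of the sketch is the one-to-many step: a single label in the first column fans out into one REGEXP test per stripped-down column, joined by OR.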
Some users might
want to use regular expressions in their queries. They can do this
by selecting the fields to search on in the pulldown menu of the first
column of the search template, selecting "regular expression"
from the second column, and then typing in a regular expression in
the third column. For example, if users want to search for all words
that begin with /t/ or /k/ followed by a long /a:/ they have two options.
The first would be to use two rows of the search engine joined by
or:
- Ameyaltepec word, begins with, ta:
- or
- Ameyaltepec word, begins with, ka:
However, the same
result can be accomplished with a regexp. The user could search:
- Ameyaltepec word, regular expression, ^[tk]a:
In this case the
user-entered regexp might not provide much of an advantage over letting
the search engine construct the same query. However, in other cases
the possibility of using regular expressions is a powerful tool.
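That the two approaches select the same words can be verified with a small Python sketch. The strings in the list are made-up samples chosen to exercise the pattern; they are not claimed to be dictionary headwords.

```python
import re

# Made-up sample strings, used only to exercise the two searches.
words = ["ta:ta", "ka:wa", "tata", "a:ka:tl", "cho:ka"]

# Two template rows joined by "or".
two_rows = [w for w in words if w.startswith("ta:") or w.startswith("ka:")]

# One row with a user-entered regular expression.
one_regexp = [w for w in words if re.search(r"^[tk]a:", w)]

print(two_rows)                 # ['ta:ta', 'ka:wa']
print(two_rows == one_regexp)   # True
```

The character class [tk] stands for "either t or k", so a single pattern replaces the two rows; the saving grows as more alternatives are added.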
What follows is
a brief explanation of the most important symbols used in regular
expressions: