Nahuatl Dictionary: Report for Year 2

Database Refinements

We have expanded and refined the Distance Research Environment (DRE), our online interface for data entry for the Nahuatl dictionary, as we have incorporated new data sets over the past year. We have had to develop several new database layouts (screens, reports) and scripts with complex algorithms to assist in the alignment of data sets that have different spelling variants.

 

Dissemination

We continue to make the lexicon available online as it grows. The URL for the dictionary is: http://whp.uoregon.edu/dictionaries/nahuatl/. We have revised the design of the site over the past year and improved search features and the ranking of results. We have also continued to refine search algorithms to optimize results for each of the three language interfaces (English, Spanish, Nahuatl).

We are postponing the printing of the lexicon until it is more complete and the alignment of diverse data sets has been accomplished.

We are also postponing consultations with Sullivan on the linguistic and ethnohistorical interfaces until we have the International Phonetic Alphabet (IPA) renderings of all headwords. For now, we feel that the current search interface meets the needs of both scholarly and casual users. All users are presented with a set of results containing any ethnohistorical and linguistic information available for an entry. At the same time, search results also contain translation and definition fields that would be of interest to a wider audience.

Browsing Preset Themes: We have added a new feature to the site that allows users to browse the lexicon by predefined themes. Stephanie Wood has identified 2,288 words within 30 themes likely to be of interest to scholars and students. These themes cover a broad range of topics, including: Gender; Food and Beverages; Animals; Architecture; Race and Ethnicity; Transportation; War and Conflict; Technology and Tools; Sexuality and Fertility; Health and Medicine; and Religion. For a complete list see: http://whp.uoregon.edu/dictionaries/nahuatl/index.lasso?&dowhat=browse_themes. Initial assessments indicate that some users are not sure what to enter in the search field, and these preset searches give the visitor an idea of the functionality and content of the lexicon.

 

Google Analytics: Tracking Usage

We continue to use Google Analytics to track public access to our websites. The number of hits and users has increased by more than 300% in the project’s second year. Between September 1, 2010, and August 24, 2011, we had 25,930 visits by 18,917 unique visitors from 101 countries. Most of our visitors are from the United States (most notably California) and Mexico, but there are also significant numbers from Europe and Canada, as well as parts of Asia and Africa. Visitors viewed an average of 5 pages per session, an indication that they are actually using the site to search and view dictionary results.

 

Expansion and Integration

In our second year we have expanded the lexicon with both colonial and modern Eastern Huastecan Nahuatl words. The dictionary now holds 32,235 entries in all. Of these, 10,966 are complete with English translations and 7,340 also include definitions in modern Eastern Huastecan Nahuatl. Almost all (31,887) now include a Spanish translation from one or more sources. We have been working to vet and complete the remaining entries, and to identify any duplicates overlooked on initial integration due to differences in spelling conventions.

Modern Eastern Huastecan Nahuatl: IDIEZ has provided 7,340 headwords for the online dictionary. These entries contain modern Eastern Huastecan Nahuatl definitions, Spanish translations, English translations, examples of usage, and parts of speech. Thus far 507 of these headwords have been matched with their colonial Nahuatl counterparts, and the remainder of the headwords have been integrated as new entries.

Karttunen: With permission from the author and the publisher, we have integrated 1,408 entries from Frances Karttunen’s Analytical Dictionary of Nahuatl (1992) into our lexicon, and are in the process of incorporating 7,481 more. Karttunen’s work provides English and Spanish analyses of headwords in Nahuatl, while also showing vowel length and glottal stops. The IDIEZ team was responsible for digitizing Karttunen’s work, and the Oregon team has been working on 1) reformatting the content with XML bibliographical references; 2) designing mechanisms for integrating matches with existing entries; and 3) creating new entries where there are no matches.

Alonso de Molina: Thanks to Joe Campbell, who, through IDIEZ, gave us a digital copy of Fray Alonso de Molina’s 1571 Vocabulario en lengua mexicana y castellana, we have been able to add this valuable colonial vocabulary to our database and rapidly expand the number of lexical entries. This was a 322-page work in Microsoft Word format. IDIEZ is now negotiating with Joe Campbell with the hope of obtaining his larger database for our lexicon. This larger database includes the three dictionaries published by Molina and Campbell’s lexical index to the multi-volume sixteenth-century Florentine Codex of Fray Bernardino de Sahagún and his indigenous informants.

A major challenge of Year 2 has been folding all this new data (modern Nahuatl, Karttunen, Molina) into our existing dictionary without creating redundant entries. Headwords in all four data sets follow different spelling conventions, so matching them up requires three stages: several passes with scripts that alter spelling in various ways to surface potential matches, then a verification phase overseen by trained student workers, and finally, when needed, verification by a scholar (Stephanie Wood). We have successfully automated parts of this process, but it is still very painstaking. We are hopeful that the addition of IPA spelling for all entries, beginning this fall in Year 3, will make the process of matching and eliminating duplicates proceed much more quickly.
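The scripted first stage of this matching can be pictured with a minimal sketch. It is an illustration only, not the project's actual database scripts: the rewrite rules, the function names normalize and candidate_matches, the source labels, and the sample spellings are all assumptions chosen for the example. The idea is to reduce each headword to a rough common key (lowercase, diacritics removed, a few spelling conventions collapsed) and then flag keys that occur in more than one data set as candidates for human verification.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict

# Hypothetical rewrite rules, applied in order. They are illustrative only
# and are NOT the project's real rules.
RULES = [
    (r"qu", "k"),        # que/qui   -> ke/ki
    (r"c(?=[ei])", "s"),  # ce/ci     -> se/si
    (r"z", "s"),         # z, tz     -> s, ts
    (r"hu|uh", "w"),     # hu/uh     -> w
    (r"c(?!h)", "k"),    # remaining c -> k, leaving the digraph "ch" intact
]


def normalize(headword: str) -> str:
    """Reduce a headword to a rough matching key (assumed rules above)."""
    text = unicodedata.normalize("NFC", headword.strip().lower())
    text = text.replace("ç", "s")
    # Drop combining marks such as macrons and accents.
    decomposed = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


def candidate_matches(datasets: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Group headwords by key; keep keys found in more than one data set."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for source, headwords in datasets.items():
        for headword in headwords:
            groups[normalize(headword)].append((source, headword))
    return {
        key: members
        for key, members in groups.items()
        if len({source for source, _ in members}) > 1
    }


# Sample spellings chosen for illustration, not taken from the project data.
sample = {
    "colonial": ["cihuatl", "quetzalli"],
    "modern": ["siwatl", "tochtli"],
    "karttunen": ["cihuātl"],
}
matches = candidate_matches(sample)
```

In this sketch the three spellings cihuatl, siwatl, and cihuātl share one key and are grouped as a candidate match, while quetzalli and tochtli appear in only one data set each and are left as potential new entries. The output is only a list of candidates; the verification by student workers and by a scholar described above would still decide which ones are true matches.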

 

Diversification

While we made some progress experimentally adding audio, video, and pictographic image elements to the dictionary in Year 1, Year 2 has been largely focused on the integration of additional large, textual data sets. IDIEZ is currently negotiating with several radio stations in Zacatecas for professional recording support, so that we can begin the systematic audio recording of each headword in Year 3. We have also acquired quality recording equipment for IDIEZ, so that the team can make its own recordings with its native speakers.

Our planned focus group work with native speakers from the states of Guerrero and Puebla or Tlaxcala, added to the team from the Huasteca who are in Zacatecas, will also produce additional audio in Year 3, and this audio will highlight regional variations in Nahuatl. We had originally planned the focus group work for Year 2, but it was postponed to Year 3 owing to the availability of our linguistics graduate assistant. Because the focus group work was postponed, we have also waited to develop the list of terms for the consideration of the native speakers, who will now come together in October 2011. IDIEZ will prepare the list of terms prior to this meeting.

We have also received a donated data set (297 pages in MS Word) that mines nineteenth- and twentieth-century sources of Nahuat (Nawat, Náwat), Pipil, and Nicarao, all from Central America, compiled by Rick McCallister (Delaware State University) and Rafael Lara-Martínez (New Mexico Tech). Matches could be illuminating. For example, the Nahuat term xinachtéya, to inseminate or reproduce, has an obvious connection to xinachtli in Nahuatl, which means seed. However, we will hold off on searching for matches and integrating relevant material from this data set until we have perfected our alignment techniques and more fully incorporated earlier donations of data. In the end, this diversification of the lexicon could illuminate the considerable reach of Nahuatl in earlier times and its longevity.