Gerd Carling

A Revised and Digitised Lexicon over Tocharian A



Tocharian A and B are two Indo-European languages attested in text fragments from present-day Xinjiang, China, dated to the period 600-800 AD. The total number of fragments is about 4,000, of which 520 are written in Tocharian A. Most A texts are preserved in Berlin and were published in transliteration by Emil Sieg and Wilhelm Siegling (1921). This material was collected into a concordance by Pavel Poucha, Thesaurus Linguae Tocharicae (1955). Unfortunately, this publication contains many errors in the identification of individual lexemes and passages. A collection of fragments unearthed in Yanqi in 1974 has shed new light on the earlier material and solved many of the problems raised by Poucha's thesaurus. The first step of the present project will be to build a text database containing all A material: the Berlin and Yanqi texts plus unpublished A texts from Paris. This database will be supplemented with restorations and corrections by integrating a reference database containing references to individual passages and texts. Poucha (1955) will then be examined, and lemmata that require revision will be revised in full. In connection with this, parallel words and passages in Uighur and Sanskrit will be included. The purpose is to provide a complementary volume to Poucha's thesaurus and to the word index of Tocharian A in Wolfgang Krause and Werner Thomas, Tocharisches Elementarbuch (1964), the two main lexical resources currently available for Tocharian A.
Final report

Gerd Carling, Lund University

Introduction

According to the first version of the application, the main purpose of the project was to produce a complementary volume to the existing dictionaries of Tocharian A, Poucha (1955) and the word list of Tocharisches Elementarbuch II, Texte und Glossar (1964). This version of the application focused on the absence of reliable and complete dictionaries of Tocharian A, a circumstance that made the study of the language very difficult. However, the appearance of a number of new texts (such as the YQ texts, published by Ji, Winter & Pinault 1998) and of new editions of texts in parallel languages relevant to the Tocharian A texts, such as Sanskrit and Uighur, provided material for a new, updated dictionary. Since the two-year project period was regarded as too short to produce a complete dictionary, the application aimed instead at a "complementary volume" and a text and reference database containing all kinds of relevant information on and around the text material. The work was to be carried out mainly by the applicant, Gerd Carling, in collaboration with Professor Georges-Jean Pinault, Paris, and Professor emeritus Werner Winter, Kiel. Work on the project started in January 2003. During the summer of 2003 a complete but preliminary version of the database (Text- and Reference Database of the Tocharian A Language, TTAL) was set up. The database was then continuously extended, improved, and updated during the work on the dictionary, the first volume of which will be published in 2008 (DTA I). Later in 2003 a preliminary version of an updated "complementary dictionary" was completed, focusing on errors in Poucha (1955) and on items omitted from TEB II.

However, the collaborators soon agreed that this was not a satisfactory form of publication. The material (which was later augmented by a number of newly available texts, see §2) should instead be published as a revised version of the dictionary/thesaurus of Poucha (1955), with the addition of the new text material. Work on a revised, complete dictionary was started, and the project received further funding from SCAS, Uppsala, for 2004-2005. The revised dictionary was to contain all improvements in translations and text readings of existing texts, new translations, new lexemes, new forms, new texts, and other improvements gained by comparison with external sources and languages. During 2004 a suitable form for the dictionary was discussed and agreed upon. The dictionary was not conceived as a copy of Poucha (1955); instead, the entries were restructured to make them more accessible to the reader. A contract for the complete dictionary was signed with Otto Harrassowitz, Wiesbaden, a publisher specializing in language dictionaries and grammars of this type. It soon became apparent that a dictionary in this form would comprise 700-800 pages, and it was decided to publish it in three volumes or fascicles. During 2004-2005 a first version of the first volume, covering the letters a-j, was completed. This was followed by a review of the whole dictionary, in which all three collaborators, several times over, proof-read the Tocharian text samples, checked forms and passages, and checked the consistency of the entries. Several smaller subprojects delayed the schedule. First, it was decided that the system of verb roots and stems worked out by Werner Winter and used by most Tocharian scholars today (cf. Hackstein 1995) should be presented in its entirety for the first time in the dictionary. Second, it was decided that all Tocharian A texts should be included in the Thesaurus.
After the project had started, a number of previously unpublished Tocharian A texts were published as photographs on the website of the project TITUS: Tocharica. There, however, the texts are not classified as Tocharian A; instead, they appear among thousands of Tocharian B fragments. The Tocharian A texts therefore had to be sorted out from the Tocharian B material, after which Georges-Jean Pinault prepared transliterations of the 619 Tocharian A texts for inclusion in the dictionary. It soon turned out, however, that these texts were much more fragmentary than the 467 Berlin texts that had been published in transliteration by Sieg & Siegling (1921) and included in the TTAL database (more on the texts in §2). The dictionary was then proof-read twice by all collaborators, which led to a number of improved text readings and translations. The first volume will be published in spring 2008.

Besides its principal aim, the production of the dictionary, the project has provided material for a number of articles by Gerd Carling and Georges-Jean Pinault. These are listed under Scientific publications and discussed in §4.
The results of the project are discussed under three headings: §2, A Dictionary and Thesaurus of Tocharian A; §3, Text and Reference Database of Tocharian A; and §4, Other results of the project.

A Dictionary and Thesaurus of Tocharian A

A Dictionary and Thesaurus of Tocharian A is the title of the dictionary, which is the main result of the project "A Revised and Digitized Dictionary of Tocharian A". According to plan, it will be published in three volumes. The first volume, a-j (250 pages, [1]), will be published in spring 2008. The book will be presented at the conference "Die Erforschung des Tocharischen und die alttürkische Maitrisimit" at the Berlin-Brandenburgische Akademie der Wissenschaften, 5-6 April 2008 (see http://www.bbaw.de/bbaw/Forschung/Forschungsprojekte/turfanforschung/de/TocharischMaitrisimit2008).
The dictionary is a complete inventory of all currently accessible Tocharian A texts, which number around 1130: 1080 from the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, 43 from the Xinjiang Museum, Urumchi, 3 from the Bibliothèque Nationale, Paris, and 1 from the Musée Guimet, Paris. This is far more than the figure of around 520 given in the application of 2002. Since then, 619 Tocharian A texts have been added from among the Tocharian texts published as photographs on the website of the project TITUS: Tocharica. Although these texts had not previously been published in photograph or transliteration, parts of them had already appeared in various publications. Individual items, forms, and passages are included in Poucha (1955), since Poucha had access to the notes of Emil Sieg and Wilhelm Siegling, who prepared transliterations of all Tocharian A texts but published transliterations only of texts A 1-467 (Sieg & Siegling 1921). The present whereabouts of Sieg and Siegling's transliterations are unknown. At the end of WWII the Museum für Indische Kunst was destroyed, and a number of texts disappeared. The remaining Tocharian texts (4074 manuscripts or pieces of manuscripts) have been published as photographs on the website of the project TITUS: Tocharica, with new numbers (THTxxxx), but without being identified as Tocharian A or Tocharian B.
For the purposes of DTA, all these manuscripts have been examined, and Georges-Jean Pinault has prepared transliterations of the Tocharian A manuscripts among them, 619 fragments in all. In addition, Gerd Carling has compiled a list of items from these unpublished texts as cited in other publications (e.g., Poucha (1955) and publications by the late Werner Thomas, who was a student of Sieg and Siegling and probably also had access to their transliterations). In these publications, forms, items, or short phrases are never cited with a manuscript number, only as "unpublished Berlin manuscript".

One problem in working with the unpublished texts, however, has been the considerable discrepancies between our readings and interpretations of the originals and the material collected by Gerd Carling from various publications. This may have several causes. First, Sieg and Siegling's readings may have been uncertain from the beginning, or mere proposals, which were later taken as certain and quoted again and again by other authors. Second, a large part of the manuscript material has probably been lost, and many of the uncertain interpretations may concern this lost material. For the purposes of the dictionary DTA, readings of the unpublished texts from the Bibliothèque Nationale have been prepared by Georges-Jean Pinault in collaboration with Gerd Carling, and a manuscript from the Musée Guimet has been transliterated and edited by Pinault (2007). This material has been included in the dictionary.

Besides these texts, a number of Tocharian A glosses in Sanskrit texts, published by Malzahn (2007), have also been included.
The dictionary DTA I begins with an introduction, in which the project is described and the organisation of the entries is explained. The introduction is followed by lists of abbreviations for grammatical forms, texts, and so forth, and finally by a list of references.
The dictionary itself follows the alphabetical order of Sanskrit. This is usual in dictionaries of Tocharian, since Tocharian is written in a form of the Indic Brāhmī script. DTA I begins with the letter a and ends with j, which corresponds roughly to one third of the vocabulary.
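
Because the entries follow Sanskrit rather than Latin alphabetical order, a headword beginning with c- comes after all k- words but before j- words. The following Python sketch illustrates how such a sort key could be computed for romanized headwords. It is an illustration under stated assumptions, not the procedure used for DTA: the letter list follows the standard Sanskrit order, the placement of the Tocharian vowel ä directly after ā is an assumption, anusvara and visarga are placed after the vowels as a simplification, and multi-letter symbols such as kh or ai are matched greedily, which ignores hiatus and similar ambiguities. The sample strings are invented.

```python
import unicodedata

# Standard Sanskrit letter order in romanization.
# ASSUMPTION: Tocharian "ä" is slotted right after "ā"; anusvara "ṃ" and
# visarga "ḥ" follow the vowels as a simplification.
_ORDER = [
    "a", "ā", "ä", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au",
    "ṃ", "ḥ",
    "k", "kh", "g", "gh", "ṅ",
    "c", "ch", "j", "jh", "ñ",
    "ṭ", "ṭh", "ḍ", "ḍh", "ṇ",
    "t", "th", "d", "dh", "n",
    "p", "ph", "b", "bh", "m",
    "y", "r", "l", "v", "ś", "ṣ", "s", "h",
]
# Normalize to NFC so precomposed and decomposed diacritics compare equal.
SANSKRIT_ORDER = [unicodedata.normalize("NFC", letter) for letter in _ORDER]
RANK = {letter: rank for rank, letter in enumerate(SANSKRIT_ORDER)}
# Try longer symbols first so that "kh" is not read as "k" + "h".
BY_LENGTH = sorted(SANSKRIT_ORDER, key=len, reverse=True)


def tokenize(word: str) -> list[str]:
    """Split a romanized word into letters of the Sanskrit alphabet."""
    word = unicodedata.normalize("NFC", word)
    letters = []
    i = 0
    while i < len(word):
        for letter in BY_LENGTH:
            if word.startswith(letter, i):
                letters.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"unknown character {word[i]!r} in {word!r}")
    return letters


def sanskrit_key(word: str) -> tuple[int, ...]:
    """Sort key: the rank of each letter in Sanskrit alphabetical order."""
    return tuple(RANK[letter] for letter in tokenize(word))


# Invented romanized strings, chosen only to show the ordering.
headwords = ["ñom", "kät", "cam", "āk", "ak"]
print(sorted(headwords, key=sanskrit_key))  # ['ak', 'āk', 'kät', 'cam', 'ñom']
```

With this key, cam sorts after kät and before ñom, whereas a Latin-alphabet sort would put cam first.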

Each entry begins with a headline giving the headword in bold, followed in parentheses by information about word class and gender. Then come the translation, the Sanskrit/Uighur equivalent (in round brackets) if extant, and the corresponding Tocharian B form [in square brackets] if known. The next line, marked L (for Literature), gives the translations of the item in earlier dictionaries. Where these translations agree fully, the meaning of the word is unproblematic; where they diverge completely, the word is difficult to interpret. Next follows a line F (for Forms), which lists all forms and variants of the item. This line is an innovation with respect to all earlier dictionaries and gives the reader an overview of the attested forms and variants of the language. In Tocharian A most forms occur in several variants, all of which are given here. The line lists only attested forms and no reconstructions. For verbs there is a further line P (for Paradigms), developed by Werner Winter, which lists the complete verb paradigms, including all stem forms as well as reconstructed variants. This extra line is needed because Tocharian A verbs are extremely complicated for the dictionary user: forms and variants are often difficult, sometimes even impossible, to trace back to their lexical roots, and the Paradigm section will hopefully make it easier to find the roots of forms and vice versa. Suppletive verb stems are also listed under their respective letters, with a cross-reference to the basic lexical root. This is often lacking in older dictionaries and is a major improvement.

After the Forms and Paradigm sections comes a section S (for Syntax), which lists common constructions and usages, fixed formulas, and phrases. Tocharian A is a highly literary language, and fixed phrases and formulas are frequent. Next comes the Thesaurus, T, in which references to all occurrences in the texts are listed, organised by form and variant. A selection of coherent and relevant text passages is quoted in transliteration with translations. The collaborators have developed a special system of hyphenation that takes account of the latest research on morphology.

After the Thesaurus section comes a section D (for Derivation), which contains different types of information. For some words, information about internal derivation within Tocharian A is given, for instance whether the word is derived from a verb, noun, or adjective. For other items the following policy has been followed: if the word is an unproblematic borrowing of relatively recent date from Sanskrit, Iranian, Chinese, or Turkic, the source is given with reference to standard dictionaries of these languages (SWTF, BHSD, etc.). This is necessary because a large part of the vocabulary in the texts originates in these languages. Because of the many problems connected with the reconstruction of pre-Tocharian phonology and the difficulty of tracing words back to Indo-European, the etymologies of inherited words are not discussed. Most of the relevant discussion of words whose Indo-European origin is certain or likely is in any case already available in earlier handbooks, such as the dictionary of Tocharian B published by Adams in 1999. There is little point in adding to the mass of etymologies based on guesswork before the precise meaning of the Tocharian word has been established.

Finally, a section R (for References) gives references to the literature and discusses problems connected with the translation of the word and the interpretation of passages relevant to the translation. To keep the dictionary transparent, the Forms and Thesaurus sections contain no references; these are all collected under R.

Work on the dictionary DTA and the preparation of text passages for the entries has yielded a number of new solutions to old problems. The basic method has been comparative literary analysis: parallel texts in better-known languages, such as Sanskrit and Uighur, are consulted first, and the meanings and translations of words and passages are inferred from these parallels rather than by internal or external reconstruction. The latter methods have also been used, but as alternatives and complements to comparative literary analysis. For this reason, much effort has been devoted to the literary analysis of texts and their contents in order to identify parallel texts. The search for parallel texts has also had side effects, yielding solutions to problems beyond the item under investigation.

The dictionary contains a number of new results, among them: new interpretations of the meanings of previously known items; completely new readings of passages; reinterpretation of word and morpheme boundaries, and thereby the discovery of new items in previously published texts; identification of new items in previously unpublished texts; reinterpretation of inflected and conjugated forms; reassignment of verb stems and forms to roots and paradigms other than those previously assumed; new translations of phrases, passages, and texts; and identification of the sources of borrowed words (from Sanskrit, Uighur, Iranian), with consequent reinterpretation of their meanings. It is hoped that these new proposals will stimulate discussion among Tocharologists, Indo-Europeanists, and representatives of adjacent disciplines.

Text and Reference Database of Tocharian A (TTAL)

The Text and Reference Database of Tocharian A has been a prerequisite for the work on the dictionary and will become even more important in the future if it can be extended to include all texts and its search functions can be adapted.

The database TTAL (http://www.ling.lu.se/projects/Tokhariska/, username: tocharianA, password: tokhare; note that diacritics can only be displayed with the free browser Firefox, see http://www.mozilla.org/) contains the texts A 1-467 in transcribed form. These texts, originally transcribed from the transliterations of Sieg & Siegling (1921), have been continuously improved, corrected, and restored during the work on the dictionary. Many translations are still missing; they will be incorporated from the final version of the dictionary DTA, since translations have been revised and improved up to its last version. Beside each transcribed text line there is a line of translation and a line for references to translations of or commentaries on that particular line. Above each text the following information is given: manuscript number (which often differs from the text number), reference to a photograph of the original manuscript, content (literary reference), transliteration/translation/commentary of the whole manuscript, parallel text (if known/extant), and reference to a transliteration/translation/commentary of the parallel text (if known/extant). This information greatly facilitates work on individual texts, the search for parallel texts, and the location of discussions of individual problems in the texts.

The database TTAL has been the essential foundation of the work on the dictionary; most of the information in the dictionary is taken from it.
However, the database would require some improvements to be complete. First, all texts should be included in the corpus, including the unpublished texts from Paris and Berlin and the texts from Urumchi. Second, a new search engine should be developed that can search for items, forms, and variants while disregarding non-textual signs and the phonemes involved in variant forms. With such a search engine, older dictionaries such as Poucha (1955) and TEB II would no longer need to serve as the primary source of information. This matters because Poucha (1955) contains a number of misquotations whose sources can be impossible to find in the mass of texts. Furthermore, work on the remaining dictionary volumes would proceed considerably faster with an automatic tool for detecting forms, variants, and passages.
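
The search engine called for above would have to match a query against every attested variant of a form while ignoring editorial signs. A minimal Python sketch of that idea follows, under explicit assumptions: the set of editorial signs (square brackets for restorations, parentheses, hyphens, dots), the treatment of ä as the only variant-relevant vowel to be ignored, the line references, and the toy forms are all hypothetical and are not taken from TTAL.

```python
import re
import unicodedata
from collections import defaultdict

# HYPOTHETICAL editorial signs: square brackets (restorations), parentheses,
# hyphens (morpheme boundaries) and dots (illegible or lost akṣaras).
EDITORIAL_SIGNS = re.compile(r"[\[\]().\-]")
# HYPOTHETICAL variant rule: ignore ä, so spellings with and without it match.
VARIANT_PHONEMES = {"ä": ""}


def normalize(form: str) -> str:
    """Reduce a transcribed form to a key that ignores signs and variants."""
    form = unicodedata.normalize("NFC", form)
    form = EDITORIAL_SIGNS.sub("", form)
    for phoneme, replacement in VARIANT_PHONEMES.items():
        form = form.replace(unicodedata.normalize("NFC", phoneme), replacement)
    return form


def build_index(lines):
    """Map each normalized key to its (reference, original token) pairs."""
    index = defaultdict(list)
    for reference, text in lines:
        for token in text.split():
            key = normalize(token)
            if key:  # skip tokens consisting only of editorial signs
                index[key].append((reference, token))
    return index


def search(index, query: str):
    """Return every occurrence whose normalized form matches the query."""
    return index.get(normalize(query), [])


# Toy corpus with invented references and forms.
corpus = [
    ("A 1 a1", "kä[ṣ]t pal ...."),
    ("A 2 b3", "kṣt (pal)"),
]
index = build_index(corpus)
print(search(index, "käṣt"))  # [('A 1 a1', 'kä[ṣ]t'), ('A 2 b3', 'kṣt')]
```

Dropping ä entirely is deliberately crude: it makes spellings with and without the vowel collide, but it would also merge genuinely distinct words, so the actual normalization rules would have to be specified by the lexicographers.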

Other results of the project

The work on the dictionary has generated a number of side results of both lexicographical and morphosyntactic character.
In a number of articles Gerd Carling has dealt with the structure of the verb and the valency system, a direct result of working on the verbs for the dictionary [2, 4, 5]. Furthermore, she has published an article on loanwords in Tocharian and their consequences for the relative and absolute dating of pre- and proto-Tocharian [3], and a handbook article on Tocharian syntax with extracts from the dictionary [6]. Georges-Jean Pinault has published several articles drawing on data from the dictionary [7-9].

Summary

The results of the project "A Revised and Digitized Dictionary of Tocharian A", financed by Riksbankens Jubileumsfond (RJ) for three years and by SCAS, Uppsala, for one year, can be summarized as follows: 1) A Dictionary and Thesaurus of Tocharian A, Vol. 1, a-j; 2) Text and Reference Database of Tocharian A; 3) several minor publications on the lexicography, morphology, language contact, and syntax of Tocharian. Most publications are in line with the project as presented in the original application, but some are unexpected by-products of the research carried out for the main purpose of the project, the production of a dictionary and a database. In addition, the dictionary, the database, and the side results have been presented at a number of seminars and conferences, both nationally and internationally (see Scientific publications and other activities ...). The production of the dictionary and the database has yielded new results in the form of reinterpretation of the meanings of previously known Tocharian A words, completely new readings of passages, reinterpretation of word and morpheme boundaries and thereby detection of new items in previously published texts, detection of new items in previously unpublished texts, reinterpretation of inflected and conjugated forms, reassignment of verb stems and forms to roots and paradigms other than previously assumed, new translations of phrases, passages, and texts, identification of the sources of borrowed words (from Sanskrit, Uighur, Iranian) and ensuing reinterpretation of their meanings, and so forth. The results are presented mainly in the dictionary and, in a few cases, in the articles. They are integrated into the database but not presented separately there. The next step will be to await responses from Tocharian scholars and representatives of adjacent disciplines in the form of reviews and comments.

References

BHSD = Edgerton, Franklin 1953. Buddhist Hybrid Sanskrit Grammar and Dictionary. Volume II: Dictionary. New Haven (Conn.): Yale University Press.
DTA I = Carling, Gerd, in collaboration with Georges-Jean Pinault and Werner Winter to appear (2008). A Dictionary and Thesaurus of Tocharian A. Vol. I: a-j. Wiesbaden: Harrassowitz.
Hackstein, Olav 1995. Untersuchungen zu den sigmatischen Präsensstammbildungen des Tocharischen. Göttingen: Vandenhoeck & Ruprecht.
Ji Xianlin, Werner Winter & Georges-Jean Pinault 1998. Fragments of the Tocharian A Maitreyasamiti-Nāṭaka of the Xinjiang Museum, China. Berlin - New York: Mouton de Gruyter.
Poucha, Pavel 1955. Thesaurus Linguae Tocharicae Dialecti A. Praha: Státní Pedagogické Nakladatelství (Monografie Archivu Orientálního, Vol. XV).
Sieg, Emil & Wilhelm Siegling 1921. Tocharische Sprachreste. Sprache A. I. Band: Die Texte. Berlin - Leipzig: Walter de Gruyter.
SWTF = Sanskrit-Wörterbuch der buddhistischen Texte aus den Turfan-Funden. Begonnen von Ernst Waldschmidt. Im Auftrag der Akademie der Wissenschaften in Göttingen hrsg. von Heinz Bechert, bearbeitet von Georg von Simson et al. 1973-. Göttingen: Vandenhoeck & Ruprecht. Published in fascicles; two complete volumes (Vokale and k-n) so far.
TEB II = Thomas, Werner & Wolfgang Krause 1964. Tocharisches Elementarbuch. Band II: Texte und Glossar. Heidelberg: Carl Winter Verlag.
TITUS: Tocharica. By Jost Gippert, University of Frankfurt. http://titus.fkidg1.uni-frankfurt.de/texte/tocharic/thtframe.htm. Entrance dates from December 2005-.
TTAL = Text and Reference Database of Tocharian A. By Gerd Carling. http://www.ling.lu.se/projects/Tokhariska/

Grant administrator: Lunds universitet
Reference number: J2002-0611:1
Amount: SEK 680,000
Funding: Bank of Sweden Donation
Subject: General Language Studies and Linguistics
Year: 2002