The Online Database of Interlinear Text



(Note: See the recent paper on ODIN, to be presented at the e-Humanities workshop at e-Science 2006: http://faculty.washington.edu/wlewis2/papers/ODIN-eH06.pdf.)




ODIN, the Online Database of Interlinear Text, is a repository of Interlinear Glossed Text (IGT) extracted mainly from scholarly linguistic papers. The repository is both broad-coverage, in that it contains data for a wide variety of the world's languages (limited only by what data is available and what has been discovered so far), and rich, in that all of its data has already been subjected to linguistic analysis by the authors of the source papers. IGT is the standard method in linguistics for presenting language data, and (1) is a typical example. An IGT instance usually contains a phonetic transcription of the language in question (line 1), a morphosyntactic analysis consisting of a morpheme-by-morpheme gloss plus grammatical information of varying sorts and granularity (line 2), and a free translation (line 3).


(1)  apiya=ya=at               QATAMMA=pat       tapar-ta
     at that time=CONJ=3SG.N   in the same way   rule-PAST
     "And at the same time he ruled it in the very same manner."   (Agbayani & Golston 2004)
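
To make the three-line structure concrete, here is a minimal Python sketch of how one IGT instance might be represented and how the morpheme-by-morpheme gloss lines up with the transcription. It is an illustration only, not ODIN's actual data model: the class name `IGTInstance` and the method `morpheme_pairs` are hypothetical, and it assumes the Leipzig Glossing Rules convention that "-" marks an affix boundary and "=" marks a clitic boundary.

```python
import re
from dataclasses import dataclass

# Assumed convention (Leipzig Glossing Rules): "-" separates affixes,
# "=" separates clitics. Both count as morpheme boundaries here.
BOUNDARY = re.compile(r"[-=]")


@dataclass
class IGTInstance:
    """Hypothetical record for one IGT example (not ODIN's schema)."""
    words: list[str]      # line 1: transcription, one entry per word
    glosses: list[str]    # line 2: gloss, one entry per word
    translation: str      # line 3: free translation

    def __post_init__(self):
        # Lines 1 and 2 of IGT are aligned word by word.
        if len(self.words) != len(self.glosses):
            raise ValueError("transcription and gloss differ in word count")

    def morpheme_pairs(self):
        """Pair each morpheme with its gloss, word by word.

        Returns one entry per word: a list of (morpheme, gloss) pairs when
        the morpheme counts match, or None when they do not.
        """
        result = []
        for word, gloss in zip(self.words, self.glosses):
            morphs = BOUNDARY.split(word)
            parts = BOUNDARY.split(gloss)
            if len(morphs) == len(parts):
                result.append(list(zip(morphs, parts)))
            else:
                result.append(None)
        return result


example_1 = IGTInstance(
    words=["apiya=ya=at", "QATAMMA=pat", "tapar-ta"],
    glosses=["at that time=CONJ=3SG.N", "in the same way", "rule-PAST"],
    translation="And at the same time he ruled it in the very same manner.",
)
```

Applied to example (1), the first and third words align cleanly, for instance ("tapar", "rule") and ("ta", "PAST"). The second word yields None because, as glossed, "=pat" has no separate gloss of its own. Checks of this kind are one plausible way to flag partially glossed material.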


Currently, ODIN provides pointers to scholarly papers on the Web that contain instances of IGT. You can search for IGT by language name, either through the OLAC search interface at LinguistList or LDC, or by clicking one of the language codes or names shown on this page. ODIN will then show you a list of links to documents that contain IGT in the language you have chosen, the number of IGT instances in each document, and the degree to which those instances have been verified to be in that language. Future enhancements will include the ability to search for information within IGT instances, such as specific grammatical concepts (e.g., Conjunction, 3rd Person, Singular, Past Tense), and the ability to list individual IGT instances and compare them across languages or within the same language.
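
The per-language result list described above can be pictured as a simple grouping of IGT instances by source document. The following Python sketch is purely hypothetical and is not how ODIN is implemented: the `IGTRecord` fields, the boolean `verified` flag (standing in for "verified to be in that language"), the function `summarize_by_document`, and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IGTRecord:
    """Hypothetical index entry for one IGT instance."""
    language_code: str   # assumed: a language code such as ISO 639-3 "hit"
    document_url: str    # the paper the instance was extracted from
    verified: bool       # assumed flag: language identification confirmed


def summarize_by_document(records, language_code):
    """Return (url, instance_count, fraction_verified) per document.

    Only records for the requested language are counted; documents with
    more instances come first, ties broken by URL.
    """
    counts = defaultdict(lambda: [0, 0])  # url -> [instances, verified]
    for r in records:
        if r.language_code == language_code:
            entry = counts[r.document_url]
            entry[0] += 1
            entry[1] += int(r.verified)
    rows = [(url, n, v / n) for url, (n, v) in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

For example, two Hittite ("hit") instances from one paper, only one of them verified, would appear as a single row with a count of 2 and a verified fraction of 0.5.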


ODIN is still under construction.  Crawlers and extractors are currently running to expand the size of the repository, and the algorithms used for language identification and IGT migration are being improved and tested.  The current statistics are shown to the right.  These numbers will increase as more IGT is discovered and cataloged.


ODIN has been constructed as part of the broader goal envisioned within the GOLD Community, and has been funded by the NSF under the Data Driven Linguistic Ontology grant (BCS-0411348, William Lewis PI) and by California State University, Fresno. Initial funding and support were provided by the Electronic Metastructure of Endangered Languages Data (EMELD) grant (ITR-0094934). Significant work has been done by three student assistants, Gregg Deslauriers, Hector Gonzalez, and Sam Trenholme, all of whom have spent many long days getting ODIN online. If you have any questions about ODIN, please feel free to drop William Lewis a note at wlewis AT csufresno DOT edu.


