Niclas Burenhult

Digital Multimedia Archive of Austroasiatic Intangible Heritage Phase II: Seeding Multidisciplinary Workspaces

This project takes language archiving and documentation infrastructure to a new level. Building on the outcome of the digital resource Repository and Workspace for Austroasiatic Intangible Heritage (RWAAI), the project will break new ground in the interdisciplinary adaptation of language resources. It will achieve this by (1) broadening reusability through the development of new tools and principles with multidisciplinary potential, (2) continuing to acquire Austroasiatic collections in order to maximize coverage of languages and of disciplines other than linguistics, and (3) engaging in outreach to provide training and expertise in collaboration with researchers in other disciplines and regions. Moving beyond purely linguistic interests, the resulting infrastructure will provide a research workspace relevant to a range of disciplines, such as history, the arts, biology, geography, demography, and the food sciences. The project brings together a uniquely qualified team of language specialists and experts in modern documentary technology. The small, endangered, and underdocumented Austroasiatic speech communities of Mainland Southeast Asia and India provide an unparalleled window on the history and cultural diversity of this region. The project represents the leading global initiative to document the intangible heritage of these communities.
Final report
AIMS AND DEVELOPMENT OF THE INFRASTRUCTURE
This project develops northern Europe’s leading digital language documentation resource, the Repository and Workspace for Austroasiatic Intangible Heritage (RWAAI). In a unique approach, it integrates legacy and modern research collections documenting the intangible linguistic and cultural heritage of speech communities in the Austroasiatic language family of South and Southeast Asia. It is particularly devoted to advancing the accessibility and reusability of the collection as a dynamic resource, continuing to acquire new collections, and increasing engagement. To this end, the project takes archiving and language documentation at large to a new level by focusing on adapting the documentary resource to multidisciplinary audiences, through committed outreach and the development of new approaches to promote innovative engagement with the resource.

PROJECT RESULTS
We report on the outcomes of our three primary objectives.

1. ADVANCING REUSABILITY. We developed a model for enhancing and exploring collections using Automatic Speech Recognition (ASR). ASR automatically converts speech in audio recordings into digital text, which makes it possible to transcribe recordings that lack transcriptions. Because spoken audio, unlike text, is difficult to search and discover, ASR has great potential as an integral part of the language documentation process. Speech recognition systems are typically trained on many different speakers, hundreds of hours of recordings, and large amounts of text so that they can generalize to new speakers. In the language documentation context, however, the available data is usually limited to a few speakers and small amounts of text. For this type of material, we developed a method that prioritizes recognizing the textual content over exact temporal alignment of speech and text, and built a more speaker-dependent ASR model. The model was successfully used to enrich several existing collections by providing transcriptions for previously untranscribed recordings. A second program geared to advancing reusability focused on deep integration of spatial information, in the form of geographical coordinates, in both modern and legacy collections. In particular, an innovative sub-project brought together a GIS specialist and one of our depositors to create a spatial reconstruction and representation of fieldwork carried out in the 1960s, using geographic data from an archived collection. The resulting model has the potential to make legacy collections more integrable and functional as significant resources for modern and future research.
2. ENLARGING AND ENRICHING THE COLLECTION. The project's second aim was to enlarge the resource with new multidisciplinary collections and to enrich existing collections. Our team completed the digitization of analog materials, the preparation of metadata, and the ingestion of new accessions, which significantly expanded RWAAI's coverage of Austroasiatic with the Temiar ethnographic and linguistic collection, the Kammu botanical collection, the Nicobarese human ecology collection, and the Pnar, Kachok, Jedek, Kensiw and Mlabri linguistic collections. Staff also assisted depositors from Phase I in updating their collections with new material from ongoing research. Our international collaboration to digitize and archive analog recordings from the Orang Asli Archive (Keene State College, USA) added several further ethnographic collections to RWAAI. We focused on increasing the reusability of existing collections, first by integrating time-aligned transcriptions with recordings of wordlists, texts and songs from several collections, some dating back to the 1960s, and second by linking recordings with lexicons to enrich the transcriptions with automated interlinear glossing. ASR was used to produce transcriptions. The Mlabri catalogue was translated from Danish into English to increase accessibility. Additional enrichment was generated by two international research projects that reused material from our collections, and the enriched derivatives are archived with RWAAI.

3. INCREASING ENGAGEMENT. The project's third aim was to promote increased engagement with our resource through a program of outreach. We focused on targeted public relations initiatives, presentations of RWAAI and its data at three international conferences in the USA and one in Singapore, and promotion of our principles and techniques through workshops. We also continued our engagement with archiving practices at an international level as a full member of the Digital Endangered Languages and Music Archive Network (DELAMAN; www.delaman.org), an international organization of archives committed to advancing the preservation of intangible cultural heritage and the promotion of archiving. Our outreach initiatives also led to our involvement in international research projects.

USE OF THE INFRASTRUCTURE, RESEARCH INITIATED
The project is concerned with the preservation of research materials documenting highly endangered languages and cultures for current and future generations. Project staff worked intensively with Austroasiatic researchers from various fields to encourage them to use RWAAI both as a repository for their research collections and as a research resource. The resource currently has 48 registered users from the fields of linguistics, botany, musicology, human ecology, anthropology, and cultural heritage. The reusability of our digital collections as an educational and research resource has been demonstrated by the use of materials by students, researchers and research groups in linguistics, phonetics and language documentation. Two large external international research projects, one of which specifically targeted the reuse of corpora of "small languages", have also used the resource. Locally, RWAAI serves as a repository and resource for students and staff at Lund University.

UNFORESEEN TECHNICAL AND METHODOLOGICAL CHALLENGES
Activities were severely affected by two years of restrictions brought about by the COVID-19 pandemic. First, international outreach and visits to LU by depositors could not take place. Second, work-from-home restrictions curtailed data processing. To comply with best practices in data management and security, project assistants could not carry out certain tasks from home, such as working with particular data sets or scanning original research materials, so alternative compliant tasks had to be found. This slowed progress, and the resulting backlog of tasks curtailed the acquisitions program toward the end of the original project period. Fortunately, the extension period allowed us to retain key staff and complete all outstanding tasks. During the work-from-home period the focus shifted to advancing reusability, where significant gains were made.

INSTITUTIONAL INTEGRATION, LONG-TERM MAINTENANCE
Since its inception, RWAAI has been hosted on the Humanities Lab Archive Server, Lund University. In 2022, RWAAI was successfully migrated to the Archive Server's new servers and to a new archiving software solution, the FLAT software bundle developed by the Max Planck Institute (MPI) for Psycholinguistics in Nijmegen. FLAT is a CLARIN-compatible repository solution based on the open-source Islandora/Fedora framework. The metadata was also updated to the CLARIN-initiated Component Metadata Infrastructure (CMDI) standard. The metadata is integrated with CLARIN and harvested by CLARIN's Virtual Language Observatory (VLO; https://vlo.clarin.eu), which was developed within CLARIN as a tool for discovering data, tools and services available in CLARIN and related communities. Furthermore, each individual data record is linked to handle.net, which provides persistent identifiers for information resources, and these links have been updated to point to the new server. Upgrades such as these are a crucial part of data preservation and sustainability, ensuring continued access to the collections. The migrations were seamless, demonstrating the resilience of our infrastructure and validating our original design principles. The long-term prospects for RWAAI are further enhanced by the participation of the Humanities Lab in the national HUMINFRA initiative.

RESOURCE ACCESSIBILITY, OPEN SCIENCE
RWAAI has been available online since its initial launch in 2012 (www.lu.se/rwaai). It operates in accordance with the FAIR principles (Findable, Accessible, Interoperable, Reusable). All metadata can be browsed freely, and it is harvested by the VLO to maximize the findability of resources. Several collections are openly accessible to users who register with RWAAI, while others require the depositor's permission. This policy has worked well, and our depositors have approved all genuine requests to access their collections.

INTERNATIONAL COLLABORATION
The project is inherently international. Our recent additions come from researchers based in Asia, Europe and North America. In addition to the aforementioned international projects based in Switzerland and Germany, the archive users, researchers and students who have accessed the resource come from Europe, North America, Asia and Australia. We have a cooperative venture with the Orang Asli Archive (USA). We continued our international engagement with archiving practices as a member of the Digital Endangered Languages and Music Archive Network (www.delaman.org), an international organization committed to the promotion of archiving and the preservation of endangered intangible cultural heritage.
Grant administrator
Lunds universitet
Reference number
IN17-0183:1
Amount
SEK 6,748,000
Funding
RJ Infrastructure for research
Subject
General Language Studies and Linguistics
Year
2017