The Astrid Lindgren Code: Accessing Astrid Lindgren’s shorthand manuscripts through handwritten text recognition, media history, and genetic criticism
Astrid Lindgren holds a unique position within world literature, yet her enigmatic creative process has for many years been hidden in her original drafts and manuscripts, written in Melin shorthand (stenography). These manuscripts have been considered “impossible” to access and have to date never been the subject of research.
The purpose of “The Astrid Lindgren Code” is twofold:
1) To access Lindgren’s original stenographed drafts through adaptation of algorithms for handwritten text recognition (HTR), and to refine and develop this digital method further through crowdsourcing.
2) To study the implications of deletions, alterations, and revisions in the stenographic drafts of Lindgren’s work, with particular emphasis on Bröderna Lejonhjärta/The Brothers Lionheart (1973), from the perspectives of genetic criticism, sociological editing, and media history. This part of the study is structured around the three simultaneous roles of author, secretary, and editor that Lindgren took in her own creative process.
This three-year project draws on the combined expertise of literary scholars, computer scientists, and professional stenographers to unlock the potential of the original drafts. It aims to produce new knowledge about Lindgren as a world author, to establish a starting point for the future full digitisation and transliteration of her original manuscripts, and to provide a general vehicle for methodological development in the analysis of handwritten documents.
Final report
Project Purpose and Development
The overarching goal of the project was to generate new insights into the world-renowned author Astrid Lindgren by conducting the first study of her original manuscripts, showcasing the potential of these manuscripts for further research, providing a foundation for the digitization of Lindgren’s entire preserved corpus of shorthand notebooks, and contributing to the broader development of methods for analyzing handwritten documents. All of these objectives were successfully achieved by the conclusion of the project.
The first phase of the project focused on digitization and the pre-processing of digitized materials to address specific challenges of applying handwritten text recognition (HTR). Volunteers were recruited for crowdsourcing, and a trial platform was tested with a focus group. The second phase involved conducting a series of hackathons with volunteers, which resulted in a corpus of transliterated and peer-edited material. This material came to constitute the dataset that served as the foundation for training methods in handwritten text recognition tailored to the unique aspects of stenography. The third phase focused on the dissemination of results and the completion of the literary research. This phase also laid the groundwork for a future scholarly edition of Lindgren’s original manuscripts.
The project’s timeline was extended one year beyond the extra year allocated, due to Malin Nauwerck’s parental leave. Karolina Andersdotter, initially a librarian at Uppsala University Library (now a PhD student in library and information science at Åbo Akademi), has taken on an active role in the project and co-authored several scientific publications. Anders Hast’s work in the project has largely involved the supervision of PhD candidate Raphaela Heil, whose doctoral dissertation Document Image Processing for Handwritten Text Recognition: Deep Learning-based Transliteration of Astrid Lindgren’s Stenographic Manuscripts was published in 2023.
While the COVID-19 pandemic initially restricted physical travel and research opportunities, it accelerated digital crowdsourcing efforts. A significant portion of the project’s volunteers were in the “70+” age group, who were encouraged to remain at home during the pandemic. Feedback from a survey indicated that approximately half of the most active volunteers felt that the pandemic had increased their ability to work with the manuscripts, and the crowdsourcing community as well as the transliteration work were described as “a ray of light in a dull time.”
Central Results and Conclusions
Digitization, Manual Transliteration, and Future Research on Lindgren’s Stenographed Original Manuscripts
The project has successfully digitized and transliterated 55 of Lindgren’s shorthand notebooks held in the Astrid Lindgren archive (L230) at the National Library of Sweden, alongside an additional 8 notebooks in the collections of the Swedish Institute for Children’s Books and 3 owned by Astrid Lindgren AB. This work has dispelled the popular myth of Lindgren’s stenography as unreadable, demonstrating that individuals with knowledge of the Melin shorthand system can transliterate the material at a character level.
Today, stenography is an endangered skill, with a diminishing number of practitioners. The majority of volunteers who were involved in the transliteration efforts are women born between 1930 and 1970, often with backgrounds as secretaries or stenography instructors. A key factor for their participation has been the ability to work from home, since the transliteration process is demanding and often requires external aids. In 2020, the National Library’s Manuscript Department imposed a photography ban on L230:5. Consequently, researchers wishing to study Lindgren’s stenographed original manuscripts must read them in the National Library’s special reading room, where they cannot copy or photograph the materials, even for personal use. This restriction severely limits the ability to assess and manually transliterate material that is not yet digitized. There are compelling reasons to digitize the entire collection of Lindgren’s preserved shorthand notebooks (a total of 670). However, until digitization is fully realized, it is this project’s conclusion that the National Library’s reasons for the photography ban need to be carefully balanced against the need for accessibility and research opportunities.
Digital Method Development in Crowdsourcing and HTR
The project used a mixed-methods approach for transliterating Lindgren’s original manuscripts, combining manual transliteration through crowdsourcing and organized hackathons with HTR techniques. Manual transliteration through crowdsourcing emerged as a surprisingly effective strategy, yielding numerous positive outcomes for both the volunteers and the project, not least because it significantly enhanced public interest in the project and the dissemination of its findings. Through the development of a crowdsourcing method, the project has contributed an intersectional perspective on motivation, highlighting age and gender as well as the importance of specific expertise and the emotional connection crowdsourcing volunteers felt toward the research task and the material.
Combined with the digitized and preprocessed manuscripts, the transliterations produced through crowdsourcing formed the basis for the first openly accessible dataset (LION) for HTR training on stenographed material. Computerized image analysis methods were developed to address the challenges posed by crossed-out text. Through the LION dataset, the project established a baseline for HTR of Swedish stenography. Based on this foundation, advanced methods for digitally representing stenographic transliterations during HTR training were explored. The proposed methods yielded significant improvements in recognition performance when combined with pre-training. The establishment of an HTR baseline also serves as a foundation for future work. One avenue for future research is, for example, the integration of crowdsourced human feedback, as well as stenographic knowledge and experience, into the HTR training pipeline.
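As an illustration of how recognition performance against a baseline can be quantified, the sketch below computes a character error rate (CER), a metric commonly used in HTR evaluation. The report does not state which metric the project used, so the choice of CER, the function names, and the example strings are assumptions made for illustration only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning one string
    into the other."""
    # prev[j] holds the distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete from reference
                curr[j - 1] + 1,                  # insert into reference
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference
    (here: a manual transliteration treated as ground truth)."""
    if not reference:
        raise ValueError("reference transliteration must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: a manual transliteration and a model output
# that differs from it by one substituted character.
manual = "lejonhjärta"
predicted = "lejonhjarta"
cer = character_error_rate(manual, predicted)
```

A lower value means the recognised text is closer to the manual transliteration; comparing the value for a baseline model and a modified model on the same lines is one simple way to express an improvement.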
Contribution to Lindgren Scholarship
The project has demonstrated how stenography functioned as the engine of Lindgren’s creative process, and what role it played in the interplay between the various production roles that Lindgren assumed. The project has shown how Lindgren, through self-presentation and metafiction, contributed to the myth-making surrounding her shorthand manuscripts. Combined with her distinctive position in the field of children’s literature as author-publisher and her assumption of several different roles in the production process of her own fiction, this myth has been essential to the understanding of Lindgren as intuitive, autonomous, and sovereign in her creative process. Another key finding is the identification of a connection between stenography as a writing method and Lindgren’s literary style, termed a “stenographic effect”. Considering this effect enhances our understanding of Lindgren’s work within the context of children’s literary modernism, as well as of her significant influence on the transmediation and adaptation of children’s literature in the latter half of the 20th century.
New Research Questions
Key questions for the future include how to sustainably make the content of Lindgren’s remaining original shorthand manuscripts accessible, as well as how digital transliteration techniques can be further developed to compensate for the diminishing number of individuals proficient in stenography. The positive outcomes and high levels of engagement linked to citizen research and crowdsourcing in the project suggest that it is feasible to expand these elements in connection with Lindgren’s authorship for a wider audience.
The material digitized and transliterated within this project, particularly the original manuscripts of The Brothers Lionheart, offers rich potential for further research. In the project’s concluding phase, work has begun on a scholarly edition of these manuscripts. Given the novel’s centrality in the children’s literature canon and the sustained public interest in Lindgren’s legacy, this edition holds promise for fostering new avenues for public engagement with cultural heritage.
Dissemination of Results
The research findings have been widely shared within the academic community through a combination of co-authored and individual publications, as well as through participation in national and international conferences and guest visits to various institutions. The project has established collaborations with numerous organizations, including the National Library of Sweden, Astrid Lindgren AB, the Melinska Stenografförbundet, the Astrid Lindgren Society, and Astrid Lindgren’s Näs.
Public lectures and presentations have been integral to the project from its inception, continuing throughout its duration. Notable events have included lectures for the language unit at the Swedish Parliament, contributions to Humtank’s events on citizen science during Almedalsveckan, participation in the National Library’s lecture series, and involvement in programs organized by the Swedish Institute for Children’s Books.
From an early stage, the project garnered substantial media attention. Nationally, it has achieved over 400 mentions in the media service Retriever, while internationally, it has been featured in several prominent media channels such as the German Die Zeit and the Spanish El País. This attention proved advantageous for recruiting and motivating stenographers as volunteers for crowdsourcing activities, as well as for disseminating results and fostering new international partnerships. The Swedish Academy’s Bernadotte Scholarship in 2020 also made it possible to explore a collaboration with Astrid Lindgren’s childhood home, Näs, and to provide a contribution to the museum’s permanent exhibition.