Dimitrios Kokkinakis

Linguistic and extra-linguistic parameters for early detection of cognitive impairment

As the population ages, the number of people with cognitive dysfunctions, such as various types of dementia, has grown at a high rate. However, years before the clinical onset of dementia symptoms, patients exhibit serious deficits in their oral and written communication and visual short-term memory. These signs can be measured and can complement medical evidence to discriminate patients from healthy (elderly) controls, or even to predict poor cognitive health in late life. The aim of the project is to explore and apply automatic linguistic analysis to language samples produced by persons at various stages of cognitive decline, in order to identify important linguistic markers that can be used as a complementary, early diagnostic, prognostic or screening tool. Language, or rather linguistic performance, in this context covers various forms of spoken or written language production and comprehension, e.g. transcripts of audio-recorded utterances, accessible language-based interaction through the web, or measures from an eye-tracking device. A correct and timely diagnosis of neurodegenerative brain disorders, such as Alzheimer's disease, and the differentiation of various types of dementia are of great importance to clinicians. The project intends to perform research in areas of Natural Language Processing (NLP) that will allow us to broaden opportunities for multidisciplinary research activities between researchers from the humanities, computer science and medicine.
Final report
Purpose of the project and how it has developed during the project period

Purpose: the overall aim of the project was to identify significant linguistic features that can be used as a complementary, early diagnostic, prognostic or screening tool for identifying people with mild forms of cognitive impairment. With increased life expectancy, the prevalence of dementia increases. Years before the clinical onset of dementia symptoms, patients may exhibit deficits that affect cognitive abilities such as memory, language and executive functions. The symptoms usually creep in and gradually worsen over the course of many years. Language deficits appear early in the course of the disease, and it is important to gain new knowledge about the subtle signs that might lead to more severe forms of cognitive impairment, e.g. Alzheimer's disease. Moreover, people with mild cognitive symptoms are becoming more common in healthcare. It is currently difficult to determine which of these people should be referred to a specialist at a memory clinic, a referral that in turn involves a costly and time-consuming assessment process (e.g., brain imaging, cerebrospinal fluid analysis and neuropsychological testing).
Development: neuropsychological examinations include, among other things, language tests that are not usually digitally recorded. In the project, we recorded and analyzed speech (incl. orthographic transcripts) and eye movements during text reading, produced by people with mild forms of cognitive impairment and by healthy controls. By extracting and modeling variables from these three modalities (speech, transcripts and eye movements), we were able to detect differences in how participants read a text or pronounced certain sounds. With the models we built, we were able to rapidly and objectively distinguish between these groups. The project developed as planned without major changes in its aims, research questions or theory. Data collection was delayed a couple of times due to unpredictable events. Finally, a sad event in Nov. 2017 affected all project members: the project's domain expert passed away. A new recruitment process started immediately, and a new, suitable domain expert was recruited a couple of months later.

A short description of how it was implemented

At the beginning of the project, we sought and obtained approval from the ethics review board. In parallel, we designed the tests we were to carry out and organized a two-day workshop with invited experts to discuss data collection and methodology. Participant recruitment began in the summer of 2016; recording phase-1 took place from autumn 2016 to spring 2017 and recording phase-2 in 2018. Throughout the project, members published scientific papers on the project's methodology, data and results. Project-related topics were discussed in seminars and symposia. We organized annual and monthly meetings with agendas and minutes, as well as four international workshops.
Participants, empirical data & methodology: the project’s participants were recruited from an ongoing epidemiological study, the "Gothenburg MCI study". This meant that we could be confident about which group each participant belonged to, and it gave us access to their demographic and neuropsychological data. The participants consisted of a group with ‘Subjective Cognitive Impairment’ (SCI), a group with ‘Mild Cognitive Impairment’ (MCI) and cognitively healthy controls. By digitally recording all language tests and registering eye movements during reading, we could extract variables and then build, compare and evaluate classifiers that could learn to differentiate the groups. These models used Artificial Intelligence (AI) techniques, namely machine learning and deep learning algorithms.
Experiments:
i. eye tracking: to measure how effectively the cognitive processes of reading work together, we used eye tracking. The participants read two texts on a computer, both silently and aloud. Silent reading gave better classification results than reading aloud, but even better results were obtained by combining variables from both modes [cf. 12].
ii. speech: the properties of spoken language have direct links to cognitive abilities. For example, articulation, voice strength and prosody can reveal subtle pathological abnormalities. By extracting and modeling sound variables, we built computer models that could differentiate pathological language with great accuracy [cf. 2;4;19;24;27].
iii. transcriptions: all audio recordings were transcribed orthographically, which enabled us to apply language technology methods to also measure properties of the written text [cf. 13].
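
The general idea in experiment i, combining variables from silent and aloud reading and evaluating a classifier on the result, can be sketched as follows. This is only an illustration under stated assumptions: the report does not name the specific algorithms used, so a simple nearest-centroid classifier with leave-one-out cross-validation stands in for them, and the feature values, feature meanings and function names are hypothetical rather than project data or project code.

```python
from math import dist


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(train, x):
    """Return the label whose class centroid is closest to x (Euclidean)."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    return min(by_label, key=lambda lab: dist(centroid(by_label[lab]), x))


def loo_accuracy(samples):
    """Leave-one-out cross-validation: hold out each sample once."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += predict(train, features) == label
    return hits / len(samples)


# Hypothetical participants: (silent-reading features, aloud-reading features,
# group). Each feature pair is an invented example, e.g. a mean fixation
# duration in ms and a regression rate; none of it is project data.
participants = [
    ([210.0, 0.10], [260.0, 0.12], "control"),
    ([220.0, 0.12], [255.0, 0.10], "control"),
    ([205.0, 0.09], [270.0, 0.11], "control"),
    ([290.0, 0.20], [340.0, 0.22], "MCI"),
    ([300.0, 0.24], [330.0, 0.25], "MCI"),
    ([285.0, 0.21], [350.0, 0.20], "MCI"),
]

# Combine the two reading modes by concatenating their feature vectors.
combined = [(silent + aloud, group) for silent, aloud, group in participants]
accuracy = loo_accuracy(combined)
```

With real measurements, the features would typically need to be put on a common scale before computing distances, and far more participants would be required for a meaningful accuracy estimate.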

The project’s three most important results and contributions to the international research front and a discussion about this

The project's three most important results could be:
i. multilingual combination of comparable data increases the predictive performance of a classifier: lack of suitable data is an obstacle when researching linguistic characteristics of people with cognitive impairment. We found that the classifier’s ability to differentiate between participants with or without MCI was improved by adding comparable data from another language. This opens up new research directions as the lack of data in smaller languages could be compensated by supplementing it with data from another language in a comparable task [cf. 1].
ii. linguistic variables improve the diagnostic precision of screening tools: increased knowledge of different linguistic characteristics is a powerful analytical tool for capturing early signs of cognitive impairment and thereby increasing the diagnostic value of screening tools. In various experiments, we were able to improve the diagnostic precision of such an instrument [cf. 16;35].
iii. eye-tracking as a window into our cognitive abilities: the coordination between speech and eye movements is affected in MCI. When reading aloud, the eye is always slightly ahead of the speech. This is required for the brain to have time to process the word being read and initiate the speech process; in the meantime, the words are kept in short-term memory. Our analysis showed that people with MCI have a shorter time interval between fixating a word and pronouncing it, compared with controls, which could be a strategy for dealing with shortcomings in short-term memory [cf. 15]. Using eye-tracking measurements, we were also able to distinguish control subjects from people with mild cognitive difficulties with high accuracy by combining variables from both silent and aloud reading [cf. 12]. This research shows that eye-tracking measurement is a promising method for detecting the earliest stages of cognitive impairment.
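
The interval discussed in item iii, the time between fixating a word and pronouncing it, can be illustrated with a minimal sketch. It assumes that each word has one first-fixation onset and one speech onset, both in milliseconds on a shared clock; the function name, data layout and timestamps are hypothetical and do not reproduce the project's measurements or processing pipeline.

```python
from statistics import mean


def fixation_to_speech_intervals(fixation_onsets_ms, speech_onsets_ms):
    """Per-word time from first fixation onset to pronunciation onset (ms)."""
    if len(fixation_onsets_ms) != len(speech_onsets_ms):
        raise ValueError("need one fixation onset and one speech onset per word")
    return [
        speech - fixation
        for fixation, speech in zip(fixation_onsets_ms, speech_onsets_ms)
    ]


# Hypothetical timestamps (ms) for four consecutive words read aloud.
fixation_onsets = [0, 250, 520, 800]
speech_onsets = [600, 870, 1100, 1420]

intervals = fixation_to_speech_intervals(fixation_onsets, speech_onsets)
mean_interval = mean(intervals)  # one summary value per participant
```

A per-participant summary such as `mean_interval` is the kind of variable that could then be compared between groups; a shorter mean would correspond to the pattern reported above for people with MCI.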

New research questions generated through the project

New research questions have arisen that can be developed in the near future. We would like to do research on the macro-linguistic level of language, which means examining how the participants maintain a common thread in a conversation and how they structure their verbal communication with a conversation partner. Although language is not the only diagnostic factor in SCI and MCI, it is likely that cognitive impairment gives rise to language deficits at this level. Another area of research we would like to explore is whether combining variables from different modalities (linguistic variables and biomarkers) can improve predictions and outcomes, and thus broaden and deepen the knowledge of what characterizes the preclinical stages of dementia. A wealth of biomarkers are currently being investigated in the H70-1944 population study at the Sahlgrenska Academy (https://www.gu.se/forskning/h70-studierna-i-goteborg), and we aim to apply for new research grants to investigate these questions. Some of the neuropsychological tests in H70-1944 are already being recorded. The combination of, and comparison between, linguistic, cognitive and multimodal markers is unique and has never been tested before on such a large scale. It can pave the way for exploring previously inaccessible dimensions of pathological language changes and provide additional diagnostic value through linguistic analysis.

The project’s international dimensions, such as contacts and material

Project members have actively participated in conferences and popular science events. Scholars working in related research have been invited to visit us, while project staff have visited research centers to present the project and discuss collaborations. Project members have worked together with researchers from Canada (U. of Toronto); Japan (Aging Research, IBM, Tokyo); Germany (DFKI); France (Hospitalier Universitaire de Nice); USA (School of Medicine, Johns Hopkins U.); and Greece (National and Kapodistrian U. of Athens). The project set up a new international workshop series, ‘RaPID’ [cf. 55;56;58], which enables researchers in the field to meet, present and discuss data, methods and findings. The project's PI has been a member of the board of the "Center for Aging and Health" (AgeCap) since 2017 (https://agecap.gu.se/), an interdisciplinary center that gave us the opportunity to establish new contacts and participate in discussions and meetings about cognition and aging. Detailed information about the project is available at: https://spraakbanken.gu.se/forskning/teman/alz-rjx.

How the project team has disseminated the results to other researchers and groups outside the scientific community and discuss and explain how collaboration has taken place

Project members have been active in disseminating the results and the potential of the research to a much wider public audience:
• public events: the Göteborg Book Fair in 2017 and 2018; the ‘Science festival’ and the ‘Demensdagen-2018’ [cf. 44;47;48;62]
• major daily newspaper (‘DN’); an exhibition on aging ‘Årsrika’ and a note with the minister of Public Health, Healthcare and Sports ‘Annika Strandhäll’ [cf. 43;46;51]
• popular science, periodicals as well as blog posts: e.g. ‘Språktidningen’, ‘Äldre i Centrum’ and ‘Språkbruk’ [cf. 42;45;49-54].

Grant administrator: University of Gothenburg
Reference number: NHS14-1761:1
Amount: SEK 10,460,000
Funding: New prospects for humanities and social sciences
Subject: Language Technology (Computational Linguistics)
Year: 2015