Citation

Zhu, Youxiang; Liang, Xiaohui; Batsis, John A.; & Roth, Robert M. (2022). Domain-Aware Intermediate Pretraining for Dementia Detection with Limited Data. Interspeech, 2022, 2183-2187. PMCID: PMC10102977

Abstract

Detecting dementia from human speech is promising but faces the challenge of limited data. While recent research has shown that general pretrained models (e.g., BERT) can improve dementia detection, such models can hardly be fine-tuned on the small available dementia datasets without overfitting. In this paper, we propose domain-aware intermediate pretraining, which enables pretraining on a domain-similar dataset selected by incorporating knowledge from the dementia dataset. Specifically, we use pseudo-perplexity to find an effective pretraining dataset, and then propose dataset-level and sample-level domain-aware intermediate pretraining techniques. We further employ information units (IUs) from previous dementia research and define an IU-pseudo-perplexity to reduce computational cost. We confirm the effectiveness of perplexity by showing a strong correlation between perplexity and accuracy across 9 datasets and models from the GLUE benchmark. We show that our domain-aware intermediate pretraining improves detection accuracy in almost all cases. Our results suggest that the difference in text-based perplexity between patients with Alzheimer's Disease (AD) and Healthy Controls (HC) is still small, and that a perplexity incorporating acoustic features (e.g., pauses) may make the pretraining more effective.
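The abstract's dataset selection relies on pseudo-perplexity, i.e., scoring text with a masked language model by masking one token at a time. The sketch below is not the authors' code; it is a minimal illustration of the standard pseudo-perplexity computation (in the sense of Salazar et al.'s masked language model scoring), assuming a generic BERT checkpoint and Hugging Face transformers. Lower values indicate text that is more "in-domain" for the model.

    # Minimal sketch of pseudo-perplexity (PPPL) for a masked language model.
    # Assumptions: bert-base-uncased as the scorer; one short text at a time.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def pseudo_perplexity(text: str) -> float:
        input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
        log_probs = []
        # Mask each token in turn, skipping the [CLS] and [SEP] special tokens.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            with torch.no_grad():
                logits = model(masked.unsqueeze(0)).logits
            # Log-probability the model assigns to the true token at position i.
            log_probs.append(torch.log_softmax(logits[0, i], dim=-1)[input_ids[i]].item())
        # PPPL = exp of the negative mean token log-probability.
        return float(torch.exp(-torch.tensor(log_probs).mean()))

    print(pseudo_perplexity("The patient describes the cookie theft picture."))

In the paper, such scores are used to rank candidate pretraining data by domain similarity to the dementia dataset; the IU-pseudo-perplexity variant restricts scoring to information-unit tokens to reduce this per-token masking cost.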

URL

http://dx.doi.org/10.21437/Interspeech.2022-10862

Reference Type

Journal Article

Year Published

2022

Journal Title

Interspeech

Author(s)

Zhu, Youxiang
Liang, Xiaohui
Batsis, John A.
Roth, Robert M.

Article Type

Regular

PMCID

PMC10102977

ORCiD

Batsis - 0000-0002-2823-6651