Speech as an Emotional Load Biomarker in Clinical Applications

Authors

  • Luís Coelho INESC TEC – Instituto de Sistemas e Computadores and ISEP / P.Porto – Instituto Superior de Engenharia do Porto / Instituto Politécnico do Porto, Porto, Portugal https://orcid.org/0000-0002-5673-7306

DOI:

https://doi.org/10.24950/rspmi.2587

Keywords:

Biomarkers, Emotions, Machine Learning, Speech

Abstract

Introduction: Healthcare professionals often carry a significant emotional burden in their work. Negative emotions such as stress and anxiety can have profound consequences for both immediate and long-term healthcare delivery. This paper proposes a stress estimation algorithm based on the classification of negative-valence emotions in speech recordings.

Methods: An end-to-end machine learning pipeline is proposed. Two decision models are considered, VGG-16 and SqueezeNet, both taking the same constant-Q power spectrogram as the acoustic representation. The system is trained and evaluated on the RAVDESS and TESS emotional speech datasets.
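
As a rough illustration of the constant-Q representation named above, the sketch below computes the geometrically spaced bin frequencies and the shared quality factor that define a constant-Q analysis, plus the usual power-to-decibel scaling applied before a spectrogram is fed to an image-style network. The abstract does not state the analysis parameters, so f_min, n_bins and bins_per_octave are hypothetical values chosen only for illustration, and the helper names are not taken from the paper.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Center frequencies f_k = f_min * 2**(k / B), spaced geometrically."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def quality_factor(bins_per_octave):
    """Ratio of center frequency to bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def power_to_db(power, ref=1.0, floor=1e-10):
    """Convert one power value to decibels, clipping very small values."""
    return 10.0 * math.log10(max(power, floor) / ref)


# Hypothetical settings (assumptions, not the paper's configuration):
# 7 octaves starting near 32.7 Hz with 12 bins per octave.
freqs = cqt_center_frequencies(f_min=32.70, n_bins=84, bins_per_octave=12)
q = quality_factor(bins_per_octave=12)

print(f"{len(freqs)} bins from {freqs[0]:.1f} Hz to {freqs[-1]:.1f} Hz, Q = {q:.2f}")
```

Because the bins are spaced geometrically, each octave gets the same number of bins, which gives finer frequency resolution at the low frequencies where pitch-related cues lie.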

Results: The system was evaluated on individual emotion classification (a multiclass problem) and on negative versus neutral-or-positive emotion classification (a binary problem). The results are comparable to those of previously reported systems, and the SqueezeNet model has a significantly smaller footprint, which enables more versatile applications. Further exploration of the model's parameter space holds promise for enhanced performance.
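
The relation between the multiclass and binary problems can be sketched as a label mapping. The abstract does not list which emotions were treated as negative, so the NEGATIVE set and the normalized label names below are assumptions made only for illustration. The example also shows why binary accuracy can exceed multiclass accuracy: confusions between two emotions in the same group are no longer counted as errors.

```python
# Assumed grouping (hypothetical): these labels count as negative valence;
# anything else (e.g. neutral, calm, happy, surprised) is neutral or positive.
NEGATIVE = frozenset({"angry", "fearful", "disgust", "sad"})


def to_binary(emotion):
    """Map an emotion label to 'negative' or 'non-negative'."""
    return "negative" if emotion.lower() in NEGATIVE else "non-negative"


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the reference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy labels, invented for this example only.
true_labels = ["angry", "sad", "happy", "neutral"]
pred_labels = ["fearful", "sad", "surprised", "neutral"]

multiclass_acc = accuracy(true_labels, pred_labels)
binary_acc = accuracy(
    [to_binary(e) for e in true_labels],
    [to_binary(e) for e in pred_labels],
)
print(multiclass_acc, binary_acc)
```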

Conclusion: The proposed system is a feasible approach to estimating a low-cost, non-invasive biomarker for negative emotions. Such a biomarker makes it possible to raise alerts and develop actions that mitigate the burden of negative emotions, giving healthcare services an additional management tool for maintaining quality and maximizing availability.

Published

17-05-2024

How to Cite

1.
Coelho L. Speech as an Emotional Load Biomarker in Clinical Applications. RPMI [Internet]. 2024 May 17 [cited 2024 Dec 22];31(1 - Special Issue):7-13. Available from: https://revista.spmi.pt/index.php/rpmi/article/view/2587