
Computational linguistics methods to enhance process evaluations of cardiovascular interventions


Achieving UN Sustainable Development Goal 3.4 requires complex, evidence-based interventions to be implemented sustainably in local settings. However, effective interventions are often only partially implemented, and some not at all, because of factors that could have been identified during process evaluations.

Process evaluations can help to design interventions, optimise implementation, and inform sustainability and scale, but they are also time- and resource-intensive. As a result, most process evaluations are performed at or near the end of a study, and findings are not rapidly fed back to end-users. Computational linguistics is a potential way to enhance process evaluation in a timely and valid manner. It uses text mining and machine learning to discover linguistic patterns in natural language and to link these patterns to process evaluation domains, such as the implementation measures of fidelity, acceptability, and appropriateness.


This study aims to test the reliability and validity of computational linguistics on four completed qualitative datasets, to use those datasets to train the algorithm to better recognise patterns in the text, and then to develop an integrative platform for future research.

Research Methodology

This project is a feasibility study of a novel analytic process using computational linguistics. It involves secondary use of four completed and established qualitative datasets to test the reliability and validity of the approach.

In this study, audio files and transcripts from the process evaluation interviews of four datasets (covering salt reduction, polypill, and digital health across contexts) will be analysed retrospectively. The codes generated by the computational linguistics software will be compared with gold-standard codes derived from traditional qualitative methods to test reliability and validity. This process trains the software to better recognise patterns in the text. Prototypes of a data repository with machine learning will then be developed to inform an integrative platform, co-designed with end-users, to enhance qualitative analysis.
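The comparison between software-generated and gold-standard codes could be quantified in several ways. The project description does not name specific statistics, so the following is only a minimal sketch under stated assumptions: it uses Cohen's kappa as one possible reliability measure and per-code precision and recall as one possible validity check. The function names, the example codes, and the six labelled segments are all hypothetical and are not drawn from the study data.

```python
from collections import Counter


def cohen_kappa(machine_codes, gold_codes):
    """Chance-corrected agreement between two equally long code sequences."""
    if len(machine_codes) != len(gold_codes) or not gold_codes:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(gold_codes)
    # Observed agreement: share of segments given the same code by both sources.
    p_observed = sum(m == g for m, g in zip(machine_codes, gold_codes)) / n
    # Expected chance agreement, from each source's marginal code frequencies.
    machine_freq = Counter(machine_codes)
    gold_freq = Counter(gold_codes)
    p_expected = sum(machine_freq[c] * gold_freq[c] for c in gold_freq) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sources used one identical code throughout
    return (p_observed - p_expected) / (1 - p_expected)


def precision_recall(machine_codes, gold_codes, code):
    """Precision and recall of the machine codes for a single code."""
    pairs = list(zip(machine_codes, gold_codes))
    true_positive = sum(m == code and g == code for m, g in pairs)
    predicted = sum(m == code for m, _ in pairs)
    actual = sum(g == code for _, g in pairs)
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    return precision, recall


# Hypothetical codes for six transcript segments (illustration only).
gold = ["fidelity", "acceptability", "acceptability",
        "appropriateness", "fidelity", "acceptability"]
machine = ["fidelity", "acceptability", "fidelity",
           "appropriateness", "fidelity", "acceptability"]

kappa = cohen_kappa(machine, gold)
fidelity_precision, fidelity_recall = precision_recall(machine, gold, "fidelity")
print(f"kappa={kappa:.3f}, fidelity precision={fidelity_precision:.3f}, "
      f"recall={fidelity_recall:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few codes dominate a dataset. Other measures, such as Krippendorff's alpha for multiple coders, may suit the actual coding scheme better.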

Current Status

The study has commenced its proof-of-concept phase, with initial testing on a subset of the qualitative data. Further development and testing on the extended datasets will follow as proposed, with the development of a co-designed platform to be completed by 2026.