The machines are coming!
04 April 2017
By: Nay Aung
With the explosive adoption of smart devices and the Internet of Things(1), there has been an exponential increase in the amount of data collected from all walks of life. This new-found wealth of information creates both opportunities and challenges. Machine learning techniques, also known as algorithm- or model-based prediction, have been widely used to capitalise on the deluge of data collected from the internet. Some may be aware that machine learning algorithms are deployed for tasks such as spam email filtering and Netflix recommendations. But many would be surprised to learn that faith in the technology has grown to the extent that some countries now use predictive tools built by data mining to assist the criminal justice system, for example in identifying likely repeat offenders. In medicine, the widespread availability of omics technologies (genomics, transcriptomics, proteomics, etc.) and detailed imaging techniques generates a huge amount of data that can be exploited by bioinformatics and machine learning approaches.
Machine learning can be broadly categorised into supervised and unsupervised methods. A prerequisite for any machine learning technique is a set of inputs called features, which may be patient characteristics such as age or radiological features such as signal intensity. In supervised learning, the predictive algorithm is constructed (or trained) using the input features together with the desired results, known as labels (e.g., disease status). Once the algorithm has been adequately trained, it is tested on a new dataset (a test dataset whose labels are hidden from the algorithm), and its performance is assessed by comparing its predictions with the hidden labels. One of the simplest and most commonly used supervised algorithms is linear regression. Alternatives include classification techniques such as the decision tree or the more powerful random forest (an aggregate of decision trees). Unsupervised learning is usually used for datasets without labels (e.g., the signal intensity of cancerous tissue on CT, whose biological significance is unknown and which is therefore unlabelled). The predominant technique for unsupervised learning is clustering, in which samples are grouped together based on the similarity of their features. An example use of these clusters (or groups) is to identify responders vs non-responders in chemotherapy(2).
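To make the two categories concrete, here is a minimal sketch in plain Python. It is not taken from any of the cited studies: the ages, labels and signal intensities are invented, and the helper names (fit_stump, accuracy, two_means) are hypothetical. The supervised part trains a one-level decision tree (a "stump") on labelled data and scores it on a held-out test set; the unsupervised part groups unlabelled values into two clusters with a very small k-means loop.

```python
# Toy data, invented for illustration only.
# Supervised: each sample is (age_in_years, label), label 1 = disease, 0 = healthy.
train = [(25, 0), (31, 0), (38, 0), (45, 0), (52, 1), (58, 1), (63, 1), (70, 1)]
test = [(29, 0), (41, 0), (47, 1), (55, 1), (66, 1)]


def fit_stump(samples):
    """Train a one-feature decision stump: keep the threshold with the fewest training errors."""
    values = sorted(x for x, _ in samples)
    # Candidate thresholds are the midpoints between consecutive feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # A sample is misclassified when "above threshold" disagrees with its label.
        return sum((x > threshold) != bool(y) for x, y in samples)

    return min(candidates, key=errors)


def accuracy(threshold, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum((1 if x > threshold else 0) == y for x, y in samples)
    return correct / len(samples)


def two_means(values, iterations=10):
    """Minimal 1-D k-means with k=2. Assumes at least two distinct values."""
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of the values assigned to it.
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return near_low, near_high


# Supervised: learn from labelled training data, then score on unseen test data.
threshold = fit_stump(train)
test_accuracy = accuracy(threshold, test)
print(f"learned age threshold: {threshold}")
print(f"test accuracy: {test_accuracy:.2f}")

# Unsupervised: no labels, only (invented) signal intensities to be grouped.
intensities = [12, 15, 14, 48, 52, 50]
cluster_a, cluster_b = two_means(intensities)
print("clusters:", cluster_a, cluster_b)
```

The stump fits the training data perfectly yet misclassifies one test sample, which is why performance is always reported on data the algorithm has not seen during training.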
A recently published paper by Narula and colleagues(3) in the Journal of the American College of Cardiology described an application of machine-learning algorithms to echocardiographic parameters to differentiate the physiological remodelling of athlete’s heart (ATH) from hypertrophic cardiomyopathy (HCM). They included 139 individuals (77 ATH and 62 HCM) who had undergone a full 2-dimensional echocardiographic assessment, including longitudinal and radial strain measurements using the speckle tracking technique. Traditional echo-derived geometric variables such as LV diameter and volume, together with a multitude of mechanical variables including velocity and strain rate (120 variables in total), were assessed for their relative importance using the information gain criterion, a mathematical technique for ranking variables by their individual contribution to the best-fitting model. Three distinct machine learning algorithms (artificial neural networks, random forest and support vector machine) were applied to the full echo-derived dataset, and the final ensemble model was derived using a majority-voting scheme. In their final combined model, adjusted for age, the sensitivity and specificity for the correct diagnosis were 96% and 77% respectively. This is a marked improvement over traditional echo markers such as E/A ratio, E’ velocity and longitudinal strain on their own.
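As a rough illustration of how a majority-voting ensemble and its sensitivity and specificity can be computed, the sketch below combines the outputs of three classifiers for six subjects. It is not the authors' implementation: the predictions and true labels are invented, and the function names (majority_vote, sensitivity_specificity) are hypothetical.

```python
from collections import Counter

# Hypothetical predictions from three classifiers for six subjects
# (1 = HCM, 0 = athlete's heart). All values are invented for illustration.
model_predictions = {
    "neural_network": [1, 1, 0, 0, 1, 0],
    "random_forest": [1, 0, 0, 1, 1, 0],
    "svm": [1, 1, 1, 0, 0, 0],
}
true_labels = [1, 1, 0, 0, 1, 1]


def majority_vote(predictions_by_model):
    """For each subject, return the label predicted by most of the models."""
    votes_per_subject = zip(*predictions_by_model.values())
    return [Counter(votes).most_common(1)[0][0] for votes in votes_per_subject]


def sensitivity_specificity(truth, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


ensemble = majority_vote(model_predictions)
sensitivity, specificity = sensitivity_specificity(true_labels, ensemble)
print("ensemble predictions:", ensemble)
print(f"sensitivity: {sensitivity:.2f}, specificity: {specificity:.2f}")
```

With an odd number of classifiers and two possible labels, the vote can never tie, which is one practical reason for combining three models.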
This proof-of-concept study highlights the role of mathematical model-based approaches in making full use of the data generated by newer imaging techniques, such as strain measurements and parametric mapping in cardiac magnetic resonance. While a true single biomarker remains elusive for most cardiovascular conditions, dimensionality reduction techniques in machine learning can provide clinicians with alternative and complementary information to aid diagnosis, risk stratification and prediction of treatment response.
1. Gil D, Ferrández A, Mora-Mora H, Peral J. Internet of Things: A Review of Surveys Based on Context Aware Intelligent Services. Sensors [Internet]. 2016 Jul 11 [cited 2017 Feb 24];16(7). Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4970116/
2. Mani S, Chen Y, Li X, Arlinghaus L, Chakravarthy AB, Abramson V, et al. Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. J Am Med Inform Assoc. 2013 Jul;20(4):688–95.
3. Narula S, Shameer K, Omar AMS, Dudley JT, Sengupta PP. Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography. J Am Coll Cardiol. 2016 Nov 29;68(21):2287–95.