A Short Introduction to Machine Learning in Medicine
In this article series, I will discuss machine learning and its applications in medicine with practical examples. In this first part, I’ll make a short introduction to machine learning and talk about our first topic: regression.
I will be using the PyCharm and Xcode IDEs for the demonstrations. Python is a very common programming language for obvious reasons, but I also prefer the Swift language in some scenarios, so I will use Swift and talk about Core ML too.
As can be understood from the title, this will not be a series of articles on machine learning only; I will also discuss machine learning applications in medicine.
Comments are very welcome, so if you have a suggestion, please let me know using the comments section. Or you can send me an email too.
Let’s start.
Will AI Take My Job?
Over the last five years, AI has changed many things, and it has already eliminated the need for humans in several tasks. As in several other fields, doctors and other medical specialists are anxious about the future of their jobs. I’ve been asked this question several times, not only by doctors but also by students from several medical schools: Will AI take my job?
AI has changed many things in medicine too, but I think there is a misconception. AI will not replace or reduce the need for doctors. Doctors will not be going anywhere soon. But doctors who do not learn how to use AI in medicine will probably have some problems.
Here is an example:
I implemented an image segmentation algorithm as part of my dissertation. Using that algorithm, we could scan 32 cranial MRIs within a minute and identify those with a lesion. The algorithm also performs segmentation and highlights the tumors. Its overall accuracy was around 92%, and it was also able to detect multiple lesions within a frame. Its accuracy in identifying multiple lesions was higher than 98%.
For those who are not aware of the crazy workload of the radiology department, 92% accuracy is highly acceptable for two reasons.
- The radiology department is hectic, and using an algorithm like this will help us prioritize urgent and life-threatening cases like brain aneurysms.
- A radiologist can never review 32 MRI scans within a minute.
Does this mean that we can eliminate the entire radiology department? Obviously no. Instead, this algorithm is a tool for radiologists. And a tool like this can be used for several purposes.
- It can help us to prioritize urgent cases and assist radiologists in identifying lesions faster.
- It can be used to validate the diagnosis to eliminate any misdiagnosis.
- It can be used to train new radiologists.
We still have a long way to go before an algorithm can replace the experience of a senior radiology specialist. Machine learning algorithms are great at automating some tasks or performing a preliminary diagnosis, but we are just not ready for that when it comes to a real case. Even if we can reach an accuracy of 99.9%, there will be ethical boundaries. Here’s a scenario. Imagine that the algorithm diagnosed the patient with glioblastoma (GBM, the deadliest and most aggressive brain tumor type in adults). Just 10 minutes later, the radiologist steps in and says, “Oh, I am sorry, our AI has misdiagnosed. There is a tumor, but it looks like a meningioma (a mostly benign brain tumor that only requires surgical removal if the patient has severe symptoms), not a glioblastoma.” Now, let’s empathize with the patient. For those 10 minutes, the patient thought that he or she had less than 12 months to live. What do you think our patient might have thought and felt during those 10 minutes, which must have felt like ages? In medicine, there is no room for an error like this.
So as you can see, radiologists will be with us for a long time. But the ones that can use AI will be with us much longer.
Because of the image processing capabilities of our current technology, radiology seems like one of the best fields for AI applications, but there are examples in almost all specialties, ranging from dermatology to ophthalmology.
So, healthcare is a unique field for AI applications, and it is probably one of the hardest ones. The purpose of this article series is to help colleagues and medical students understand and learn machine learning. Obviously, you are not required to be a physician or medical student to read these articles, but some examples will be easier to understand for readers with a medical background.
Regression
First, let’s make it clear. What is regression? Why do we need it? How do we use it?
In the simplest form, we can define regression as the process of estimating the relationship between one dependent variable and one or more independent variables.
Now, let’s explain this with a real-life example.
Heart Rate Recovery
Here is our first medical term for this episode. Heart rate recovery can be described as the decrease in heart rate 1 minute after cessation of exercise. It is known as a significant indicator of cardiovascular health. Therefore, a lot of research has been done and is being done on this subject.
We perform a stress ECG test on the patient, and we monitor the heart rate changes for the next 1 minute after the cessation of the test and compare the recovery to the patient’s resting heart rate.
We expect the patient’s pulse to return to the resting state within a certain period. Based on the average heart rate at rest and the heart rate values seen during the test, we also know roughly what the patient’s pulse should be 1 minute after the test. The recovery value we measure in line with these parameters is an important indicator of the patient’s heart health.
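The measurement described above comes down to simple arithmetic. Here is a minimal Python sketch, assuming heart rate recovery is taken as the peak heart rate during the test minus the heart rate 1 minute after the test stops. That is a common convention, but your lab may define it differently. The function name and the sample values are made up for illustration.

```python
def heart_rate_recovery(peak_hr, hr_after_1_min):
    """Return heart rate recovery (HRR) in beats per minute.

    Assumption: HRR = peak heart rate during exercise
    minus heart rate 1 minute after the exercise stops.
    """
    if peak_hr <= 0 or hr_after_1_min <= 0:
        raise ValueError("Heart rates must be positive")
    return peak_hr - hr_after_1_min


# Hypothetical patient: peak of 165 bpm, 140 bpm one minute after stopping.
hrr = heart_rate_recovery(165, 140)
print(f"HRR at 1 minute: {hrr} bpm")
```

A larger value means the pulse dropped faster after the test. The same function works for other time points (10 seconds, 20 seconds, and so on) if you pass in the heart rate measured at that moment.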
A study published in the Journal of the American Heart Association evaluated the prognostic value of heart rate recovery.
In the study, researchers investigated the prognostic value of heart rate recovery at 10, 20, 30, 40, and 50 seconds after the cessation of exercise in 40,727 selected UK Biobank participants. The mean age of the participants was 56, and 45% of the participants were male. During a mean follow-up period of 6 years, 536 participants died (including 39 from coronary artery disease).
After the multivariable analyses, including adjustments for aerobic exercise capacity, cardiovascular risk factors, and factors associated with mortality in general, the researchers identified HRR at 10 seconds as a predictive indicator of both all‐cause and coronary artery disease mortality.
The study also showed that the effects of HRR were larger and more significant when measured early after exercise cessation.
Moreover, the association of change in heart rate between 10 seconds and 1 minute after exercise cessation with mortality was dependent on HRR at 10 seconds.
As you can see from this example, there is a relationship between the heart rate change and other parameters in this scenario. And researchers used regression analysis to identify those relationships. Moreover, they also identified the irrelevant variables and focused on the most effective ones.
If you are interested, you can read the publication here.
Why do we use Regression?
We use regression analysis for two reasons: to predict or forecast a value, or to infer the causal relationships between the independent and dependent variables. Before explaining the independent and dependent variables, let’s clarify the difference between prediction and forecasting.
In statistics, a prediction or forecast is a statement about a future event. Predictions are often, but not always, based upon experience or knowledge. Forecasting is making predictions based on past and present data, most commonly by analysis of trends. In machine learning, you’ll see that these two words are used interchangeably.
So, we talked about regression analysis, but we did not explain what dependent and independent variables are.
Let’s clarify them too. Let’s assume that we are running an experiment with the patients that will undergo neurosurgery.
The independent variable is the variable that the experimenter manipulates or changes and is assumed to affect the dependent variable directly. For example, we allocate participants to either drug or placebo conditions (this is our independent variable) to measure any changes in the intensity of their anxiety (which is our dependent variable).
The dependent variable is the variable being tested and measured in an experiment and is dependent on the independent variable. An example of a dependent variable is depression symptoms, which depend on the independent variable, which is therapy.
Before making a list of regression models, let’s clarify one more thing. There are both linear and nonlinear regression models.
Linear regression requires a linear model. A model is linear when each term is either a constant or the product of a parameter and a predictor variable. A linear equation is constructed by adding the results for each term. This constrains the equation to just one basic form:
Response = constant + parameter * predictor + ... + parameter * predictor
In statistics, a regression equation (or function) is linear when it is linear in the parameters. While the equation must be linear in the parameters, you can transform the predictor variables in ways that produce curvature.
For instance, you can include a squared variable to produce a U-shaped curve. This model is still linear in the parameters even though the predictor variable is squared. You can also use log and inverse functional forms that are linear in the parameters to produce different curves.
Here is an example of a linear regression model that uses a squared term to fit the curved relationship between BMI and body fat percentage.
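The idea of a curved fit that is still linear in the parameters can be sketched in a few lines of Python with NumPy. The BMI and body fat values below are synthetic, generated from a curve I picked for illustration, so this is not data from a real study. The variable names are my own as well.

```python
import numpy as np

# Synthetic data, invented for illustration (not from a real study).
bmi = np.array([18.0, 20.0, 22.0, 25.0, 28.0, 31.0, 35.0, 40.0])
# Generated from a known curve so we can check what the fit recovers.
body_fat = -40.0 + 4.0 * bmi - 0.05 * bmi**2

# Design matrix with columns 1, BMI, BMI^2. The model
#   body_fat = b0 + b1 * BMI + b2 * BMI^2
# is curved in BMI but still linear in the parameters b0, b1, b2.
X = np.column_stack([np.ones_like(bmi), bmi, bmi**2])

# Ordinary least squares solves for all three parameters at once.
coefficients, *_ = np.linalg.lstsq(X, body_fat, rcond=None)
b0, b1, b2 = coefficients

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

Because the squared BMI column is just another predictor, the same least squares machinery used for a straight line handles the curve. With real, noisy measurements the recovered parameters would only approximate the underlying relationship.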
While a linear equation has one basic form, nonlinear equations can take many different forms. The easiest way to determine whether an equation is nonlinear is to focus on the term “nonlinear” itself: an equation is nonlinear if it doesn’t meet the criteria defined above for a linear equation. That covers many different forms, which is why nonlinear regression provides the most flexible curve-fitting functionality.
There are several regression models, but in this series I’ll cover the most commonly used ones. Here are the six regression models I’ll talk about.
- Simple Linear Regression
- Multiple Linear Regression
- Polynomial Regression
- Support Vector Regression (SVR)
- Decision Tree Regression
- Random Forest Regression
In the following episode, we’ll talk about simple linear regression and its applications. If you think that this introduction is helpful, you can like it and share it with your friends. Also, if you have a suggestion, please write it in the comments section.
Thanks for reading.