Do Analytics Discriminate? Disparities in Algorithms Across Various Racial and Ethnic Groups

Authored by Ayesha Rajan, Research Analyst at Altheia Predictive Health

Introduction

Patients who come in with concerns about kidney function routinely have their glomerular filtration rate (GFR) checked. Notably, the equations commonly used to estimate GFR, which feeds into the Kidney Failure Risk Equation and other algorithms, apply a separate adjustment for African American patients but treat all other racial and ethnic minority groups the same. Across many other predictive health algorithms and risk calculators, by contrast, racial differences in the metabolic panel are typically not considered at all. Given that different racial and ethnic groups have different predispositions to, and progressions of, disease, this is an area that needs improvement. As medicine increasingly relies on analytics to predict patient risk, medication needs and other outcomes, it is important to weigh both the accuracy that could be gained by taking race into account when building algorithms and the harm that biased algorithms can cause.
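The article does not say which GFR equation a given algorithm relies on. As an illustration, assuming the 2009 CKD-EPI creatinine equation (a widely used formula of that era that included a race coefficient), the minimal Python sketch below shows exactly where the race adjustment enters. The function name is hypothetical.

```python
def ckd_epi_2009(scr_mg_dl: float, age: float, female: bool, black: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9        # sex-specific creatinine knot (mg/dL)
    alpha = -0.329 if female else -0.411  # exponent applied below the knot
    ratio = scr_mg_dl / kappa
    egfr = (141
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race coefficient discussed in the text
    return egfr


# Same creatinine, age and sex; only the race flag differs.
base = ckd_epi_2009(1.0, 60, female=False, black=False)
adjusted = ckd_epi_2009(1.0, 60, female=False, black=True)
print(round(base, 1), round(adjusted, 1))
```

Because the multiplier raises the estimate by roughly 16%, a Black patient is reported as having better kidney function than a non-Black patient with identical lab values, which can move them across staging or referral thresholds.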

Current Problem

In 2019, NBC reported on a heavily biased algorithm developed by Optum. The algorithm was used to rank the patients who would benefit most from a care program designed to help patients use medication responsibly and stay out of the hospital. In a group of 6,000 black patients and 44,000 white patients, only 17.5% of the people recommended for the program were black, even though the black patients had 26.3% more chronic conditions. When the cause was investigated, it turned out that the cost-based algorithm left black patients behind because of a monetary bias: black patients spend $1,800 less per year on health care than white patients with the same conditions. As a result, black patients whose chronic conditions were as bad as or worse than those of white patients were ranked lower simply because less money was spent on their care. Once Optum corrected that aspect of the algorithm, the share of black patients recommended for the program rose from 17.5% to 46.5% (Gawronski). This example shows that algorithms can produce disparities we cannot always anticipate but that still harm certain racial groups.
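The underlying mechanism, ranking patients on spending as a stand-in for health need, can be shown with a toy example. The records below are hypothetical and chosen only to illustrate how the choice of ranking label changes who gets selected. They are not data from the Optum algorithm or the study NBC reported on.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str               # demographic group label (hypothetical "A"/"B")
    annual_cost: float       # past-year spending in dollars: the proxy label
    chronic_conditions: int  # count of active chronic conditions: a health-based label


def group_share_in_top_k(patients, score, k, group):
    """Fraction of the k highest-scoring patients who belong to `group`."""
    selected = sorted(patients, key=score, reverse=True)[:k]
    return sum(p.group == group for p in selected) / k


# Hypothetical cohort: group B is sicker, but less is spent on its care.
cohort = [
    Patient("A", 9000, 2),
    Patient("A", 8000, 1),
    Patient("B", 7200, 4),
    Patient("B", 6500, 3),
]

by_cost = group_share_in_top_k(cohort, lambda p: p.annual_cost, k=2, group="B")
by_need = group_share_in_top_k(cohort, lambda p: p.chronic_conditions, k=2, group="B")
print(by_cost, by_need)
```

Nothing in the ranking code mentions race; the disparity comes entirely from which quantity is used as the score.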

Analytics As A Solution

In October 2019, George Washington University received a grant to study these types of disparities. The four-year study is still in its earliest stages, but it should help researchers better identify the relationships between race and disease prediction. Yan Ma, Vice Chair of the Department of Bioinformatics at George Washington University, says that one of the biggest and easiest steps toward improving the situation is for large databases to record each patient’s race and/or ethnicity. Surprisingly, many databases do not already include this data. If they did, machine learning and artificial intelligence would become even more powerful tools than they are today (Kent).

The Centers for Medicare & Medicaid Services echo this sentiment in their studies of using racial data to improve care for all patients, citing the lack of clear, reliable data as the biggest roadblock to their research and the area most in need of improvement. Medicare has long used a method called geocoding to target at-risk communities. Geocoding makes predictions about a person’s health based on the characteristics of the area where they live. In this respect it works the way many would like algorithms to account for racial differences: it draws on the population health and trends of a community to make more targeted predictions about an individual. The method is limited, however, because it is imprecise and depends heavily on neighborhoods being self-segregated, so that an area’s characteristics actually reflect those of its residents. Even so, it lays the groundwork for studies that separate out particular groups to identify their specific risk factors and general health.
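A minimal sketch of the area-level idea, assuming each patient has been matched to a census tract and simply inherits that tract’s observed prevalence. The tract IDs, rates and fallback value are all hypothetical. The sketch also makes the method’s imprecision visible: everyone in the same area receives the same estimate, regardless of individual characteristics.

```python
# Hypothetical tract-level prevalence of a chronic condition (share of residents).
TRACT_PREVALENCE = {
    "tract_001": 0.18,
    "tract_002": 0.09,
}


def geocoded_risk(patient_tract: str, default: float = 0.12) -> float:
    """Return the area-level prevalence for the patient's tract.

    `default` stands in for a regional average when the tract is unknown
    (an assumption made for this sketch).
    """
    return TRACT_PREVALENCE.get(patient_tract, default)


patients = {"p1": "tract_001", "p2": "tract_001", "p3": "tract_999"}
estimates = {pid: geocoded_risk(tract) for pid, tract in patients.items()}
print(estimates)
```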

Conclusion

While data science tools such as machine learning and artificial intelligence have significantly advanced the field of medicine, many people are left behind because of biased equations. To best serve all communities, healthcare providers and companies that build predictive algorithms should take differences in racial predispositions into account and adjust their interpretation of metabolic and blood panels in the same way GFR is already adjusted, so that each unique patient receives the best possible care.

Works Cited

Gawronski, Quinn. “Racial Bias Found in Widely Used Health Care Algorithm.” NBCNews.com, NBCUniversal News Group, 7 Nov. 2019, www.nbcnews.com/news/nbcblk/racial-bias-found-widely-used-health-care-algorithm-n1076436.

Kent, Jessica. “Machine Learning to Uncover Racial Disparities in Healthcare.” HealthITAnalytics, 31 Oct. 2019, healthitanalytics.com/news/machine-learning-to-uncover-racial-disparities-in-healthcare.

Llanos, Karen. “Using Data on Race and Ethnicity to Improve Health Care Quality for Medicaid Beneficiaries.” CHCS, 2006, www.chcs.org/media/Using_Date_to_Reduce_Health_Disparities.pdf.

Quantifying Chronic Kidney Disease

Authored by Ayesha Rajan, Research Analyst at Altheia Predictive Health

Introduction

Chronic Kidney Disease is a condition involving the gradual loss of kidney function. The kidneys filter waste and toxins from the blood, which in turn helps control blood pressure and supports red blood cell production and bone health, so their ability to function properly is clearly very important. Common causes of Chronic Kidney Disease include diabetes, high blood pressure, obstruction of the urinary tract and a range of other conditions, including glomerulonephritis, interstitial nephritis, polycystic kidney disease, vesicoureteral reflux and pyelonephritis (Mayo Clinic). Many patients do not realize they have Chronic Kidney Disease until it has progressed well into its five stages. When symptoms do appear, they include itching, muscle cramps, lack of appetite, nausea, unusual swelling, changes in frequency of urination and trouble breathing or sleeping. Once diagnosed, the disease is managed by slowing the progression of kidney damage to prevent end-stage kidney failure, which requires dialysis or a kidney transplant.

Currently, 15% of the American population (37 million people) has Chronic Kidney Disease, but many of them do not know it. This is a frightening figure given that 340 people begin dialysis treatment every day and that kidney disease is the 9th leading cause of death in the United States (CDC). Additionally, UC San Francisco has calculated that CKD costs $79 billion for Medicare patients and predicts that 16.7% of the population will have CKD by 2030, which shows a clear need for further research in this area (Kent).
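The stages mentioned above are conventionally defined by GFR. A minimal sketch, assuming the KDIGO GFR categories (G1 to G5, with stage 3 split into 3a and 3b). The function name is hypothetical, and it does not check the other diagnostic requirements: evidence of kidney damage for G1 and G2, and abnormalities persisting for more than three months.

```python
def ckd_stage(egfr: float) -> str:
    """Map an eGFR (mL/min/1.73 m^2) to a KDIGO GFR category."""
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    thresholds = [
        (90, "G1"),   # normal or high
        (60, "G2"),   # mildly decreased
        (45, "G3a"),  # mildly to moderately decreased
        (30, "G3b"),  # moderately to severely decreased
        (15, "G4"),   # severely decreased
    ]
    for lower_bound, stage in thresholds:
        if egfr >= lower_bound:
            return stage
    return "G5"  # kidney failure range (< 15)


print([ckd_stage(v) for v in (95, 72, 50, 38, 20, 10)])
```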

Key Data Points

Several factors increase a patient’s risk of Chronic Kidney Disease, including diabetes or hypertension, heart disease, smoking, obesity, race (African Americans, Native Americans and Asian Americans are all at higher risk), family history and age (Mayo Clinic). These factors are often included in predictive algorithms along with lab results. Blood tests provide key variables such as Serum Creatinine and Blood Urea Nitrogen (BUN), and serum creatinine is used to estimate the Glomerular Filtration Rate (GFR). Urine tests measure variables such as the Albumin to Creatinine Ratio (ACR), Urine Protein, Microalbuminuria and Creatinine Clearance Rate. All of these variables reflect kidney function or damage and can help predict the onset or stage of Chronic Kidney Disease.
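Several of these measures are simple ratios with published cutoffs. As an example, here is a minimal sketch of the urine albumin-to-creatinine ratio. It assumes albumin is reported in mg/L and creatinine in mg/dL (lab units vary) and uses the commonly cited albuminuria categories A1 (under 30 mg/g), A2 (30 to 300 mg/g) and A3 (over 300 mg/g). Function names are hypothetical.

```python
def urine_acr_mg_per_g(albumin_mg_per_l: float, creatinine_mg_per_dl: float) -> float:
    """Albumin-to-creatinine ratio in mg albumin per g creatinine."""
    creatinine_g_per_l = creatinine_mg_per_dl / 100  # 1 mg/dL = 0.01 g/L
    return albumin_mg_per_l / creatinine_g_per_l


def albuminuria_category(acr_mg_per_g: float) -> str:
    """Assign an albuminuria category from an ACR in mg/g."""
    if acr_mg_per_g < 30:
        return "A1"  # normal to mildly increased
    if acr_mg_per_g <= 300:
        return "A2"  # moderately increased (historically "microalbuminuria")
    return "A3"      # severely increased


acr = urine_acr_mg_per_g(60, 50)
print(acr, albuminuria_category(acr))
```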

Current Research

An important study from Cairo University used multiple algorithms to examine how much individual physical variables contribute to classifying CKD. The study compared probabilistic neural network, multilayer perceptron, support vector machine and radial basis function algorithms to see which most accurately identified a patient’s stage of CKD. The probabilistic neural network achieved the highest classification accuracy, at 96.7%, and the researchers used it to weight each variable and improve the prediction of CKD stage. The results showed that the variables did not contribute equally: serum creatinine had an importance of 100%, compared with just 9.256% for hypertension. This matters for identifying at-risk groups because not everyone with hypertension will develop CKD, whereas patients with high-risk serum creatinine levels are very likely to need treatment (Rady).

Research on CKD in the United States draws on the following databases: the National Health and Nutrition Examination Survey, the United States Renal Data System, Kaiser Permanente and the Veterans Affairs Healthcare System. These databases are essential for artificial intelligence and machine learning techniques because they provide ranges for many of the physical variables listed above.

Beyond physical variables, researchers have also studied nonconventional risk factors for CKD. For example, several studies have evaluated air pollution by means “of land-use regression and spatiotemporal models that utilized satellite remote-sensing aerosol optical depth data” and linked it to the incidence of CKD in a population. These studies concluded that higher air pollution was associated with a higher incidence of CKD and a decline in glomerular filtration rate.

Another artificial intelligence study used clinical notes to evaluate predictors of CKD and identified high-dose ascorbic acid and fast food consumption as novel predictors (Zeng). Artificial intelligence can do much of the heavy lifting in studies like these, giving insight into factors we might never otherwise have considered relevant to Chronic Kidney Disease.
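The probabilistic neural network that performed best in the Cairo University study is, at its core, a kernel density classifier. Each class is scored by the average Gaussian similarity between a new patient and that class’s training examples. The pure-Python sketch below shows that idea on hypothetical, already-scaled feature vectors. It is not the study’s implementation, the class labels are placeholders, and the smoothing parameter `sigma` is an assumed value that would normally be tuned.

```python
import math
from collections import defaultdict


def pnn_predict(train, x, sigma=0.5):
    """Classify feature vector `x` with a minimal probabilistic neural network.

    `train` is a list of (features, label) pairs. Each class is scored by the
    mean Gaussian kernel between `x` and that class's training vectors; the
    highest-scoring class wins.
    """
    scores = defaultdict(float)
    counts = defaultdict(int)
    for features, label in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, x))
        scores[label] += math.exp(-sq_dist / (2 * sigma ** 2))
        counts[label] += 1
    return max(scores, key=lambda label: scores[label] / counts[label])


# Hypothetical, pre-scaled features: (serum creatinine, blood urea nitrogen).
training = [
    ((0.1, 0.2), "early"),
    ((0.2, 0.1), "early"),
    ((0.8, 0.9), "advanced"),
    ((0.9, 0.7), "advanced"),
]
print(pnn_predict(training, (0.15, 0.15)), pnn_predict(training, (0.85, 0.8)))
```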

Conclusion

Chronic Kidney Disease affects (and will continue to affect) a significant share of the population, and more research is clearly needed in this area. To make that possible, some things need to change. For example, medical data should be made easier to access so that research can happen at various levels, i.e. medical, academic and corporate. This would let those who want to research these topics do so without the delays imposed by existing rules and regulations, so that new developments reach patients and providers in the timeliest manner. Additionally, federal funding could be redirected to research in this area to improve data processing techniques, which are currently fragmented and limit the success of existing multidimensional algorithms.

Prevention
The steps for preventing Chronic Kidney Disease are very much in line with leading a generally healthy life. Mayo Clinic recommends maintaining a healthy weight through physical exercise and calorie reduction, not smoking, and following the usage guidelines for over-the-counter medications, since overusing pain relievers can damage the kidneys. Furthermore, if you are at risk, it is important to check in with your physician frequently to track and manage symptoms of Chronic Kidney Disease (Mayo Clinic). If you are unsure whether you might be at risk of developing kidney disease, you may consider using the CDC’s Chronic Kidney Disease Risk Calculator at: https://nccd.cdc.gov/CKD/Calculators.aspx#tab-Bang.

 

Works Cited

“Chronic Kidney Disease Basics.” Centers for Disease Control and Prevention, Centers for Disease Control and Prevention, 7 Feb. 2020, www.cdc.gov/kidneydisease/basics.html.

“Chronic Kidney Disease.” Mayo Clinic, Mayo Foundation for Medical Education and Research, 15 Aug. 2019, www.mayoclinic.org/diseases-conditions/chronic-kidney-disease/symptoms-causes/syc-20354521.

Kent, Jessica. “Chronic Kidney Disease Patients Face Significant Care Disparities.” HealthITAnalytics, HealthITAnalytics, 17 July 2019, healthitanalytics.com/news/chronic-kidney-disease-patients-face-significant-care-disparities.

Rady, El-Houssainy A., and Ayman S. Anwar. “Prediction of Kidney Disease Stages Using Data Mining Algorithms.” Informatics in Medicine Unlocked, Elsevier, 7 Apr. 2019, www.sciencedirect.com/science/article/pii/S2352914818302387.

Zeng, Xiao-Xi, et al. “Big Data Research in Chronic Kidney Disease.” Chinese Medical Journal, Medknow Publications & Media Pvt Ltd, 20 Nov. 2018, www.ncbi.nlm.nih.gov/pmc/articles/PMC6247601/.

 

 

What Role Does Analytics Play in Mental Health Research?

Authored by Ayesha Rajan, Research Analyst at Altheia Predictive Health

Introduction

May is Mental Health Awareness Month, an especially important observance this year as we as a society continue to navigate the coronavirus pandemic. Mental illness is incredibly prevalent in the United States: 1 in 5 adults (43.8 million people) experience mental illness, and 1 in 25 (9.8 million people) experience mental illness that limits their ability to live a normal life (Coleman). Furthermore, as people across the country and the world face social isolation and job uncertainty, they are reporting significantly more negative mental health effects (Panchal). Clearly, a great number of people would benefit from further developments in mental health research. While applying analytics to mental health is a fairly new field, there are many promising reports and developments, which we discuss here.

Key Points

When looking at mental health from a data standpoint, there are several possible approaches to a study. Unlike the monitoring and analysis of conditions like coronary artery disease or diabetes, many current mental health studies focus on factors outside the metabolic panel, such as tracking a patient’s actions, words, facial expressions and non-verbal cues to make predictions about behavior.

While many of the studies detailed here do not look at physical health as a factor, it is important to know that depression and other mental illnesses are more common among people with chronic illnesses such as cancer, coronary artery disease, diabetes, epilepsy, multiple sclerosis, stroke, Alzheimer’s, HIV/AIDS, Parkinson’s, lupus and rheumatoid arthritis (NIMH).

Research Studies

The Crisis Text Line is a crisis counseling service that receives text messages from people experiencing mental health instability, including those who may be considering self-harm or suicide. It connects them with counselors via text message, a form of communication that many people find more comfortable than a phone conversation. Crisis Text Line has analyzed the language patterns in more than 30 million text messages to identify trends among those who were more likely to self-harm or attempt suicide. The analysis surfaced a wealth of key words, such as “Advil,” that indicated a person’s risk of suicide. Interestingly, none of these key words were among those previously considered high risk (DDS).
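The source does not describe Crisis Text Line’s actual models, so the following is only a sketch of the simplest form of the idea: scanning a message for terms that analysis has linked to elevated risk. Apart from “advil,” which the article mentions, the term set is deliberately left minimal; a real system would learn many weighted terms from labeled conversations rather than rely on a fixed list. The function name is hypothetical.

```python
import re

# "advil" comes from the article; further terms would be learned from
# labeled conversations (not reproduced here).
RISK_TERMS = {"advil"}


def flag_message(text: str, risk_terms=RISK_TERMS) -> set:
    """Return the risk terms found in a message (case-insensitive, whole words)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return tokens & risk_terms


message = "I took some Advil earlier but I can't sleep"
matches = flag_message(message)
# In this sketch, a non-empty result could raise the conversation's priority.
print(sorted(matches))
```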

Another fascinating study came out of the University of Southern California where researchers created a virtual therapist called “Ellie.” Ellie captures and analyzes facial expressions and non-verbal cues and uses artificial intelligence to learn to detect the presence of mental illness. In the study, Ellie was more effective than a routine health assessment at detecting Post Traumatic Stress Disorder in military personnel returning from tours in Afghanistan (DDS).

Kaiser Permanente has also conducted research in this area. They built an analytics model that predicts the 90-day suicide risk of patients visiting a mental health professional. The model’s variables included behavioral patterns such as prior suicide attempts, substance use, emergency room visits and questionnaire responses, as well as medical and mental health diagnoses and prescribed medications. The model was able to identify the top 5% of patients at highest risk of suicide, creating a strong foundation for tracking and protecting patients with mental health issues (DDS).
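Flagging “the top 5%” amounts to ranking patients by a predicted risk score and marking the highest-scoring fraction. The sketch below assumes the scores have already been produced by some model (the Kaiser Permanente model itself is not reproduced here) and uses hypothetical patient IDs and scores.

```python
import math


def flag_top_fraction(scores: dict, fraction: float = 0.05) -> set:
    """Return IDs of the highest-scoring `fraction` of patients (at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_flag = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_flag])


# Hypothetical predicted 90-day risk scores for 100 patients.
risk = {f"patient_{i:03d}": i / 100 for i in range(100)}
flagged = flag_top_fraction(risk)
print(sorted(flagged))
```

Using a fixed fraction rather than a fixed score cutoff keeps the size of the follow-up list predictable, which matters when clinician time is the limiting resource.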

Where Can Analytics Take This Field?

Many mental illnesses result from a combination of nature and nurture. Consequently, data science research on mental health will continue to combine biological and behavioral inputs as variables in predictive algorithms. We will likely see more studies take a biostatistical approach to the biological factors related to mental illness, including those discussed above. Other factors that show promise for predicting and tracking mental illness include neurobiological mechanisms such as biomarkers from brain imaging, neurocognitive task assessments and psychometrics as they relate to biological aging (Wall). Artificial intelligence can do much of the heavy lifting in determining which factors, biological or behavioral, carry the most weight in prediction, prevention and management. Given that artificial intelligence tools are becoming increasingly mainstream, we can likely expect many exciting developments in this field.

How You Can Look After Your Mental Health During the Pandemic

The World Health Organization has listed the following items as methods to cope with the stress and anxiety surrounding the COVID-19 pandemic:

  1. Stay informed by checking the news once or twice a day
  2. Keep a routine by maintaining your previous routine or creating a new routine
  3. Maintain a healthy lifestyle by eating healthy meals, exercising regularly, getting enough sleep, and maintaining personal hygiene
  4. Maintain social contact by checking in on and catching up with friends and family
  5. Limit screen time in terms of video games and social media
  6. Limit alcohol and drug use

Checking in with yourself and monitoring your mental health is always important, but it is even more so as we all face the struggles of a pandemic. By taking care of your body and making sure you have enough time to rest, you can set yourself up to adapt to a trying situation. Additionally, be sure to reach out to loved ones and check in on them as well.

Free Tools Available

There are several free mental health tools available. We have consolidated a few of these resources below to help you navigate them.

Works Cited

“Chronic Illness & Mental Health.” National Institute of Mental Health, U.S. Department of Health and Human Services, www.nimh.nih.gov/health/publications/chronic-illness-mental-health/index.shtml.

Coleman, Madeline. “Mental Health and Big Data: A Step in the Right Direction.” RxDataScience Inc. – Data Science for Healthcare, 6 May 2020, www.rxdatascience.com/blog/mental-health-and-big-data-a-step-in-the-right-direction.

Panchal, Nirmita, et al. “The Implications of COVID-19 for Mental Health and Substance Use.” The Henry J. Kaiser Family Foundation, 21 Apr. 2020, www.kff.org/coronavirus-covid-19/issue-brief/the-implications-of-covid-19-for-mental-health-and-substance-use/.

“Using Data Science to Help Tackle Mental Health Issues.” DiscoverDataScience.org, 16 Mar. 2020, www.discoverdatascience.org/social-good/mental-health/.

Wall, Melanie. “Mental Health Data Science.” Columbia University Department of Psychiatry, 3 Mar. 2020, www.columbiapsychiatry.org/mental-health-data-science.

Use of Analytics in Prediction and Prevention of Coronary Artery Disease

Ayesha Rajan, Research Analyst at Altheia Predictive Health

Introduction

Coronary artery disease (CAD) is a chronic, comorbid condition that usually results from plaque buildup limiting blood flow. CAD is the leading cause of death and lost productivity in the United States, driven in part by an aging population and by globalization and urbanization. It is expected to be responsible for over 20 million deaths worldwide by 2030 (World Health Organization). CAD clearly needs attention in terms of preventive care. While many prediction tools currently exist, most of them make only limited use of the data points that are correlated with the presence of CAD.

Key Data Points

The table in Figure 1 shows normal, at-risk, high-risk and (where applicable) highest-risk ranges for key values in a typical metabolic panel. If you could look at only a limited number of factors to predict and prevent CAD, these would be among the strongest, and they are the basic data points included in most current predictive models. However, looking specifically at LDL cholesterol, as well as the presence of diabetes or chest pain, cigarette use, sex, race and age, can provide even more context and accuracy. One could cast an even wider net and include the results of a resting electrocardiogram, exercise-induced angina, hyperlipidemia, maximum heart rate, ST depression, the slope of the peak exercise ST segment, and the number of major vessels colored by fluoroscopy (Saxena).

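| Category | Diastolic Blood Pressure (mmHg) | Systolic Blood Pressure (mmHg) | Total Cholesterol (mg/dL) | Fasting Blood Sugar (mg/dL) | BMI (kg/m²) |
|---|---|---|---|---|---|
| Normal | <80 | <120 | <200 | <100 | 18.5-24.9 |
| At Risk | 80-89 | 120-129 | 200-239 | 100-125 | 25-29.9 |
| High Risk | 90-119 | 130-179 | 240+ | >125 | 30+ |
| Highest Risk | 120+ | 180+ | | | |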

Figure 1: Key CAD Data Points in Metabolic Panel
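The cutoffs in Figure 1 can be applied mechanically. A minimal sketch for two of its columns, systolic blood pressure and total cholesterol, using the figure’s bands; the other columns would follow the same pattern with their own thresholds. Names are hypothetical.

```python
# Lower bounds of each band, highest first, taken from Figure 1.
SYSTOLIC_BANDS = [          # mmHg
    (180, "Highest Risk"),
    (130, "High Risk"),
    (120, "At Risk"),
]
TOTAL_CHOLESTEROL_BANDS = [  # mg/dL
    (240, "High Risk"),
    (200, "At Risk"),
]


def classify(value: float, bands, default: str = "Normal") -> str:
    """Return the first band whose lower bound the value meets or exceeds."""
    for lower_bound, label in bands:
        if value >= lower_bound:
            return label
    return default


print(classify(125, SYSTOLIC_BANDS), classify(245, TOTAL_CHOLESTEROL_BANDS))
```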

Existing Analysis

The most prominent example of analytics applied to CAD is the Framingham Risk Score, which came out of the Framingham Heart Study. The score is an algorithm that estimates a person’s 10-year risk of developing CAD and categorizes it as low, intermediate or high. It takes into account age, sex, presence of diabetes, smoking habits, systolic blood pressure, total cholesterol, HDL cholesterol, and either BMI or lipids. While the Framingham equation is intensive and detailed, a great number of additional factors could improve its accuracy. For example, the prevalence of CAD varies across racial groups: the age-adjusted prevalence of heart disease among whites, blacks, Hispanics and Asians was 11.0%, 9.7%, 7.4% and 6.1%, respectively (Virani). This variation is substantial, and using it as a predictor could improve the accuracy of the Framingham equation or any other formula. Adding other risk factors could also allow the Framingham Risk Score to cover people outside its targeted 30-79 age range and patients with diabetes, two groups the current scoring algorithm leaves out. The complexity of analyzing the symptoms and comorbid conditions related to CAD contributes to one in five patients being misdiagnosed, which further confirms the need to improve existing analytics tools (Foote).
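To show how a score of this kind is computed, and where an extra predictor would enter, the sketch below uses the Cox proportional-hazards form of the 2008 Framingham general cardiovascular risk model (D’Agostino and colleagues), with the coefficients for men from its lipid-based version. Two caveats apply: that model predicts overall cardiovascular events rather than CAD alone, and the constants are reproduced for illustration only and should be checked against the original publication before any real use. The women’s model uses different coefficients and is omitted. The `extra_term` argument is a hypothetical hook, not part of the published model.

```python
import math

# Framingham 2008 general CVD model, men, lipid version.
# Verify against the original publication before relying on these values.
MEN = {
    "ln_age": 3.06117,
    "ln_total_chol": 1.12370,
    "ln_hdl": -0.93263,
    "ln_sbp_untreated": 1.93303,
    "ln_sbp_treated": 1.99881,
    "smoker": 0.65451,
    "diabetes": 0.57367,
    "baseline_survival": 0.88936,      # 10-year survival at mean risk factor values
    "mean_linear_predictor": 23.9802,  # average of the weighted sum in the cohort
}


def framingham_10yr_risk_men(age, total_chol, hdl, sbp, bp_treated,
                             smoker, diabetes, extra_term=0.0, c=MEN):
    """10-year cardiovascular risk (0 to 1) for men; cholesterol in mg/dL, SBP in mmHg.

    `extra_term` is a hypothetical hook: the summed beta * x of any additional
    predictor a refitted model might include. It is 0 in the published model.
    """
    sbp_coef = c["ln_sbp_treated"] if bp_treated else c["ln_sbp_untreated"]
    linear = (c["ln_age"] * math.log(age)
              + c["ln_total_chol"] * math.log(total_chol)
              + c["ln_hdl"] * math.log(hdl)
              + sbp_coef * math.log(sbp)
              + c["smoker"] * int(smoker)
              + c["diabetes"] * int(diabetes)
              + extra_term)
    return 1 - c["baseline_survival"] ** math.exp(linear - c["mean_linear_predictor"])


nonsmoker = framingham_10yr_risk_men(50, 200, 45, 125, False, False, False)
smoker = framingham_10yr_risk_men(50, 200, 45, 125, False, True, False)
print(round(nonsmoker, 3), round(smoker, 3))
```

Adding a new predictor, such as the race-specific prevalence differences discussed above, would mean refitting the model so that it gets its own coefficient inside the linear sum, rather than simply bolting a term onto the published equation.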

 

Where Can Analytics Take Us Next? 

Artificial intelligence is a topic that many people have heard of but equally many have not fully grasped. In the context of prediction, it refers to computer models that learn patterns from data in order to forecast future events. For CAD, such programs analyze all the available data points and derive the algorithms with the strongest predictive power. For example, a company called Ultromics offers the EchoGo Core system, an artificial intelligence technology that uses ultrasound images to identify disease. In its trials, its diagnostic accuracy exceeded 90%, and it halved the number of misdiagnoses compared with traditional clinical analysis (Foote).

Another direction CAD prediction may take is genomics. It may not surprise many people that CAD often runs in families, which suggests that genomic profiles may contain data points indicating a person’s risk of the disease. A study published in the Journal of the American Heart Association investigated whether DNA could help predict heart disease and concluded that DNA methylation data could, in fact, help identify high-risk individuals who were not classified as “at risk” by metabolic-panel-based measures, such as those with lower Framingham Risk Scores (Westerman).

Based on the preceding analysis, it appears the increased use of AI combined with access to a very broad data set is our best path to creating much more robust and accurate predictive models which can lead to earlier and more targeted interventions and better outcomes.

Notes on Prevention and Management

In a study published on NCBI, individuals who made the following changes to their lifestyle and diet showed a decreased risk of developing CAD: avoiding smoking, increasing physical activity, avoiding being overweight, using healthy fats, eating fruits and vegetables, choosing whole grains, and reducing sugar and sodium (Razzak). All of these are also well-known components of a generally healthy lifestyle and help mitigate many chronic conditions and diseases other than CAD.

Works Cited

“About Cardiovascular Diseases.” World Health Organization, World Health Organization, 29 Sept. 2011, www.who.int/cardiovascular_diseases/about_cvd/en/.

Foote, Natasha. “Artificial Intelligence Technology Developed to Predict Heart Disease.” EURACTIV.com, 30 Apr. 2020, www.euractiv.com/section/health-consumers/news/artificial-intelligence-technology-developed-to-predict-heart-disease/.

Razzak, Muhammad Imran, et al. “Big Data Analytics for Preventive Medicine.” Neural Computing & Applications, NCBI, 16 Mar. 2019, www.ncbi.nlm.nih.gov/pmc/articles/PMC7088441/.

Saxena, Kanak. “Efficient Heart Disease Prediction System.” ScienceDirect, 2016, www.sciencedirect.com.

Virani, Salim. “Heart Disease and Stroke Statistics—2020 Update: A Report From the American Heart Association.” AHA Journals, 2020, www.ahajournals.org/doi/10.1161/CIR.0000000000000757.

Westerman, Kenneth. “Epigenomic Assessment of Cardiovascular Disease Risk and Interactions With Traditional Risk Metrics.” Journal of the American Heart Association, 2020.