Measuring the Effect of Mental Health on Type 2 Diabetes (2024)

1. Introduction

Diabetes is a global health priority: an estimated 450 million people are affected worldwide, and 8% of the population has been diagnosed with the disease [1]. Diabetes is a severe chronic disease in which individuals lose the ability to control their blood glucose levels effectively, reducing their quality of life and life expectancy. It is usually characterized by the body not producing enough insulin or being unable to use the insulin it produces effectively. There is no cure for diabetes yet, but strategies such as healthy eating, weight control, and getting treatment can alleviate the damage of the disease in many patients. Early diagnosis can lead to lifestyle changes and more effective treatment.

Diabetes is classified into two main categories, known as type 1 and type 2. Type 1 is known as insulin-dependent diabetes and has been found to make up about 5–10% of diabetes patients, whereas about 90% of patients are classed as having type 2 diabetes. Type 2 diabetes, the more prevalent subtype, is linked to defective insulin secretion arising from the absence of inhibitory feedback through plasma glucagon levels, chronic exposure to free fatty acids, lipotoxicity, and related mechanisms [2].

Due to its global prevalence and severity, it is a matter of interest to study what causes diabetes, what treatments are available, and how much these treatments affect diabetes. Many studies have been conducted to identify the causes of diabetes. On the one hand, many previous studies suggest that various genetic factors cause diabetes [3,4]. On the other hand, environmental factors are also regarded as important factors that cause diabetes [5]. This research focuses on causal inference analysis of diabetes using the latter factors.

This research aims to suggest a causal model that explains how environmental factors affect diabetes by elucidating which mediating factors affect the onset of diabetes, as well as how much a specific treatment factor affects it. To construct a causal model, we adopted a theoretical model that explains the relationships between environments and chronic diseases, proposed by Frank et al. [1]. Many previous studies suggest that stress affects diabetes, in terms of both its onset and its exacerbation [6]. Therefore, this research also aims to evaluate how the quality of mental health impacts the onset of diabetes within the proposed causal model.

A randomized experiment is a common way to validate a causal model. However, randomizing participants is often impractical, so an observational study design is used instead. For this study, we used a survey dataset collected by the Centers for Disease Control and Prevention (CDC) via the Behavioral Risk Factor Surveillance System (BRFSS). Unlike in a randomized experiment, when causal effects are inferred from observational data, it is challenging to control bias due to confounding variables. To address this problem, we employed a machine learning-based causal inference method, called DoWhy, proposed by Microsoft researchers [7]. DoWhy allows the creation of a causal relationship model with a causal graph and structural assumptions; identifies whether or not the causal effect is estimable; estimates the effect using a statistical estimator; and, finally, attempts to refute the obtained estimate through various robustness checks. Existing health studies, such as those on diabetes, were able to derive important health-related factors, but were unable to quantitatively evaluate the importance of those factors. This study is the first application of causal inference using DoWhy in the health domain and provides an objective, quantitative estimate of the treatment effect of stress.

2. Environmental Models for Diabetes

In order to analyze causal relationships, it is necessary to formulate a causal model. For this purpose, we adopted the theoretical framework proposed by Frank et al. [1]. They proposed a model that links the environment to health, as shown in Figure 1. They claim that environmental factors—such as transportation infrastructure, land use/walkability, pedestrian environment, and green space—affect health outcomes through healthy behaviors (e.g., dietary intake, physical activity, and social interaction) and exposure to harmful substances and stressors (e.g., air pollution, safety and crime, and noise). These two mediating factors affect biological responses such as BMI/obesity, systemic inflammation, and stress. Finally, these biological responses affect chronic diseases, including diabetes. As this model aims to approximate the relationship between environment and health, this model may have some limitations, but can be applied to diabetes. Research on diabetes shows that environmental factors, including walkability, air pollution, food and physical activity, and roadway proximity, are the most common environmental characteristics studied [8,9]. These factors can be explained by the environmental factors suggested by Frank et al. [1].

Previous studies on diabetes also identify various mediating factors that affect diabetes. For example, various studies have demonstrated a link between physical activity, BMI, obesity, and diabetes [10,11,12]. They discussed the importance of exercise as a critical component of diabetes management and explained precautions for people with type 2 diabetes. People need social interaction for physical and mental health. Psychological and physical stress are associated with triggers for the development of type 2 diabetes mellitus (T2D). Chronic stress and obesity form a vicious cycle of metabolic disorders and cause diabetes [13]. Therefore, this study was conducted in order to confirm the influence of mental health on type 2 diabetes, based on causal inference. Based on these studies, we adopted the causal model proposed by Frank et al. [1] in this research.

3. Causal Inference in Machine Learning

The demand for reliable AI systems in recent years has fueled the adoption of causal approaches in machine learning (ML) research. As emphasized by [14], causal inference is critical for overcoming current ML limitations. For example, the widespread use of black box models for socially sensitive decision-making requires an explanation of the logic involved [15]. Traditional ML algorithms are based on correlations between variables rather than on appropriate causal structures, where there is a risk of making erroneous, biased, or harmful decisions.

In the Explainable AI (XAI) branch, several teams have begun investigating black box decisions utilizing causal frameworks [16]. To extract causal information from black box models, Zhao and Hastie [17] draw on the link between an XAI tool, partial dependence plots, and the backdoor criterion. In addition to ex post descriptions, causal discovery algorithms can provide an intrinsically interpretable method. In other words, a limitation of machine learning methodology is that it cannot explain black boxes, and machine learning-based causal inference can help address this problem.

Many questions in the data science field are fundamentally causal, such as the effects of marketing campaigns, the impacts of new product features, the reasons for customer churn, and which drugs may work best. A causal question requires knowledge of the overall process that generated the data and cannot be answered by data alone. Many fields have tried to solve causality problems mathematically, but they struggle to understand and benefit from the results of causal analysis. As a result, various methodologies, such as graphical models and nonparametric structural equations, have been developed.

Moreover, the rapid development of the causal reasoning field over the years has resulted in several algorithms that improve classical methods for estimating the causal effects of treatments on outcomes. As the field of data science has grown, many practitioners and researchers have recognized the value of causal reasoning in providing insights into data. However, the greatest challenge for data scientists and machine learning engineers who are familiar with non-causal methods and unfamiliar with the use of causal methods is identifying modeling assumptions and causality [7].

Predictive models capture patterns in which observed data inputs lead to outcomes. However, the counterfactual, a fundamental concept of causal inference, is never observed: once an intervention has been applied, data for the same unit without the intervention are unavailable. Questions that require counterfactual estimation are nevertheless common in decision-making scenarios. Will the proposed system change improve people’s outcomes? What changes the outcome of the system? What changes in the system could improve people’s outcomes? How do systems interact with human behavior? How do the system’s recommendations affect people’s activities? To answer these questions, causal reasoning is required.

4. Methods

4.1. DoWhy and Causal Inference

Causal inference refers to the process of inferring the effects of interventions. For example, inferring the effects of drugs on diseases is a causal inference task. DoWhy is an open-source Python library for causal inference. It builds on two of the most powerful frameworks for causal inference, potential outcomes and graphical models, to model assumptions explicitly and identify the causal effect. Many people have made key contributions to improving the usability and functionality of the library, such as the integrated Pandas interface for DoWhy’s four steps.

The DoWhy library makes three contributions [7]. First, the DoWhy library provides a principled way to model a given problem as a causal graph, so all assumptions are explicit. Second, the DoWhy library combines graphical models and potential outcomes to provide a unified interface for many popular causal inference methods. Last, but not least, the DoWhy library automatically tests the validity of the assumptions where possible and evaluates the robustness of the estimate to violations of those assumptions.

DoWhy focuses on explicitly modeling and validating causal assumptions and provides a four-step causal inference method for causal reasoning. A key feature of DoWhy is a state-of-the-art refutation API that can automatically test causal assumptions for all estimation methods. Through integration with the EconML library, DoWhy supports average causal effect estimation and Conditional Average Treatment Effects (CATE) estimation for backdoors, frontdoors, instrumental variables, and other identification methods.

Causal inference follows four steps: modeling, identification, estimation, and refutation. First, the modeling step encodes prior knowledge as a formal causal graph for each problem. This serves to make each causal assumption explicit. Next, identification uses graph-based methods to identify the causal effect (estimand), and estimation uses statistical methods for estimating the identified estimand. Finally, refutation tries to refute the obtained estimate by testing the robustness of the initial model’s assumptions.

Causal inference based on the DoWhy library can analyze the influence of factors such as stress intervention on diabetes. First, DoWhy is used to develop causal assumptions and graphically represent structural parts based on diabetes health data. Subsequently, the correct estimand is formulated based on the causal model, and a suitable method is used to estimate the effect. Finally, the robustness of estimates to assumption violations is checked using three different methods. Figure 2 shows the research process of the study, and Appendix A shows the code used for causal inference.
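The idea behind one of these robustness checks, the placebo treatment test, can be illustrated with a minimal sketch. It is not DoWhy's implementation: the toy data are hypothetical, a plain difference in means stands in for the estimator, and the function names are chosen only for illustration. The treatment labels are shuffled so that any real link to the outcome is broken, and the effect is then re-estimated.

```python
import random

def difference_in_means(treatment, outcome):
    """Simple effect estimate: mean outcome of treated minus mean outcome of controls."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def placebo_refutation(treatment, outcome, n_simulations=100, seed=0):
    """Re-estimate the effect after shuffling the treatment labels.

    Shuffling breaks any real link between treatment and outcome, so the
    placebo estimates should scatter around zero if the estimator is sound.
    Returns the mean of the placebo estimates.
    """
    rng = random.Random(seed)
    placebo_estimates = []
    for _ in range(n_simulations):
        shuffled = treatment[:]
        rng.shuffle(shuffled)
        placebo_estimates.append(difference_in_means(shuffled, outcome))
    return sum(placebo_estimates) / n_simulations

# Hypothetical toy data (not the BRFSS data): 1000 treated units, 300 with the
# outcome, and 1000 control units, 150 with the outcome.
treatment = [1] * 1000 + [0] * 1000
outcome = [1] * 300 + [0] * 700 + [1] * 150 + [0] * 850

original_effect = difference_in_means(treatment, outcome)
placebo_effect = placebo_refutation(treatment, outcome)
print(f"original estimate: {original_effect:.3f}")
print(f"mean placebo estimate: {placebo_effect:.3f}")
```

If the placebo estimate stays close to zero while the original estimate does not, the check gives no reason to reject the original estimate. It does not, by itself, prove that the estimate is correct.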

4.2. Data

This study used a dataset created from the Behavioral Risk Factor Surveillance System (BRFSS) 2015 dataset (refer to the Data Availability Statement for the detail of the dataset). This dataset contains 253,680 survey responses to the Centers for Disease Control and Prevention (CDC). The target variable of this dataset has two classes: 0 is for no type 2 diabetes, and 1 is for type 2 diabetes. This study used eleven feature variables and one treatment variable. Mental health is a treatment variable, and this variable indicates the degree to which mental health is poor. The characteristics of the variables are shown in Table 1.

5. Results

5.1. Study Population

The population of this study is summarized in Table 2. The target variable is diabetes, and it is divided into diabetes and no diabetes. A dummy variable is created for mental health that takes only values of 0 or 1, which indicate low or high values based on the average value (3.185). The dummy variable is used in the causal inference model as the treatment variable. This study uses input variables like physical activity, vegetable, heavy drinker, fruits, age, cholesterol, sex, smoker, health, BMI, and blood pressure.
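The dichotomization described above can be sketched as follows. This is a minimal illustration on hypothetical toy values, not the study's preprocessing code. In particular, coding values strictly above the mean as high (1) is an assumption, since the text does not say how values equal to the mean are handled.

```python
from statistics import mean

def dichotomize_by_mean(values):
    """Return (threshold, flags): flags[i] is 1 if values[i] is above the mean, else 0."""
    threshold = mean(values)
    # Assumption: a value exactly equal to the mean is coded as low (0).
    flags = [1 if v > threshold else 0 for v in values]
    return threshold, flags

# Hypothetical responses: days of poor mental health during the past 30 days.
poor_days = [0, 0, 2, 30, 5, 0, 1, 10]
threshold, high_mental_health = dichotomize_by_mean(poor_days)
print(threshold)
print(high_mental_health)
```

With the study's data, the threshold would be the reported average of 3.185, and the resulting 0/1 column would serve as the treatment variable.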

5.2. Causal Inference Results

This study aims to investigate the relationships between disease factors of diabetes and patients’ levels of mental health. The regression model used in existing research examines the relationships between independent variables and the number of longitudinal findings in a two-dimensional manner. However, actual causal relationships have a structural nature. DoWhy supports structured causal relationships in order to address this limitation of regression models. Therefore, this study was conducted based on the four steps of causal inference provided by DoWhy. First, data related to diabetes were collected from 253,680 responses. Second, a causal graph was drawn using the environmental factors, biological response factors, and disease factors. Causal graphs are easy to understand as they visually represent causal relationships. Third, we identified the causal effect in the model using the backdoor criterion. Fourth, we estimated the identified causal effect, that is, how much the treatment affects the outcome. We used the propensity_score_stratification method and obtained an estimate of about 0.15 (about 15%). Finally, various refutation tests were applied to check whether the obtained estimate could be shown to be incorrect. Table 3 is a summary of the causal inference procedure and results.

5.3. Four Steps of Causal Inference

Step 1: Create a causal graph. A causal model of the disease was designed based on the research of Frank et al. [1]. This study developed a causal inference model that can be applied in order to understand the most suitable treatment method. This model used the disease dataset and built a causal model with the treatment variable of mental health and the dependent variable of diabetes. The DoWhy framework is installed first, and required libraries are imported. A causal model of DoWhy is formed using the created datasets by explicitly stating the assumptions. A basic causal graphical model for each problem is built to visually verify the model, as shown in Figure 3.
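Because a causal graph must be a directed acyclic graph (DAG), it can be useful to check the edge list before passing it to a causal inference library. The following minimal sketch uses only the Python standard library (graphlib, Python 3.9 or later) and a subset of the edges listed in Appendix A. The helper names are hypothetical and are not part of DoWhy.

```python
from graphlib import CycleError, TopologicalSorter

# A subset of the edges from Appendix A, written as (cause, effect) pairs.
edges = [
    ("Physical activity", "Mental health"),
    ("Physical activity", "BMI"),
    ("Fruits", "Mental health"),
    ("Fruits", "Diabetes"),
    ("Vegetable", "Mental health"),
    ("Vegetable", "Diabetes"),
    ("BMI", "Blood pressure"),
    ("Blood pressure", "Mental health"),
    ("Mental health", "Diabetes"),
]

def parents(edge_list):
    """Map each node to the set of its direct causes."""
    result = {}
    for cause, effect in edge_list:
        result.setdefault(cause, set())
        result.setdefault(effect, set()).add(cause)
    return result

def is_acyclic(edge_list):
    """Return True if the edges form a directed acyclic graph."""
    try:
        # TopologicalSorter expects a node -> predecessors mapping.
        list(TopologicalSorter(parents(edge_list)).static_order())
        return True
    except CycleError:
        return False

graph_parents = parents(edges)
# Direct common causes of the treatment and the outcome in this subset.
common_causes = graph_parents["Mental health"] & graph_parents["Diabetes"]
print(is_acyclic(edges))
print(sorted(common_causes))
```

Direct common causes are only a first look at confounding. A valid backdoor adjustment set is determined from the whole graph, which is what the identification step does.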

Step 2: Identify the causal effect. Treatment can be said to cause the outcome if the treatment and the outcome both change while all other factors remain constant. Therefore, in this step, the properties of the causal relationship graph are used to identify the estimand to be estimated.

Estimand type: nonparametric-ate

### Estimand: 1
Estimand name: backdoor
Estimand expression:
d
-------- (Expectation(Diabetes | Physical activity, Fruits, BMI, Blood pressure, Vegetable))
d[Mental health]
Estimand assumption 1, Unconfoundedness: If U→{Mental health} and U→Diabetes then P(Diabetes|Mental health, Physical activity, Fruits, BMI, Blood pressure, Vegetable, U) = P(Diabetes|Mental health, Physical activity, Fruits, BMI, Blood pressure, Vegetable)

### Estimand: 2
Estimand name: iv
No such variable(s) found!

### Estimand: 3
Estimand name: frontdoor
No such variable(s) found!

Step 3: Estimate the identified estimand. DoWhy supports three identification strategies: backdoor, frontdoor, and instrumental variable. In this study, only the backdoor estimand was identified in Step 2, so the backdoor estimand is used for estimation. When there are measurable common causes that affect both the cause variable, X, and the result variable, Y, the backdoor approach conditions on those common causes in order to identify the causal relationship between X and Y. This study infers causal relationships from data already collected, and since the treatment and control groups are not guaranteed to be comparable beforehand, problems with selection bias may occur. To minimize these problems, we use the propensity score method, which is one of the statistical solutions. The propensity score is the probability that a subject is included in the treatment group rather than the control group, given specific covariates.
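For reference, the standard backdoor adjustment formula and the quantities discussed above can be written as follows. X and Y are the cause and result variables from the text. The symbol Z for the adjustment set (the measured common causes) and e(z) for the propensity score are notation introduced here for illustration, and the binary coding of X follows the dummy treatment variable.

```latex
\begin{align}
P\bigl(Y \mid \mathrm{do}(X = x)\bigr) &= \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z), \\
\mathrm{ATE} &= E\bigl[Y \mid \mathrm{do}(X = 1)\bigr] - E\bigl[Y \mid \mathrm{do}(X = 0)\bigr], \\
e(z) &= P(X = 1 \mid Z = z).
\end{align}
```

The first line holds when Z satisfies the backdoor criterion relative to X and Y, which is the unconfoundedness assumption stated in the estimand output of Step 2.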

There are three propensity score methods: propensity score stratification, propensity score matching, and propensity score weighting. Propensity score stratification is a method of stratifying treatment and control group individuals with similar propensity scores into K groups. Propensity score matching is a method of pairing individuals in the treatment group and the comparison group that have identical or similar propensity scores. Propensity score weighting is a method of assigning weights so that the distribution of propensity scores is the same in the treatment group and the control group. Table 4 shows the ATC (average treatment effect on the control) based on the propensity score method.
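Two of these methods, stratification and weighting, can be sketched on a small hypothetical dataset. This is an illustration under stated assumptions, not the estimator used by DoWhy: there is a single binary confounder z, the propensity score is estimated as the treated share within each value of z (standing in for a fitted model such as logistic regression), and units with equal scores form one stratum. All counts are invented.

```python
from collections import defaultdict

def make_toy_data():
    """Build hypothetical (z, t, y) records: confounder, treatment, outcome."""
    # (z, t, y, count) cells; the counts are invented for illustration only.
    cells = [
        (0, 1, 1, 4), (0, 1, 0, 16),   # z=0, treated: 4 of 20 have the outcome
        (0, 0, 1, 8), (0, 0, 0, 72),   # z=0, control: 8 of 80
        (1, 1, 1, 30), (1, 1, 0, 30),  # z=1, treated: 30 of 60
        (1, 0, 1, 12), (1, 0, 0, 28),  # z=1, control: 12 of 40
    ]
    return [(z, t, y) for z, t, y, n in cells for _ in range(n)]

def propensity_scores(data):
    """Estimate e(z) = P(T=1 | Z=z) as the treated share within each value of z."""
    treated, total = defaultdict(int), defaultdict(int)
    for z, t, _ in data:
        treated[z] += t
        total[z] += 1
    return {z: treated[z] / total[z] for z in total}

def stratified_effect(data, scores, target="ate"):
    """Average the within-stratum risk differences, weighting strata by all
    units (ATE) or by control units only (ATC)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in data:
        strata[scores[z]][t].append(y)  # units with equal scores share a stratum
    num = den = 0.0
    for groups in strata.values():
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        weight = len(groups[0]) if target == "atc" else len(groups[0]) + len(groups[1])
        num += weight * diff
        den += weight
    return num / den

def ipw_ate(data, scores):
    """Inverse-probability-weighted estimate of the ATE."""
    n = len(data)
    treated_term = sum(t * y / scores[z] for z, t, y in data) / n
    control_term = sum((1 - t) * y / (1 - scores[z]) for z, t, y in data) / n
    return treated_term - control_term

data = make_toy_data()
scores = propensity_scores(data)
n_treated = sum(t for _, t, _ in data)
naive = (sum(y for _, t, y in data if t == 1) / n_treated
         - sum(y for _, t, y in data if t == 0) / (len(data) - n_treated))
print("naive difference:", round(naive, 3))
print("stratified ATE:", round(stratified_effect(data, scores), 3))
print("stratified ATC:", round(stratified_effect(data, scores, "atc"), 3))
print("weighted ATE:", round(ipw_ate(data, scores), 3))
```

On this toy data the naive difference in means (about 0.258) overstates the effect because treated units are concentrated in the higher-risk stratum, whereas stratification and weighting both give an ATE of 0.15. Matching is omitted from the sketch for brevity.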

We examined the propensity score stratification method in detail. DoWhy supports estimation methods based on the backdoor criterion and on instrumental variables. The causal effect of the treatment on the outcome is defined in terms of the change in the outcome when the value of the treatment variable changes, and the average treatment effect (ATE) is estimated using propensity score stratification (backdoor.propensity_score_stratification).

propensity_score_stratification
*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-ate

### Estimand: 1
Estimand name: backdoor
Estimand expression:
d
──── (Expectation (Diabetes | Physical activity, Fruits, BMI, Blood pressure, Vegetable))
d[Mental health]
Estimand assumption 1, Unconfoundedness: If U→{Mental health} and U→Diabetes then P(Diabetes|Mental health, Physical activity, Fruits, BMI, Blood pressure, Vegetable,U) = P(Diabetes|Mental health, Physical activity, Fruits, BMI, Blood pressure, Vegetable)

## Realized estimand
b: Diabetes ~ Mental health + Physical activity + Fruits + BMI + Blood pressure + Vegetables
Target units: ate

## Estimate
Mean value: 0.1455803874296158

6. Conclusions and Implications

6.1. Conclusions

This study used the DoWhy library to estimate the causal relationship between mental health and diabetes. DoWhy organizes the mechanism for causal inference into four steps. The first step (modeling) encodes prior knowledge about the data into a causal graph, and the second step (identification) identifies the causal relationship of the model and transforms the causal quantity into an estimable quantity based on the available data. Step 3 (estimation) estimates the identified causal effect using the estimable quantity, and Step 4 (refutation) attempts to refute the obtained estimate. When estimating a causal effect, essential assumptions are made, such as the direction of the effect, the presence of an instrumental variable or mediator, and whether all relevant confounding factors are observed. Violations of these assumptions result in significant errors in the causal effect estimate. For predictive models using machine learning, cross-validation methods exist, but global validation methods for causal inference do not currently exist.

Therefore, expressing and validating causal assumptions as formally as possible is essential. For this purpose, DoWhy provides the ability to explicitly declare a causal model in Step 1. It also provides several validation methods to verify subsets of assumptions. Validation tests to better detect errors, such as mean causality and conditional causation, have been developed and are addressed in Step 4.

This study estimated whether mental health issues, such as stress, could affect the onset of diabetes. Environmental and biological factors were considered external factors, the mental health (stress) factor was defined as the treatment variable, and diabetes onset was defined as the outcome variable for analysis. According to the analysis results, it was found that at high levels of stress, the risk of diabetes is increased by about 15%. Existing studies are researching the onset of diabetes and complications caused by chronic diseases. However, these do not specifically suggest the possibility of diabetes mellitus being affected by stress [18]. This study suggests the degree to which mental health issues could induce diabetes using causal inference methodology.

Causal inference rests on two kinds of assumptions: the structural assumption of ignorability and parametric assumptions. Our study used propensity score and matching techniques, which allow causal effects to be estimated without relying on the parametric assumptions. This is possible when a flexible modeling approach, rather than linear modeling of the response surface, is used. The robustness of the estimate was supported by three validation tests (the random common cause, placebo treatment, and data subset refuters), none of which refuted the estimate, so the results of this analysis can be regarded as reliable.

We estimated the causal effect of mental health on diabetes using DoWhy, an extensive library for causal inference. Unlike most other libraries, DoWhy focuses on helping analysts devise correct causal models and test the assumptions made in addition to estimating causal effects. This study designed a causal model for diabetes based on the framework of Frank et al. due to its unique advantages [1]. This model set mental health as the treatment variable and diabetes as the outcome variable. The identified estimand was evaluated based on the backdoor criteria supported by DoWhy, and the causal effect was evaluated based on the causal model. Random common cause, placebo treatment, and data subset refuter were the methodologies used to verify the causal relationship. Through three refutation tests, the reliability of the DoWhy causal inference model for diabetes inference using the treatment variable called mental health was demonstrated.

6.2. Implications

The academic implications of this study are as follows. First, this study shows the relationships between environmental and biological factors and diabetes factors. Environmental factors should be considered when conducting disease research, such as diabetes research, in the future. Second, the mental health factor was used as a treatment variable to quantitatively examine its impact on diabetes factors. Mental health factors show potential as data for quantitative research on disease induction in the future. Third, a methodology that can overcome the limitations of machine learning by using the DoWhy library was presented. Machine learning-based causal inference models can be used when conducting similar research in the future.

The practical implications of this study are as follows. First, it was revealed that mental health factors are a significant cause of diabetes and can be linked to the constant increase in its diagnosis. In other words, it can be seen that the management of mental health in diabetic patients is important. Second, this study reaffirms that diabetes is caused by factors such as physical activity, smoking, and sex. Therefore, it can be seen that diabetic patients should manage these factors. Finally, it can be seen that patients with modern diseases should be aware of the importance of environmental and biological factors.

The limitations of this study and future research directions are as follows. First, this study considered only stress as a mental factor. In future research, causal inference studies on diseases need to consider various physical factors (e.g., sex, age) as well as various mental factors. Second, this study was conducted using DoWhy, a machine learning-based causal inference model. It is pertinent to reevaluate and optimize the EconML model for DoWhy in the future.

Author Contributions

Conceptualization, M.N. and Y.K.; methodology, M.N. and Y.K.; analysis, M.N. and Y.K.; investigation, M.N. and Y.K.; resources, M.N. and Y.K.; data curation, M.N. and Y.K.; writing—original draft preparation, M.N. and Y.K.; writing—review and editing, M.N. and Y.K.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Bisa Research Grant of Keimyung University in 2023 (project no: 20230285).

Data Availability Statement

The dataset can be downloaded at: www.kaggle.com/alexteboul/diabetes-health-indicators-dataset (accessed on 20 May 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Analysis Details

Step 1: Create a causal graph.

import dowhy
import statsmodels
import pygraphviz

# `dataset` is assumed to be a pandas DataFrame whose column names match the node names below.
# Node names that contain spaces are quoted, as required by the DOT language.
causal_graph = """digraph {
"Heavy drinkers";
Vegetable;
"Physical activity";
Fruits;
Sex;
Smoker;
Age;
Cholesterol;
Health;
BMI;
"Blood pressure";
"Mental health";
Diabetes;
Smoker->"Mental health"; Smoker->Health; Smoker->"Blood pressure";
Smoker->Diabetes;
"Physical activity"->"Heavy drinkers"; "Physical activity"->Cholesterol;
"Physical activity"->"Mental health"; "Physical activity"->BMI; "Physical activity"->Smoker;
"Heavy drinkers"->"Mental health"; "Heavy drinkers"->Diabetes;
"Heavy drinkers"->"Blood pressure"; "Heavy drinkers"->Health;
"Heavy drinkers"->BMI; "Heavy drinkers"->Cholesterol;
Vegetable->Diabetes; Vegetable->"Mental health"; Vegetable->"Blood pressure";
Vegetable->Health; Vegetable->Cholesterol; Vegetable->BMI;
Fruits->Cholesterol; Fruits->Health; Fruits->BMI; Fruits->"Blood pressure";
Fruits->"Mental health"; Fruits->Diabetes;
Sex->Health; Sex->Diabetes; Sex->BMI; Sex->"Mental health"; Sex->"Blood pressure";
Age->Health; Age->BMI; Age->"Blood pressure"; Age->"Mental health"; Age->Diabetes;
Cholesterol->Health; Cholesterol->BMI;
Health->"Blood pressure";
BMI->"Blood pressure";
"Blood pressure"->"Mental health";
"Mental health"->Diabetes
}"""

# Build the causal model; the variable is named `model` because the later steps refer to it.
model = dowhy.CausalModel(
    data=dataset,
    graph=causal_graph.replace("\n", " "),
    treatment="Mental health",
    outcome="Diabetes")

Step 2: Identify the causal effect.

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

Step 3: Estimate the identified estimand.

# ATC = Average Treatment Effect on the Control
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_stratification", target_units="atc")
print(estimate)

# ATC, estimated with propensity score matching
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_matching", target_units="atc")
print(estimate)

# ATC, estimated with propensity score weighting
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_weighting", target_units="atc")
print(estimate)

# ATE = Average Treatment Effect
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_stratification", target_units="ate")
print(estimate)

Step 4: Refute results.
Random Common Cause

refute1_results = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
print(refute1_results)

Placebo Treatment Refuter

refute2_results = model.refute_estimate(identified_estimand, estimate, method_name="placebo_treatment_refuter")
print(refute2_results)

Data Subset Refuter

refute3_results = model.refute_estimate(identified_estimand, estimate, method_name="data_subset_refuter")
print(refute3_results)

References

1. Frank, L.D.; Iroz-Elardo, N.; MacLeod, K.E.; Hong, A. Pathways from built environment to health: A conceptual framework linking behavior and exposure-based impacts. J. Transp. Health 2019, 12, 319–335.
2. Liu, Q.; Li, J.; Cheng, R.; Chen, Y.; Lee, K.; Hu, Y.; Yi, J.; Liu, Z.; Ma, J.-X. Nitrosative stress plays an important role in Wnt pathway activation in diabetic retinopathy. Antioxid. Redox Signal. 2013, 18, 1141–1153.
3. Lin, Y.; Sun, Z. Current views on type 2 diabetes. J. Endocrinol. 2010, 204, 1–11.
4. McAuley, P.A.; Beavers, K.M. Contribution of cardiorespiratory fitness to the obesity paradox. Prog. Cardiovasc. Dis. 2014, 56, 434–440.
5. Tremblay, J.; Hamet, P. Environmental and genetic contributions to diabetes. Metabolism 2019, 100.
6. Falco, G.; Pirro, P.S.; Castellano, E.; Anfossi, M.; Borretta, G.; Gianotti, L. The relationship between stress and diabetes mellitus. J. Neurol. Psychol. 2015, 3, 7.
7. Sharma, A.; Kiciman, E. DoWhy: An end-to-end library for causal inference. arXiv 2020, arXiv:2011.04216.
8. Dendup, T.; Feng, X.; Clingan, S.; Astell-Burt, T. Environmental risk factors for developing type 2 diabetes mellitus: A systematic review. Int. J. Environ. Res. Public Health 2018, 15, 78.
9. Raman, P.G. Environmental factors in causation of diabetes mellitus. In Environmental Health Risk-Hazardous Factors to Living Species; Larramendy, M.L., Soloneski, S., Eds.; IntechOpen: London, UK, 2016.
10. Cavallo, F.R.; Golden, C.; Pearson-Stuttard, J.; Falconer, C.; Toumazou, C. The association between sedentary behaviour, physical activity and type 2 diabetes markers: A systematic review of mixed analytic approaches. PLoS ONE 2022, 17, e0268289.
11. American Diabetes Association. Physical activity/exercise and diabetes. Diabetes Care 2004, 27, 58–62.
12. Gallardo-Gomez, D.; Salazar-Martinez, E.; Alfonso-Rosa, R.M.; Raos-Munell, J.; del Pozo-Cruz, J.; del Pozo Cruz, B.; Alvarez-Barbosa, F. Optimal dose and type of physical activity to improve glycemic control in people diagnosed with type 2 diabetes: A systematic review and meta-analysis. Diabetes Care 2024, 47, 295–303.
13. Ingrosso, D.M.F.; Primavera, M.; Samvelyan, S.; Tagi, V.M.; Chiarelli, F. Stress and diabetes mellitus: Pathogenetic mechanisms and clinical outcome. Horm. Res. Paediatr. 2023, 96, 34–43.
14. Pearl, J.; Bareinboim, E. External validity: From do-calculus to transportability across populations. In Probabilistic and Causal Inference: The Works of Judea Pearl; Geffner, H., Dechter, R., Halpern, J.Y., Eds.; Association for Computing Machinery: New York, NY, USA, 2022; pp. 451–482.
15. Bodria, F.; Giannotti, F.; Guidotti, R.; Naretto, F.; Pedreschi, D.; Rinzivillo, S. A survey of methods for explaining black-box models. ACM Comput. Surv. 2021, 51, 1–33.
16. Moraffah, R.; Karami, M.; Guo, R.; Raglin, A.; Liu, H. Causal interpretability for machine learning-problems, methods and evaluation. ACM SIGKDD Explor. Newsl. 2020, 22, 18–33.
17. Zhao, Q.; Hastie, T. Causal interpretations of black-box models. J. Bus. Econ. Stat. 2021, 39, 272–281.
18. Bhatti, J.S.; Sehrawat, A.; Mishra, J.; Sidhu, I.S.; Navik, U.; Khullar, N.; Kumar, S.; Bhatti, G.K.; Reddy, P.H. Oxidative stress in the pathophysiology of type 2 diabetes and related complications: Current therapeutics strategies and future perspectives. Free. Radic. Biol. Med. 2022, 184, 114–134.

Figure 1. This socio-environmental model explains the relationship.

Figure 2. Research process aims to measure causal effects of treatments by following four steps of DoWhy: modeling, identification, estimation, and refutation.

Figure 3. The causal model of diabetes used for causal inference is depicted using a causal graph.

Table 1. Characteristics of variables.

Variable type	Variable	Items
Target variable	Diabetes	0 = no diabetes, 1 = diabetes
Feature variables	Physical activity	Physical activity in the past 30 days (not including a job) 0 = no, 1 = yes
	Vegetable	Consumption of vegetables more than once per day 0 = no, 1 = yes
	Heavy drinker	Heavy drinkers (adult men having more than 14 drinks per week and adult women having more than 7 drinks per week) 0 = no, 1 = yes
	Fruits	Consumption of fruit more than once per day 0 = no, 1 = yes
	Age	1 = 18–24, 2 = 25–29, 3 = 30–34, 4 = 35–39, 5 = 40–44, 6 = 45–49, 7 = 50–54, 8 = 55–59, 9 = 60–64, 10 = 65–69, 11 = 70–74, 12 = 75–79, 13 = 80 or older
	Cholesterol	0 = no high cholesterol, 1 = high cholesterol
	Sex	0 = female, 1 = male
	Smoker	Consumption of at least 100 cigarettes (5 packs = 100 cigarettes) in life 0 = no, 1 = yes
	Health	General health 1 = excellent, 2 = very good, 3 = good, 4 = fair, 5 = poor
	BMI	Body Mass Index
	Blood pressure	0 = no high blood pressure, 1 = high blood pressure
Treatment variable	Mental health	Days during the past 30 days where one’s mental health was not good (including stress, depression, and problems with emotions) Scale 1–30 days

Table 2. The population of this study.

Variable type	Variable	Category	Frequency	Percentage
Target variable	Diabetes	No diabetes	213,703	84.2
		Diabetes	39,977	15.7
Feature variables	Physical activity	No	61,760	24.3
		Yes	191,920	75.7
	Vegetable	No	47,839	18.9
		Yes	205,841	81.1
	Heavy drinker	No	239,492	94.4
		Yes	14,256	5.6
	Fruits	No	92,782	36.6
		Yes	160,898	63.4
	Age	18–24	5700	2.2
		25–29	7598	3.0
		30–34	11,123	4.4
		35–39	13,823	5.4
		40–44	16,157	6.4
		45–49	19,819	7.8
		50–54	26,314	10.4
		55–59	30,832	12.2
		60–64	33,244	13.1
		65–69	32,194	12.7
		70–74	23,533	9.3
		75–79	15,980	6.3
		80 or older	17,363	6.8
	Cholesterol	No high cholesterol	146,089	57.6
		High cholesterol	107,591	42.4
	Sex	Female	141,974	56.0
		Male	111,706	44.0
	Smoker	No	141,257	55.7
		Yes	112,423	44.3
	Health	Excellent	45,299	17.9
		Very good	89,084	35.1
		Good	75,646	29.8
		Fair	31,570	12.4
		Poor	12,081	4.8
	BMI	Underweight	3127	1.2
		Normal	49,403	19.5
		Overweight	36,696	14.5
		Obese	164,454	64.8
	Blood pressure	No high blood pressure	144,851	57.1
		High blood pressure	108,829	42.9
Treatment variable	Mental health	Low	204,653	80.7
		High	49,027	19.3

Table 3. Summary of causal inference and results.

Research Process	Description
Input data	253,680 survey responses to the Centers for Disease Control and Prevention
Components based on causal graph	Environmental factors: heavy drinker, vegetable, physical activity, fruits, sex, smoker, age; Biological response factors: cholesterol, health, BMI, blood pressure; Disease factors: mental health (mental), diabetes (physical)
Identify the causal effect	Estimand type: backdoor criterion
Estimate the identified estimand	Method: propensity_score_stratification; Target units: ATE; Estimate: about 15%
Refute	Add random common cause: p-value 0.5; Placebo treatment: p-value 0.45; Subset of data: p-value 0.48

Table 4. Propensity score method.

Method	ATC
Propensity score stratification	0.150
Propensity score matching	0.120
Propensity score weighting	0.146


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).