Effects of environment and globalization on the double and triple burdens of infection sym

November 19, 2025

  • Research Article
  • Open access
  • Published: 20 November 2025

Abstract

Childhood infectious diseases and related symptoms, such as fever, cough, and diarrhea, are the leading cause of death among children in low- and middle-income countries (LMICs). We examined the environmental predictors of the double and triple burden (D/TB) of infection symptoms among under-five children using multilevel and machine learning (ML) methods.

We used Demographic and Health Surveys (DHS) data from 58 LMICs collected between 2000 and 2023. These data were merged with cluster-level particulate matter and nitrogen dioxide estimates from the National Aeronautics and Space Administration and country-level data on political, social, and economic globalization from the World Bank report. We applied multilevel models to screen for the most important predictors of D/TB symptoms and applied machine learning algorithms to predict these symptoms among children across LMICs. We trained and validated the ML algorithms on 80%, 70%, and 60% of the data and tested them on the remaining 20%, 30%, and 40%, respectively, using 2-, 5-, and 10-fold cross-validation.

Of 1,546,243 children, 19.2%, 20.5%, and 12.6% had fever, cough, and diarrhea, respectively, while the overall prevalence of double and triple burden was 11.9% and 3.7%, respectively. D/TB were associated with the location of a child, survey year, wealth index, family size, air pollutants, and environmental covariates. The estimated prevalence of both D/TB symptoms varied substantially across districts (intraclass correlation, ICC = 13.3%) and countries (ICC = 8.8%). Random Forest gave the highest Area Under the Curve, 94% and 99% for double and triple burden, respectively, under the K10 protocol with an 80:20 training and testing split.

The study found substantial variation in the prevalences of D/TB of illness among children under five and identified several environmental and sociodemographic predictors of these health outcomes. The Random Forest algorithm performed best in predicting these burdens. The study emphasized how integrating environmental and sociodemographic data with machine learning can enhance targeted interventions to reduce childhood infectious disease burdens in low- and middle-income countries.

Background

Acute respiratory infections (ARIs) are among the most common illnesses in children, contributing to over 6% of the global disease burden, and they constitute the leading cause of death in children under five years of age [1,2,3,4]. According to the World Health Organization (WHO), in 2019 the under-five death rate due to ARIs was 73 and 9 per 1000 live births in the African and European regions, respectively; the rate in the African region was thus nearly eight times that in the European region [3, 5]. Almost one-third of under-five children in low- and middle-income countries (LMICs) had pneumonia (16%), diarrhea (11%), or malaria (7%) [6]. Diarrhea [1] and fever [7] are the most common illness symptoms in under-five children and are indicators of disorders, malnutrition, and death in children [6]. In LMICs, the 2-week prevalence of fever, diarrhea, and acute respiratory infection was estimated at 18.8%, 12.5%, and 4.3%, respectively [2]. Previous literature has provided evidence that symptoms of ARIs in under-five children are directly related to environmental, socioeconomic, and cultural factors in the population of interest [4, 8,9,10,11,12]. Air pollution disproportionately affects under-five children residing in LMICs: nearly 89% of deaths due to air pollution occurred in these countries, mainly in Africa and Asia [13]. An estimated 92% of the world’s population lives in areas where the air quality index (AQI) limit is exceeded (> 100; an AQI of 100 or below is usually considered safe) [14], and air pollution causes about 4.2 million excess deaths every year from several diseases.

Previous studies attempted to identify the determinants of fever, cough, and ARIs [4, 8,9,10,11,12,13, 15], and diarrhea [1, 6] among under-five children. However, most of these studies examined the determinants of each symptom separately and were restricted to Africa or Asia and to a single cross-sectional survey. In addition, the authors did not address the possible co-occurrence of these symptoms and considered only classical models to identify their determinants. Moreover, to date no studies have applied machine learning (ML) algorithms to investigate the effects of air pollutants [such as particulate matter (PM2.5) and nitrogen dioxide (NO2)], climate factors (temperature, land surface temperature, wet days), health-related information, and socio-demographic factors on the double and triple burden of infections among under-five children across LMICs over time. Furthermore, a generic prediction framework for the reliable assessment of D/TB symptoms among children under five years, using a large-scale dataset and ML algorithms, is lacking. To the best of our knowledge, this study employs different ML techniques to select and identify the predictors of double and triple symptoms of infections in LMICs. The predicted values of these symptoms from the best-performing machine learning algorithm (MLA), based on the selected features, can provide essential information to help public health decision-makers allocate limited resources and implement programs in the areas that need more attention across LMICs. The aim of the study is twofold: (1) to select the important risk factors of both double and triple burdens of symptoms among under-five children across LMICs over time; and (2) to compare and assess the effectiveness of various MLAs in predicting both double and triple burdens of symptoms among under-five children across LMICs using a large, diverse dataset.
Moreover, this paper leverages large Demographic and Health Surveys (DHS), National Aeronautics and Space Administration (NASA), and globalization index datasets to determine the predictors of DBs and TBs among under-five children across 58 LMICs between 2000 and 2023.

Methods

The data used in this study were obtained from several sources. The first source was the DHS (https://dhsprogram.com), collected from LMICs (Fig. S1 and Fig. S2). The second source was the National Aeronautics and Space Administration (NASA) [16], from which air pollutants, namely the annual surface levels of PM2.5 (Fig. S3) and NO2, were extracted. The global annual mean concentrations of ambient PM2.5 and NO2 [16] were the main exposure variables, and these variables were matched temporally with the calendar year in which the DHS surveys were performed in the given countries. The third source was the Konjunkturforschungsstelle (KOF) Globalization Index, which measures globalization from 43 indicators (variables) at the country level over time (Fig. S4). The data sources, namely the Demographic and Health Surveys (DHS), the NASA satellite-derived air pollution variables (PM2.5 and NO2), and the KOF globalization indices, along with the methods used to integrate them, are summarized in the conceptual framework (Fig. S5, Table S1). A detailed description of the datasets, of how they were matched temporally with the calendar year of the DHS surveys, and of how they were managed is given in the appendix (pp 2–3). Moreover, the variables included in the study, along with their descriptions, are summarized in Table S1.
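The temporal matching of the three sources can be pictured as a keyed join. The following is a minimal sketch only, not the authors' pipeline (which, per the data availability statement, was written in SAS, STATA and R). All field names (`child_id`, `cluster_id`, `pm25`, `no2`, `kof_index`) and all values are hypothetical and chosen purely for illustration.

```python
# Hypothetical child-level records (field names and values are assumptions).
children = [
    {"child_id": 1, "country": "A", "cluster_id": 10, "year": 2015},
    {"child_id": 2, "country": "A", "cluster_id": 11, "year": 2015},
    {"child_id": 3, "country": "B", "cluster_id": 20, "year": 2018},
]

# Hypothetical cluster-level annual pollutant estimates,
# keyed by (country, cluster, survey year).
pollution = {
    ("A", 10, 2015): {"pm25": 32.1, "no2": 4.0},
    ("A", 11, 2015): {"pm25": 28.7, "no2": 6.5},
    ("B", 20, 2018): {"pm25": 45.3, "no2": 11.2},
}

# Hypothetical country-level globalization index, keyed by (country, year).
globalization = {("A", 2015): 51.0, ("B", 2018): 47.5}


def merge_sources(children, pollution, globalization):
    """Attach cluster-year pollutants and country-year globalization to each child."""
    merged = []
    for child in children:
        cluster_key = (child["country"], child["cluster_id"], child["year"])
        country_key = (child["country"], child["year"])
        row = dict(child)
        # Unmatched keys yield None instead of raising an error.
        row.update(pollution.get(cluster_key, {"pm25": None, "no2": None}))
        row["kof_index"] = globalization.get(country_key)
        merged.append(row)
    return merged


merged = merge_sources(children, pollution, globalization)
```

Keying pollutants on the survey year as well as the cluster is what makes the match temporal: a cluster surveyed in a different year would pick up that year's annual mean instead.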

Health outcomes. This study examined three individual health outcomes (fever, cough, and diarrhea) and their combinations; see the appendix (p 7).

Predictors. The independent variables were selected based on a review of the literature [5, 8, 9, 11, 17,18,19]; see the appendix (p 7).

The multilevel model allows us to include an error term at each level, which makes it possible to track changes in variance at each level across the models [20, 21]. Our multilevel regression model is a three-level model (level 1 = children, level 2 = districts, and level 3 = countries) with dichotomous outcome variables (children having the double and triple burden of symptoms of infectious diseases), \(y_{ijk}\sim \text{Bernoulli}\left(\pi_{ijk}\right)\), and the logit link function is:

$$\log\left[\frac{\Pr(y_{ijk}=1)}{1-\Pr(y_{ijk}=1)}\right]=\eta_{ijk}=\beta_0+\boldsymbol{\beta}_1\boldsymbol{X}_{1ijk}+\boldsymbol{\beta}_2\boldsymbol{X}_{2jk}+\boldsymbol{\beta}_3\boldsymbol{X}_{3k}+v_k+u_{jk}+e_{ijk}$$

$$v_k\sim N\left(0, \sigma_v^2\right),\quad u_{jk}\sim N\left(0, \sigma_u^2\right),\quad e_{ijk}\sim N\left(0, \sigma_e^2\right)$$

where \(\eta_{ijk}=\ln\left(\frac{\pi_{ijk}}{1-\pi_{ijk}}\right)\); \(\pi_{ijk}\) is the probability of the observed DB or TB outcome for the \(i^{th}\) under-five child (\(i=1,\dots,1{,}546{,}243\)) in the \(j^{th}\) district (\(j=1,\dots,625\)) of the \(k^{th}\) country (\(k=1,\dots,58\)); \(\beta_0\) is the mean value across all countries; \(v_k\) is the random effect of country \(k\); \(u_{jk}\) is the random effect of district \(j\); and \(e_{ijk}\) is the child-level residual error term. The best linear unbiased predictions (BLUPs, i.e., predicted random effects) in multilevel models represent deviations from a group-level mean or intercept, and these deviations can be either positive or negative [22, 23]. Thus, the expanded variance term allows us to account for the variance arising at the child, district, and country levels. Moreover, \(\boldsymbol{\beta}_1\), \(\boldsymbol{\beta}_2\), and \(\boldsymbol{\beta}_3\) are the vectors of fixed-effect parameters for child-level, district-level, and country-level covariates, respectively. For the model, the adjusted odds ratios (OR) and the corresponding 95% confidence intervals (CIs) were estimated.
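To make the link function concrete, the sketch below shows how a linear predictor on the log-odds scale maps to a probability, and how a fixed-effect coefficient and its standard error translate into an odds ratio with a Wald-type 95% CI. It is a minimal illustration under stated assumptions: the function names are hypothetical, the child-level residual term is omitted for simplicity, and no fitted values from the study are used.

```python
import math


def inv_logit(eta):
    """Map a log-odds value eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(beta0, fixed_part, v_k, u_jk):
    """Probability implied by eta = beta0 + fixed effects + country and district intercepts.

    `fixed_part` stands for the sum beta1*X1 + beta2*X2 + beta3*X3 (assumed precomputed).
    """
    eta = beta0 + fixed_part + v_k + u_jk
    return inv_logit(eta)


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical inputs, for illustration only.
p = predicted_probability(beta0=-2.0, fixed_part=0.4, v_k=0.3, u_jk=-0.1)
or_, lower, upper = odds_ratio_ci(beta=0.25, se=0.05)
```

A positive country intercept \(v_k\) shifts every child in that country toward a higher predicted probability, which is how the country-level BLUPs should be read.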

Intraclass correlation (ICC). The ICC was computed to assess district- and country-level variability. It gives the share of the variation in DB and TB attributable to districts and countries and was computed as follows [24, 25]:

$$ ICC_{\text{district}} = \frac{\sigma_{\text{district}}^2}{\sigma_{\text{district}}^2 + \sigma_{\text{country}}^2 + \pi^2/3} \quad \text{(ICC attributable to level 2)} $$

$$ ICC_{\text{country}} = \frac{\sigma_{\text{country}}^2}{\sigma_{\text{district}}^2 + \sigma_{\text{country}}^2 + \pi^2/3} \quad \text{(ICC attributable to level 3)} $$
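The two ICC expressions share one denominator, in which π²/3 (about 3.29) is the fixed level-1 variance of the standard logistic distribution. A minimal sketch of the calculation follows; the variance components passed in are hypothetical placeholders, not the study's estimates.

```python
import math

# Level-1 variance of the standard logistic distribution, about 3.29.
LOGISTIC_RESIDUAL_VAR = math.pi ** 2 / 3


def icc(var_district, var_country):
    """Return (ICC_district, ICC_country) for a three-level logistic model."""
    total = var_district + var_country + LOGISTIC_RESIDUAL_VAR
    return var_district / total, var_country / total


# Hypothetical variance components, for illustration only.
icc_district, icc_country = icc(var_district=0.50, var_country=0.30)
```

Because the denominator is shared, the two ICCs always stand in the same ratio as the district and country variances themselves.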

From the multilevel model results, the important risk factors (predictors) for DBs and TBs were selected using a P-value threshold of < 0.05. One of the challenges of the ML approach is class imbalance [25], in which one class label greatly outnumbers the other. Under-sampling is a method in which samples from the majority group are randomly chosen without replacement until the labels are balanced [26], while oversampling is a technique in which samples from the minority group are randomly chosen with replacement and added to the training dataset, which enhances the performance of the ML-based classifier [27, 28]. In our dataset, the DB (TB) outcome classes are markedly imbalanced, with 1,319,571 (1,443,779) samples in the “No” class and only 179,924 (55,716) samples in the “Yes” class. Unless the data are balanced, a trained MLA-based system favors the majority class when classifying imbalanced datasets [29] and is more likely to categorize new observations as having no DBs or TBs. The ratio of children without the outcome to children with it is 7.33 for DBs and 25.9 for TBs. To reduce this disparity for DBs, this study oversampled the “Yes” class by a factor of 4.16 (4.16 × 179,924 ≈ 748,484 children) and under-sampled the “No” class from 1,319,571 to 751,011 children. Similarly, for TBs, the “Yes” class was oversampled by a factor of 13.40 (13.40 × 55,716 ≈ 746,594 children) and the “No” class was under-sampled from 1,443,779 to 752,901 children.
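The combined over- and under-sampling scheme can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming labels coded 1 for “Yes” and 0 for “No”; the function name `rebalance` and the toy data are hypothetical, and the authors' actual implementation is not shown in the text. The class sizes are the double-burden figures reported above.

```python
import random


def rebalance(labels, n_yes_target, n_no_target, seed=0):
    """Return row indices after oversampling class 1 and under-sampling class 0."""
    rng = random.Random(seed)
    yes_idx = [i for i, y in enumerate(labels) if y == 1]
    no_idx = [i for i, y in enumerate(labels) if y == 0]
    over = rng.choices(yes_idx, k=n_yes_target)   # minority: with replacement
    under = rng.sample(no_idx, k=n_no_target)     # majority: without replacement
    return over + under


# Double-burden class sizes as reported in the text.
n_yes, n_no = 179_924, 1_319_571
n_yes_target = round(4.16 * n_yes)              # 748,484
n_no_target = (n_yes + n_no) - n_yes_target     # 751,011, total sample size unchanged

# Toy demonstration on 10 "Yes" and 90 "No" labels (hypothetical data).
toy_labels = [1] * 10 + [0] * 90
toy_idx = rebalance(toy_labels, n_yes_target=50, n_no_target=50)
toy_balanced = [toy_labels[i] for i in toy_idx]
```

Note that the two target sizes sum to the original number of children, so the rebalanced dataset has the same size as the original but a roughly 1:1 class ratio.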

Beyond the 80:20 split, this process was repeated with 70:30 and 60:40 splits, and 2-, 5-, and 10-fold (K2, K5, and K10) cross-validation was used to assess the impact of different training and testing ratios on the performance of the different machine learning algorithms (MLAs) [30,31,32,33,34,35]. To select suitable MLAs, we reviewed related work that applied ML algorithms to childhood health outcomes such as nutrition status, anemia status, and mortality [36,37,38,39,40,41]. For this study, generalized linear models (binary logistic regression [42], Ridge [43], Least Absolute Shrinkage and Selection Operator (LASSO) regression [44], and Elastic Net [33, 45]), Random Forest (RF) [33, 46], Naïve Bayes [47,48,49], and Decision Trees [49] were adopted for predicting the DB and TB status of children aged under five years residing in 58 LMICs (Fig. 1). The study used the most recent DHS datasets (n = 689,146 under-five children) to validate the suggested approach.
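The combination of three split ratios and three cross-validation protocols yields nine evaluation settings per algorithm. The sketch below illustrates how such a grid could be generated; it uses only the Python standard library, the helper names are hypothetical, and the sample of 1,000 rows is a stand-in for the real data.

```python
import random


def train_test_split(n, train_frac, seed=0):
    """Shuffle row indices 0..n-1 and cut them into training and testing parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]


def k_fold(indices, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


# Validation-fold sizes for each (split ratio, K) setting on 1,000 stand-in rows.
sizes = {}
for frac in (0.8, 0.7, 0.6):
    train_idx, test_idx = train_test_split(1000, frac)
    for k in (2, 5, 10):
        sizes[(frac, k)] = [len(val) for _, val in k_fold(train_idx, k)]
```

Cross-validation is run on the training portion only, so the held-out test portion never influences model selection.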

Fig. 1

Overview flow chart of the machine learning algorithms used for predicting the double burden of infectious diseases (DBs) and the triple burden of infectious diseases (TBs) among under-five children. CV cross-validation, SE sensitivity, SP specificity, PPV positive predictive value, NPV negative predictive value, FM F-measure, ACC accuracy, BID burden of infectious diseases, LR logistic regression, LASSO least absolute shrinkage and selection operator, RF random forest, KNN K-Nearest Neighbors, NB Naïve Bayes, DT decision tree


The relationship between model sensitivity and specificity is expressed using receiver operating characteristic (ROC) curves, which are calculated from the true and predicted outcomes of interest. Curves that lie above the diagonal line (toward the upper left) indicate performance better than chance. The area under the curve (AUC) aggregates this into a single value: the probability that the algorithm ranks a randomly chosen child with the outcome above a randomly chosen child without it [31, 50]. The identified best-fit model is then used to predict the health outcome status in another dataset, known as the test dataset [30,31,32,33,34,35] (Fig. 1).
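The evaluation metrics named in Fig. 1 (SE, SP, PPV, NPV, ACC, F-measure) and the AUC can be computed directly from labels and scores. The sketch below is a minimal illustration with hypothetical function names and a four-observation toy example; it computes the AUC through its rank interpretation (the share of positive and negative pairs that are ordered correctly, with ties counted as one half) rather than by integrating a plotted curve.

```python
def classification_metrics(y_true, y_pred):
    """SE, SP, PPV, NPV, ACC and F-measure from binary labels (1 = outcome present).

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    acc = (tp + tn) / len(y_true)
    fm = 2 * ppv * se / (ppv + se)
    return {"SE": se, "SP": sp, "PPV": ppv, "NPV": npv, "ACC": acc, "FM": fm}


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores), classified at a 0.5 threshold.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)   # 3 of 4 pairs ranked correctly: 0.75
```

Unlike accuracy, the AUC does not depend on the classification threshold, which is one reason it is commonly used to compare algorithms.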

Results

Overall, there was substantial variation in the occurrence of double and triple burden of infection symptoms among under-five children across 58 LMICs over the past two decades (2000‒2023). However, the size of the points decreased over time, implying that the prevalence of those symptoms declined slightly (Fig. 2). Bangladesh (Asia), Haiti (Latin America), and Uganda (Africa) consistently showed the highest prevalence of both double and triple infections over time. The Venn diagram revealed that nearly two-thirds (66%; n = 1,021,038) of under-five children were free from all of the symptoms (fever, cough, and diarrhea), while 183,143 (11.80%) experienced a double burden of infection symptoms. Moreover, 317,443 (20.53%), 307,879 (19.91%), and 197,097 (12.80%) children had cough, fever, and diarrhea, respectively (Fig. 2).

Fig. 2

Co-existence of symptoms among children under five years across LMICs from 2000 to 2023


The co-occurrence of symptoms (specifically diarrhea and cough, cough and fever, and diarrhea and fever) among children under five years of age across 58 countries is summarized in Fig. S6, which highlights significant variation in the patterns of symptom co-occurrence between countries. Moreover, the distribution of double and triple burden of symptoms among under-five children varies across the 625 administrative districts; a detailed explanation is given in the appendix (p 9) and Fig. S7.

The prevalence and predictors of DB and TB symptoms among under-five children (n = 1,546,243) across 58 LMICs are summarized in Table S2. Across all country-years, 183,782 (11.89%) and 56,716 (3.67%) under-five children had the double burden and triple burden of infection symptoms, respectively. More than half (50.42%) of the sampled children were from Africa. Over one-fifth (20.14%) of children from Latin America, 13.05% from Africa, and 9.75% from Asia had the double burden of symptoms, while 4.31%, 6.64%, and 2.65% of children residing in Africa, Latin America, and Asia, respectively, had the triple burden of infection symptoms. Overall, 26,362 (19.24%) and 9,620 (7.02%) children from the first DHS survey phase (1999–2004), and 26,140 (8.04%) and 6,184 (1.90%) from the most recent phase (2020–2023), experienced DBs and TBs, respectively. Nearly 72% of the children resided in rural areas, and approximately three-fourths (75.42%) of the respondents had more than four family members. Nearly 23%, 35%, and 31% of children came from households using unclean cooking fuel, with poor sanitation conditions, and with untreated drinking water, respectively. Almost half (47.30%) of children came from households with high household smoking risk, almost 99% resided in areas with PM2.5 levels above the WHO recommendation (5 µg/m3), and 6.31% were exposed to NO2 levels above the WHO recommendation (10 µg/m3). A detailed description of the results is presented in the appendix (pp 18–19) and Table S2.

Figure 3 illustrates the country-level variance by showing the country-level intercepts from the multilevel model for both D/TB symptoms of infections. In the BLUP results, values above zero (the group average) indicate higher log-odds of DB/TB symptoms, while negative values indicate lower log-odds. Uganda, Burundi, and the Democratic Republic of Congo had the highest predicted values of both double and triple burden of symptoms in Africa. Specifically, the log-odds of DB and TB infections among under-five children in Uganda were 2.19 and 2.90 higher, respectively, than the average across the 58 LMICs. However, Madagascar, Mali, and Niger had the lowest predicted values of infection symptoms among children on the continent. From Latin America, Haiti has the highest predicted values of both double and triple burdens while the Philippines has the lowest odds of having both double and triple burdens. In Asia, Bangladesh and Pakistan had the highest predicted values of both double and triple burden of symptoms, while Kyrgyzstan and Tajikistan had the lowest.
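Because the BLUPs are on the log-odds scale, exponentiating them gives an odds ratio relative to the average country. As a worked illustration for the two Uganda values reported above (assuming all covariates are held fixed; the variable names are hypothetical):

```python
import math

# Country-level BLUPs for Uganda on the log-odds scale, as reported in the text.
blup_log_odds = {"double burden": 2.19, "triple burden": 2.90}

# Exponentiating a log-odds deviation gives an odds ratio versus the average country.
odds_ratios = {name: math.exp(value) for name, value in blup_log_odds.items()}
```

The resulting odds ratios are roughly 9 for the double burden and roughly 18 for the triple burden, which conveys the size of the country effect more directly than the log-odds values.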

Fig. 3

The adjusted log odds (AO) based on the best linear unbiased prediction (BLUP) of burdens of symptoms among under-five children across 58 low- and middle-income countries (LMICs). TBIDs triple burdens of infectious diseases, DBIDs double burdens of infectious diseases


To develop predictive models for both DB and TB symptoms, nine ML models were evaluated under three protocols (K2, K5, and K10) and three training and testing splits (80:20, 70:30, and 60:40); the results are summarized in Fig. S8, and a detailed explanation is included in the appendix (p 3).

Discussion

We used a large dataset of over 1.5 million under-five children in 58 LMICs from Africa, Latin America, Asia, and Europe (Albania) to identify the risk factors for, and predict, the double and triple burden of infections among children. This study demonstrates the use of multilevel models to identify the risk factors of both double and triple-burden symptoms among under-five children and the use of ML approaches to predict these outcomes across 58 LMICs over time. The study utilized datasets from DHS and NASA and the globalization index from WHO reports to examine the link between the environment and the coexistence of symptoms of infections (i.e., fever, cough, and diarrhea) in a global context, including Africa, Asia, and Latin America. Results revealed significant geographical variation in the occurrence of both double and triple burden of infections among under-five children, with a substantial burden of childhood symptoms present in Latin America. Moreover, the study highlighted large variation in country-level and district-level prevalence and predicted values of these symptoms among under-five children. Previous literature revealed that the prevalence of acute respiratory symptoms varies from country to country [8,9,10, 18] and from district to district within the same country [9, 12, 18, 51]. A child who resided in Latin America had a higher risk of both DB and TB than those who resided in Asia and Africa. A possible explanation is that, owing to rapid urbanization and industrialization, many areas of Latin America are affected by pollutant emissions [52,53,54,55].

A previous study comparing the prevalence of respiratory illnesses between the U.S. and Latin America showed that the Latin American region has a higher occurrence of respiratory conditions, such as asthma [56]. Haiti, Bangladesh, Cambodia, Pakistan, the Democratic Republic of Congo, Malawi, and Uganda had among the highest predicted odds ratios and prevalence of cough, fever, and diarrhea symptoms. A possible explanation is that these countries had relatively unstable political situations at the time of data collection. Political instability often leads to weakened infrastructure, healthcare systems, and vaccination programs, and reduces access to improved water and sanitation, which significantly increases vulnerability to infection symptoms [57,58,59,60,61].

Children’s exposure to higher household smoking is positively associated with a high prevalence of both double and triple symptoms, which is in line with previous studies [62, 63]. This result also supports research [2, 6, 9, 12, 64] revealing that a higher risk of both double and triple burden of infections is significantly associated with household poverty.

In recent years, with the availability of large health-related data repositories and advances in computing power, classical statistical analysis is being combined with advanced machine learning algorithms to predict and classify target variables [65,66,67]. Compared with conventional statistical methods, the ML techniques showed superior predictive capability for both outcomes among children. This is not surprising, as ML algorithms have been shown to outperform traditional statistical techniques in several fields [28, 37, 41, 68,69,70,71,72]. Moreover, the random forest (RF) outperformed the other algorithms in predicting both double and triple infections among children in LMICs. As also revealed in previous literature, RF techniques have a better capability of predicting binary outcomes, including mortality [28, 38, 73], acute respiratory infection [41], and undernutrition status among children [37]. However, the performance of different ML techniques may vary depending on the nature of the dataset and the specific problem to be addressed, and hence it is recommended to evaluate their performance on the dataset at hand. Although there is literature on the risk factors of diarrhea, fever, and cough [1, 2, 6, 9, 10, 12], the prediction of both double and triple symptoms of infections among under-five children across LMICs using multi-country and multi-source datasets has not been explored.

Several mechanisms may be relevant for explaining the observed association between ambient PM2.5 and the outcome symptoms. Our results suggest that children who are exposed to higher levels of PM2.5, namely levels above the WHO recommendation, have a higher risk of having DBIDs and TBIDs. The relationship between exposure to PM2.5 and fever is primarily mediated through the activation of the immune system, oxidative stress, and systemic inflammation. Exposure to ambient PM2.5 may increase the inflammatory response and the release of pro-inflammatory cytokines such as interleukin-1 (IL-1), IL-6, and C-reactive protein (CRP), which are key mediators in the development of fever [74,75,76]. These cytokines may also act on the hypothalamus, increasing the production of prostaglandin E2, a principal mediator of fever [74, 77, 78]. In addition to promoting inflammation, exposure to PM2.5 may induce oxidative stress, which further amplifies cytokine (IL-6 and TNF-α) production and inflammatory signaling pathways, consequently contributing to fever [79, 80]. Exposure to PM2.5 has also been associated with coughing through both direct and indirect pathways involving inflammation, oxidative stress, and neural activation. Exposure to PM2.5 was associated with an upregulation of inflammatory pathways, such as the NF-kB signaling pathway, leading to increased production of proinflammatory cytokines (i.e., IL-1β, IL-6, and TNF-α) and consequently contributing to airway inflammation and cough [81]. PM2.5 may also increase the expression of transient receptor potential vanilloid-1 (TRPV1) in the airway epithelium, leading to increased cough symptoms [82,83,84]. Additionally, TRPV1 activation by PM2.5 may result in increased levels of substance P, a neuropeptide associated with inflammation and cough reflex sensitivity [83].
Similar to fever, the oxidative stress associated with PM2.5 may activate kinase cascades and transcription factors, releasing inflammatory mediators that contribute to airway inflammation and cough [85, 86]. Our results also highlight the association between exposure to PM2.5 and diarrhea, which may be explained by several interrelated mechanisms involving gut barrier dysfunction, inflammation, oxidative stress, and alterations in the gut microbiota. Inhaled PM2.5 can enter the gastrointestinal tract through mucociliary clearance or direct ingestion [87]. Once in the intestine, PM2.5 can increase epithelial permeability and impair the function of the intestinal immune barrier, leading to intestinal injury and inflammation that may contribute to gastrointestinal complications such as diarrhea [88, 89]. Additionally, PM2.5 exposure has been shown to induce intestinal inflammation by disrupting the balance of regulatory T cells (Treg) and T helper 17 cells (Th17), which is crucial for maintaining intestinal immune homeostasis, and by increasing the release of pro-inflammatory cytokines such as IL-6, IL-1β, and TNF-α, thereby increasing the risk of diarrhea [89,90,91,92]. Several studies have provided evidence of adverse effects of PM2.5 exposure on the gut microbiome [92,93,94,95]. One study [90] showed a significant decrease of Lactobacillus acidophilus after exposure to PM2.5, which has been associated with an imbalance of Treg/Th17. A review [94] of 12 studies on humans showed that exposure to air pollution was positively associated with bacterial taxa belonging to Bacteroidetes, Deferribacterota, and Proteobacteria and negatively associated with taxa belonging to Verrucomicrobiota; this gut dysbiosis may impair digestion and immune regulation, leading to gastrointestinal disturbances including diarrhea [96].

The study population comprised over 1.5 million children which provided accurate estimates of the prevalences of the studied health outcomes. The sampling techniques used ensure representativeness of the prevalence and measures of effect across wide geographical areas including 58 countries. The algorithms were rigorously tested on this comprehensive dataset and machine learning techniques were utilized to predict the double and triple coexistence of infections among children under five years. The potential predictors were incorporated from a wide range of data sources from various perspectives, including globalization index (political, social, and economic), NASA and DHS, to ensure a robust and multifaceted analysis.

The main limitation of the study is the cross-sectional design, which does not allow the temporal order between the hypothesized determinants and predictors and the health outcomes of interest to be established. This limits any causal inference from the results. Another limitation is that the surveys were conducted in different years, so comparisons of prevalence by country may be biased by changes in determinants and health outcomes over time. Finally, information on the health outcomes (fever, cough, and diarrhea) and on some hypothesized determinants and predictors of these outcomes was self-reported by mothers/caregivers, and as a result, maternal recall bias may have affected the validity of the studied determinant-outcome relations.

Conclusions

Our study used PM2.5 from NASA and the globalization index from WHO reports to assess the association of environmental pollution and globalization with both double and triple burden of infections among 1,546,243 under-five children sampled from the 2000 to 2023 DHS datasets in 58 LMICs. The study combined statistical analysis (multilevel models) and machine learning algorithms to examine variables associated with both double and triple burden of infection symptoms among under-five children residing in LMICs. The multilevel model revealed that the location of a child, survey year, household wealth index, residence, household size, PM2.5, fuel use, sanitation, source of drinking water, and level of smoking risk were among the significant factors of both double and triple burden of infections.

The present study attempted to identify the best ML algorithms for predicting both double and triple symptoms of infections using nationwide cross-sectional data from 58 LMICs. The comparison of different MLAs showed that the best-performing algorithms were RF and bagged trees. This conclusion rests on their consistently strong performance across multiple evaluation metrics, three protocols, and three training–testing ratios. Future research could use longitudinal analysis to track temporal trends in the rates of double and triple burden, offering insights into child public health.

Data availability

The datasets used in this study are publicly available. The DHS data are available from https://dhsprogram.com after a formal request is accepted, while the PM2.5 and NO2 estimates are publicly available as version V4.GL.03 at https://sites.wustl.edu/acag/datasets. Moreover, the KOF Globalization Index is available from the KOF Swiss Economic Institute (ETH Zurich), and the World Bank data are available at https://data.worldbank.org/. The SAS, STATA and R codes used in the study are available upon formal request to the corresponding author.

Abbreviations

ARIs:

Acute respiratory infections

LMICs:

Low and middle-income countries

D/TB:

Double and triple burden

DHS:

Demographic and health surveys

MLA:

Machine learning algorithms

ICC:

Intraclass correlation

AUC:

Area under the curve

WHO:

World Health Organization

PM2.5:

Particulate matter

NO2:

Nitrogen dioxide

NASA:

National aeronautics and space administration

ROC:

Receiver operating characteristics

KR:

Kids record

SER:

Smoke exposure risks

GPS:

Global positioning system

PCA:

Principal component analysis

References

  1. Benzamin M, Hoque M. Use of ‘diarrhea stool card’ in acute diarrhea management in under 5 years children in resource constraints country. Gastroenterol Endosc. 2024;2(2):96–101.

  2. Winskill P, et al. Health inequities and clustering of fever, acute respiratory infection, diarrhoea and wasting in children under five in low-and middle-income countries: a demographic and health surveys analysis. BMC Med. 2021;19:1–11.


  3. World Health Organization. Children: reducing mortality. 2019.

  4. Rudan I, et al. Global estimate of the incidence of clinical pneumonia among children under five years of age. Bull World Health Organ. 2004;82(12):895–903.

    PubMed 

    Google Scholar
     

  5. Anjum MU, Riaz H, Tayyab HM. Acute respiratory tract infections (Aris);: clinico-epidemiolocal profile in children of less than five years of age. Prof Med J. 2017;24(02):322–5.


    Google Scholar
     

  6. Li H, et al. 2-week prevalence and associated factors of fever, diarrhea, and coexisting fever and diarrhea among children aged 6–23 months in rural Hunan Province. Sci Rep. 2024;14(1):13867.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  7. Organization, W.H., Pocket book of hospital care for children: guidelines for the management of common childhood illnesses. 2013: World Health Organization.

  8. Ujunwa F, Ezeonu C. Risk factors for acute respiratory tract infections in under-five children in Enugu Southeast Nigeria. Ann Med Health Sci Res. 2014;4(1):95–9.

    PubMed 
    PubMed Central 

    Google Scholar
     

  9. Sultana M, et al. Prevalence, determinants and health care-seeking behavior of childhood acute respiratory tract infections in Bangladesh. PLoS ONE. 2019;14(1):e0210433.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  10. Kjærgaard J, et al. Diagnosis and treatment of acute respiratory illness in children under five in primary care in low-, middle-, and high-income countries: a descriptive FRESH AIR study. PLoS ONE. 2019;14(11):e0221389.

    PubMed 
    PubMed Central 

    Google Scholar
     

  11. Banda B, et al. Risk factors associated with acute respiratory infections among under-five children admitted to Arthur’s Children Hospital, Ndola Zambia. Asian Pac J Health Sci. 2016;3(3):153–9.


    Google Scholar
     

  12. Harerimana J-M, et al. Social, economic and environmental risk factors for acute lower respiratory infections among children under five years of age in Rwanda. Arch Public Health. 2016;74(1):1–7.


    Google Scholar
     

  13. Landrigan PJ, et al. The Lancet Commission on pollution and health. Lancet. 2018;391(10119):462–512.

    PubMed 

    Google Scholar
     

  14. Mirabelli MC, Ebelt S, Damon SA. Air quality index and air quality awareness among adults in the United States. Environ Res. 2020;183:109185.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  15. Lelieveld J, et al. Loss of life expectancy from air pollution compared to other risk factors: a worldwide perspective. Cardiovasc Res. 2020;116(11):1910–7.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  16. Hammer MS, et al. Global estimates and long-term trends of fine particulate matter concentrations (1998–2018). Environ Sci Technol. 2020;54(13):7879–90.

    CAS 
    PubMed 

    Google Scholar
     

  17. Kjærgaard J, et al. Diagnosis and treatment of acute respiratory illness in children under five in primary care in low-, middle-, and high-income countries: a descriptive FRESH AIR study. PLoS ONE. 2020;15(2):e0229680.

    PubMed 
    PubMed Central 

    Google Scholar
     

  18. Goodarzi E, et al. Epidemiology of mortality induced by acute respiratory infections in infants and children under the age of 5 years and its relationship with the Human Development Index in Asia: an updated ecological study. J Public Health. 2021;29(5):1047–54.


    Google Scholar
     

  19. Fetene MT, Fenta HM, Tesfaw LM. Spatial heterogeneities in acute lower respiratory infections prevalence and determinants across Ethiopian administrative zones. J Big Data. 2022;9(1):1–16.


    Google Scholar
     

  20. Gelman, A., Data analysis using regression and multilevel/hierarchical models. 2007: Cambridge university press.

  21. Quintero A, Lesaffre E. Comparing hierarchical models via the marginalized deviance information criterion. Stat Med. 2018;37(16):2440–54.

    PubMed 

    Google Scholar
     

  22. Weerahandi S, Ananda MM. Improving the EBLUPs of balanced mixed-effects models. Metrika. 2015;78(6):647–62.


    Google Scholar
     

  23. Robinson GK. That BLUP is a good thing: the estimation of random effects. Stat Sci. 1991;1:15–32.


    Google Scholar
     

  24. Goldstein H, Browne W, Rasbash J. Partitioning variation in multilevel models. Underst Stat. 2002;1(4):223–31.


    Google Scholar
     

  25. Merlo J, et al. A brief conceptual tutorial of multilevel analysis in social epidemiology: linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Commun Health. 2005;59(6):443–9.


    Google Scholar
     

  26. Bunkhumpornpat, C., K. Sinapiromsaran, and C. Lursinsap. MUTE: Majority under-sampling technique. 2011 8th international conference on information, communications & signal processing. 2011. IEEE.

  27. Matsuoka D. Classification of imbalanced cloud image data using deep neural networks: performance improvement through a data science competition. Prog Earth Planet Sci. 2021;8:1–11.


    Google Scholar
     

  28. Maniruzzaman M, Shin J, Hasan MAM. Predicting children with ADHD using behavioral activity: a machine learning analysis. Appl Sci. 2022;12(5):2737.

    CAS 

    Google Scholar
     

  29. Hossain M, Mullally C, Asadullah MN. Alternatives to calorie-based indicators of food security: an application of machine learning methods. Food Policy. 2019;84:77–91.


    Google Scholar
     

  30. Quinlau R. Induction of decision trees. Mach Learn. 1986;1(1):S1–106.


    Google Scholar
     

  31. Gareth J, et al. An introduction to statistical learning: with applications in R. Cham: Spinger; 2013.


    Google Scholar
     

  32. Molina M, Garip F. Machine learning for sociology. Annu Rev Sociol. 2019. https://doi.org/10.1146/annurev-soc-073117-041106.

    Article 

    Google Scholar
     

  33. Géron, A., Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. 2019: O’Reilly Media.

  34. Marsland S. Machine learning: an algorithmic perspective. Boca Raton: CRC Press; 2015.


    Google Scholar
     

  35. Zhang, H., The Optimality of Naïve Bayes. FLAIRS2004 conference. 2004.

  36. Moulaei K, et al. Comparing machine learning algorithms for predicting COVID-19 mortality. BMC Med Inform Decis Mak. 2022;22(1):2.

    PubMed 
    PubMed Central 

    Google Scholar
     

  37. Fenta HM, Zewotir T, Muluneh EK. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Med Inform Decis Mak. 2021;21:1–12.


    Google Scholar
     

  38. Samuel O, Zewotir T, North D. Application of machine learning methods for predicting under-five mortality: analysis of Nigerian demographic health survey 2018 dataset. BMC Med Inform Decis Mak. 2024;24(1):86.

    PubMed 
    PubMed Central 

    Google Scholar
     

  39. Talukder A, Ahammed B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition. 2020;78:110861.

    PubMed 

    Google Scholar
     

  40. Bitew FH, Sparks CS, Nyarko SH. Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public Health Nutr. 2022;25(2):269–80.

    PubMed 

    Google Scholar
     

  41. Fenta HM, et al. Factors of acute respiratory infection among under-five children across sub-Saharan African countries using machine learning approaches. Sci Rep. 2024;14(1):15801.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  42. Yu H-F, Huang F-L, Lin C-J. Dual coordinate descent methods for logistic regression and maximum entropy models. Mach Learn. 2011;85(1–2):41–75.


    Google Scholar
     

  43. Arthur EH, Robert WK. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970;12(1):55–67.


    Google Scholar
     

  44. Tibshirani R. Regression shrinkage and selection via the lasso. J Roy Stat Soc. 1996;58(1):267–88.


    Google Scholar
     

  45. Zou H, Hastie T. Addendum: regularization and variable selection via the elastic net. J Royal Stat Soc. 2005;67(5):768–768.


    Google Scholar
     

  46. Chen, T. guestrin, C. Xgboost: A scalable tree boosting system. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (New York, NY, USA, 2016), KDD ‘16, ACM. 2016.

  47. McCallum, A. and K. Nigam. A comparison of event models for naive bayes text classification. AAAI-98 workshop on learning for text categorization. 1998. Madison, WI.

  48. Zhang D. Bayesian classification. In: Fundamentals of Image Data Mining. Springer; 2019. p. 161–78.


    Google Scholar
     

  49. James G, et al. An introduction to statistical learning. Cham: Springer; 2013.


    Google Scholar
     

  50. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.

    CAS 
    PubMed 

    Google Scholar
     

  51. Fenta SM, Fenta HM. Risk factors of child mortality in Ethiopia: application of multilevel two-part model. PLoS ONE. 2020;15(8):e0237640.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  52. Maria, A., et al., Central America urbanization review: making cities work for Central America. 2017: World Bank Publications.

  53. Paho W. Pan American Health Organization. Geneva: World Health Organization; 2023.


    Google Scholar
     

  54. Organization, W.H., Air pollution and child health: prescribing clean air: summary. 2018.

  55. Rees, N., Clear the air for children: the impact of air pollution on children. 2016: UNICEF.

  56. Cooper P, et al. Asthma in Latin America: a public heath challenge and research opportunity. Allergy. 2009;64(1):5–17.

    CAS 
    PubMed 

    Google Scholar
     

  57. Health, W.C.o.S.D.o. and W.H. Organization, Closing the gap in a generation: health equity through action on the social determinants of health: Commission on Social Determinants of Health final report. 2008: World Health Organization.

  58. Richter LM, Lye SJ, Proulx K. Nurturing care for young children under conditions of fragility and conflict. New Dir Child Adolesc Dev. 2018;2018(159):13–26.

    PubMed 

    Google Scholar
     

  59. Coghlan B, et al. Mortality in the Democratic Republic of Congo: a nationwide survey. Lancet. 2006;367(9504):44–51.

    PubMed 

    Google Scholar
     

  60. Farmer P, et al. Meeting cholera’s challenge to Haiti and the world: a joint statement on cholera prevention and care. PLoS Negl Trop Dis. 2011;5(5):e1145.

    PubMed 
    PubMed Central 

    Google Scholar
     

  61. Doctor HV, Nkhana-Salimu S, Abdulsalam-Anibilowo M. Health facility delivery in sub-Saharan Africa: successes, challenges, and implications for the 2030 development agenda. BMC Public Health. 2018;18:1–12.


    Google Scholar
     

  62. Mehta S, et al. Ambient particulate air pollution and acute lower respiratory infections: a systematic review and implications for estimating the global burden of disease. Air Qual Atmos Health. 2013;6:69–83.

    CAS 
    PubMed 

    Google Scholar
     

  63. Singh K, Bloom S, Brodish P. Gender equality as a means to improve maternal and child health in Africa. Health Care Women Int. 2015;36(1):57–69.

    PubMed 

    Google Scholar
     

  64. Merlo J, et al. A brief conceptual tutorial on multilevel analysis in social epidemiology: investigating contextual phenomena in different groups of people. J Epidemiol Community Health. 2005;59(9):729–36.

    PubMed 
    PubMed Central 

    Google Scholar
     

  65. Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol. 2005;67(2):301–20.


    Google Scholar
     

  66. Géron, A., Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow. 2022: O’Reilly Media, Inc

  67. Abdelhafiz D, et al. Deep convolutional neural networks for mammography: advances, challenges and applications. BMC Bioinform. 2019;20:1–20.


    Google Scholar
     

  68. Breiman L. Random forests. Mach Learn. 2001;45:5–32.


    Google Scholar
     

  69. Genuer R, Poggi J-M, Tuleau-Malot C. Variable selection using random forests. Pattern Recogn Lett. 2010;31(14):2225–36.


    Google Scholar
     

  70. Janitza S, Tutz G, Boulesteix A-L. Random forest for ordinal responses: prediction and variable selection. Comput Stat Data Anal. 2016;96:57–73.


    Google Scholar
     

  71. Panch T, Szolovits P, Atun R. Artificial intelligence, machine learning and health systems. J Glob Health. 2018. https://doi.org/10.7189/jogh.08.020303.

    Article 
    PubMed 
    PubMed Central 

    Google Scholar
     

  72. Shahinfar S, et al. Machine learning approaches for the prediction of lameness in dairy cows. Animal. 2021;15(11):100391.

    CAS 
    PubMed 

    Google Scholar
     

  73. Tezza F, et al. Predicting in-hospital mortality of patients with COVID-19 using machine learning techniques. J Pers Med. 2021;11(5):343.

    PubMed 
    PubMed Central 

    Google Scholar
     

  74. Eskilsson A, et al. Immune-induced fever is mediated by IL-6 receptors on brain endothelial cells coupled to STAT3-dependent induction of brain endothelial prostaglandin synthesis. J Neurosci. 2014;34(48):15957–61.

    PubMed 
    PubMed Central 

    Google Scholar
     

  75. Yu P, et al. Microglia caspase11 non-canonical inflammasome drives fever. Acta Physiol. 2024;240(9):e14187.

    CAS 

    Google Scholar
     

  76. Miller GE, et al. Ambient PM2. 5 and specific sources increase inflammatory cytokine responses to stimulators and reduce sensitivity to inhibitors. Environ Res. 2024;252:118964.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  77. Nilsberth C, et al. The role of interleukin-6 in lipopolysaccharide-induced fever by mechanisms independent of prostaglandin E2. Endocrinology. 2009;150(4):1850–60.

    CAS 
    PubMed 

    Google Scholar
     

  78. Blomqvist A. Inflammation-induced fever depends on prostaglandin E2 production by brain endothelial cells and EP3 receptors in the median preoptic nucleus of the hypothalamus. Acta Physiol. 2024. https://doi.org/10.1111/apha.14238.

    Article 

    Google Scholar
     

  79. Liu C-W, et al. PM 25-induced oxidative stress increases intercellular adhesion molecule-1 expression in lung epithelial cells through the IL-6/AKT/STAT3/NF-κB-dependent pathway. Part Fibre Toxicol. 2018;15:1–16.


    Google Scholar
     

  80. Zhang L, et al. PM2. 5 exposure upregulates pro-inflammatory protein expression in human microglial cells via oxidant stress and TLR4/NF-κB pathway. Ecotoxicol Environ Saf. 2024;277:116386.

    CAS 
    PubMed 

    Google Scholar
     

  81. Kim H, et al. CKD-497 inhibits NF-kB signaling and ameliorates inflammation and pulmonary fibrosis in ovalbumin-induced asthma and particulate matter-induced airway inflammatory diseases. Front Pharmacol. 2024;15:1428567.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  82. Du Q, et al. The role of transient receptor potential vanilloid 1 in common diseases of the digestive tract and the cardiovascular and respiratory system. Front Physiol. 2019;10:1064.

    PubMed 
    PubMed Central 

    Google Scholar
     

  83. Lv H, et al. Effect of transient receptor potential vanilloid-1 on cough hypersensitivity induced by particulate matter 2.5. Life Sci. 2016;151:157–66.

    CAS 
    PubMed 

    Google Scholar
     

  84. Robinson RK, et al. Mechanistic link between diesel exhaust particles and respiratory reflexes. J Allerg Clin Immunol. 2018;141(3):1074–84.

    CAS 

    Google Scholar
     

  85. Ghio AJ, Carraway MS, Madden MC. Composition of air pollution particles and oxidative stress in cells, tissues, and living systems. J Toxicol Environ Health Part B. 2012;15(1):1–21.

    CAS 

    Google Scholar
     

  86. Liu K, Hua S, Song L. PM2. 5 exposure and asthma development: the key role of oxidative stress. Oxid Med Cell Longev. 2022;2022(1):3618806.

    PubMed 
    PubMed Central 

    Google Scholar
     

  87. Pambianchi E, Pecorelli A, Valacchi G. Gastrointestinal tissue as a “new” target of pollution exposure. IUBMB Life. 2022;74(1):62–73.

    CAS 
    PubMed 

    Google Scholar
     

  88. Lu D, et al. Correlation between particulate matter and intestinal microflora and intestinal inflammation: research progress. Chin J Microecol. 2021;33(3):356–60.


    Google Scholar
     

  89. Beamish LA, Osornio-Vargas AR, Wine E. Air pollution: an environmental factor contributing to intestinal disease. J Crohns Colitis. 2011;5(4):279–86.

    PubMed 

    Google Scholar
     

  90. Xu J, et al. L. acidophilus participates in intestinal inflammation induced by PM2. 5 through affecting the Treg/Th17 balance. Environ Pollut. 2024;341:122977.

    CAS 
    PubMed 

    Google Scholar
     

  91. Xie S, et al. Exposure to concentrated ambient PM2. 5 (CAPM) induces intestinal disturbance via inflammation and alternation of gut microbiome. Environ Int. 2022;161:107138.

    CAS 
    PubMed 

    Google Scholar
     

  92. Meyer F, et al. Cytokines and intestinal epithelial permeability: a systematic review. Autoimmun Rev. 2023;22(6):103331.

    CAS 
    PubMed 

    Google Scholar
     

  93. Salim SY, Kaplan GG, Madsen KL. Air pollution effects on the gut microbiota: a link between exposure and inflammatory disease. Gut microbes. 2014;5(2):215–9.

    PubMed 

    Google Scholar
     

  94. Van Pee T, et al. Ambient particulate air pollution and the intestinal microbiome; a systematic review of epidemiological, in vivo and in vitro studies. Sci Total Environ. 2023;878:162769.

    CAS 
    PubMed 

    Google Scholar
     

  95. Sommer AJ, et al. A randomization-based causal inference framework for uncovering environmental exposure effects on human gut microbiota. PLoS Comput Biol. 2022;18(5):e1010044.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

  96. Mutlu EA, et al. Inhalational exposure to particulate matter air pollution alters the composition of the gut microbiome. Environ Pollut. 2018;240:817–30.

    CAS 
    PubMed 
    PubMed Central 

    Google Scholar
     

Download references

Acknowledgements

The datasets used in this study were obtained from the DHS Program, which we thank for authorizing us to download the data from its website.

Funding

Open Access funding provided by University of Oulu (including Oulu University Hospital). The research leading to this publication was co-funded by the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Actions grant agreement No. 101126602 (Data4Healthcare), and the University of Oulu.

Author information

HMF, JJ and KA set up the collaborative network, designed the study, and developed the statistical methods. HMF, JJ, KA, Aino and Ines took the lead in drafting the manuscript and interpreting the results. All authors provided the data and contributed to the interpretation of the results and to the submitted version of the manuscript. HMF accessed and verified the data. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Ethics declarations

We performed the analyses using publicly available DHS data. The DHS Program complies with requirements for protecting the privacy of respondents. Because the data are secondary and publicly accessible, no further permission was required for this study. Additional information about the data and ethical practices is provided at https://dhsprogram.com.

Consent for publication: Not applicable.

We declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
