Which are best for successful aging prediction? Bagging, boosting, or simple machine learning algorithms?

Background The global society is currently facing an epidemiological shift due to the significant improvement in life expectancy and the growth of the elderly population. This shift requires the public and scientific community to highlight successful aging (SA) as an indicator of the quality of elderly people’s health. SA is a subjective, complex, and multidimensional concept; thus, defining or measuring it is a difficult task. This study seeks to identify the factors most affecting SA and feed them as input variables into predictive models built with machine learning (ML) algorithms. Methods Data from 1465 adults aged ≥ 60 years who were referred to health centers in Abadan city (Iran) between 2021 and 2022 were collected by interview. First, binary logistic regression (BLR) was used to identify the main factors influencing SA. Second, eight ML algorithms, including adaptive boosting (AdaBoost), bootstrap aggregating (Bagging), eXtreme Gradient Boosting (XG-Boost), random forest (RF), J-48, multilayer perceptron (MLP), Naïve Bayes (NB), and support vector machine (SVM), were trained to predict SA. Finally, their performance was evaluated using metrics derived from the confusion matrix to determine the best model. Results The experimental results showed that 44 factors had a meaningful relationship with SA as the output class. Overall, the RF algorithm with sensitivity = 0.95 ± 0.01, specificity = 0.94 ± 0.01, accuracy = 0.94 ± 0.005, and F-score = 0.94 ± 0.003 yielded the best performance for predicting SA. Conclusions Compared to the other selected ML methods, the RF, as a bagging algorithm, was significantly more effective in predicting SA. Our prediction models can provide gerontologists, geriatric nurses, healthcare administrators, and policymakers with a reliable and responsive tool to improve elderly outcomes.


Introduction
Aging is a global phenomenon that represents a significant risk factor for disability and many chronic diseases. This period of human life is a continuous but irreversible process with a steady deterioration in body structure and functions [1,2]. Population aging will increase healthcare costs, resulting in a huge medical burden and severe financial pressure on families, which has profound economic, political, and social consequences for both developed and developing countries [3,4]. The global proportion of older people aged ≥ 60 is increasing rapidly compared to other age groups [5]. Currently, it is estimated that 12.7% of the world's population is elderly. By 2050, the elderly population is projected to make up more than 21.4% of the world's population, and by 2100 this population will triple, reaching approximately 27.7% [6]. Reports indicate that the population of Iran is transitioning from youth to old age. About 10% of Iran's population is aged 60 years and older. According to official reports, people aged 65 and older will account for 31% of the total Iranian population by 2050, and this proportion will continue to increase dramatically [7,8].
In recent decades, advances in medicine have significantly reduced global mortality rates, leading to an increase in the world's elderly population. Aging is not a disease, but neglect of people's health monitoring negatively affects every country's healthcare, economic, educational, employment, social, and political sectors. The negative effects of the growing aging population include decreased quality of life (QoL), increased dependence on others for daily activities, mental health problems, and existing problems such as loss of job, loss of spouse and friends, loss of children, poverty, and physical problems [9,10]. On the other hand, improving life expectancy increases both the elderly population and the amount of time spent as an older adult. In this situation, the epidemiology of diseases among the elderly also shifts toward chronic non-communicable diseases such as cardiovascular diseases (CVD), hypertension, diabetes, neoplasms, and dementia. As a result, social and economic problems arise for the elderly, and the elderly population requires more health services than other age groups [11].
The concept of successful aging (SA) emerged in the gerontological literature to overcome the challenges and problems of population aging. SA as a preferred term overlaps with various terms such as positive aging, aging well, productive aging, and healthy aging [12,13]. SA stresses the quality of life in aging. This paradigm shifts the focus from normal aging, with its four Ds (disease, disability, death, and dementia), to SA, which assesses how people can age well and identifies the processes and components involved, with criteria for "how long," "how well," or "how healthy" one lives [14][15][16]. This concept has long intrigued academics and researchers. Robert Havighurst first defined SA in 1961 as feeling life satisfaction and happiness during the latter stages of an individual's lifespan [17]. Rowe and Kahn [18] state that SA is not merely the absence of chronic disease but a combination of three components: a low probability of disease and disease-related disability, high cognitive and physical functioning, and active engagement with life [18][19][20]. However, Rowe and Kahn's theory ignored the dimension of mental health. In recent years, an increasing number of researchers have improved on the Rowe and Kahn model. For example, Crowther added "positive spirit" as a fourth dimension, and Bowling added "subjective well-being" [21].
Previous studies have mostly described factors influencing SA. However, due to the subjective, interdisciplinary, and multidimensional nature of SA, measuring or predicting it is a difficult task. A fundamental emphasis of studies is on better understanding and defining SA and recognizing its determinants so that clinical care and protective interventions can be more meaningfully informed [22]. The influencing factors of SA are interdependent and complex, and traditional models do not apply to SA [11,23,24]. Rapid technological and digital advancement, such as artificial intelligence (AI), provides new ways to create novel smart services or renew health pathways through lean operations [11,25,26]. As a subcategory of AI, machine learning (ML) is an extensive discipline based on statistics and computational science that provides automated learning techniques to extract hidden patterns from empirical data and then make complex decisions based on learned behaviors [11,21,27]. The present study aimed to develop several ML predictive models for SA using the important features that influence it. Finally, the performance of the ML models was compared to select the best one.

Features extraction
After the literature review, an electronic checklist was prepared based on the 102 items extracted from the literature search. In the first Delphi phase, 55 items were rejected and 15 items qualified for the second phase. In the second phase, 13 items were accepted and 2 items were rejected by the experts' panel. At the end of the Delphi process, 44 eligible features entered the final checklist for predicting SA.

Sample characteristics
In total, 1465 cases participated in this study and were included in the data analysis: 746 in the non-SA class and 719 in the SA class. Of these, 566 were men and 899 were women, with a mean age of 68.3 ± 3.325 years.

Multi-variable statistical analysis
The results of the data analysis of the SA and non-SA elderly cases using BLR as a multivariable statistical analysis are presented in Table 1.
In this table, the odds ratio shows the probability of occurrence of each state of the variables, the CI is the 95% confidence interval of the odds ratio, and the correlation is defined as the correlation of each variable with the output class. To obtain the most influential factors for SA, we considered P < 0.05 for these variables; variables with P > 0.05 were excluded from this study. Based on the information given in Table 1, the determinant factors of age [CI = 1.52-1.94] (β = 0.12), income level [CI = 2.12-
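As an illustration of how the odds ratios and 95% CIs in Table 1 relate to the BLR coefficients, a minimal sketch; the β and SE values below are hypothetical, not taken from Table 1:

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient into an odds ratio
    with a confidence interval (z = 1.96 gives 95% coverage)."""
    or_ = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return or_, lower, upper

# Hypothetical coefficient and standard error, for illustration only
or_, lo, hi = odds_ratio_with_ci(beta=0.12, se=0.05)
print(f"OR = {or_:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

A variable is retained when its CI excludes 1 (equivalently, P < 0.05 for the coefficient).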

Appraising the ML algorithms' performance
The results of the evaluation metrics of ML algorithms including bagging, boosting, and simple algorithms with fivefold cross-validation are shown in Table 2.
Based on the evaluation metrics presented in Table 2, the RF model, with a maximum tree depth of 6 and 50 iterations, achieved sensitivity = 0.95 ± 0.01, specificity = 0.94 ± 0.01, accuracy = 0.94 ± 0.005, and F-score = 0.94 ± 0.003, giving it the best predictive strength in classifying SA and non-SA cases among older adults. The XG-Boost algorithm, trained with a decision stump as the base classifier and gbtree as the objective function, with sensitivity = 0.88 ± 0.01, specificity = 0.86 ± 0.02, accuracy = 0.88 ± 0.01, and F-score = 0.88 ± 0.01, ranked second in predicting SA among the ML-trained algorithms. The AdaBoost algorithm, with a decision stump as the base classifier and a maximum of 20 iterations (sensitivity = 0.88 ± 0.01, specificity = 0.86 ± 0.02, accuracy = 0.88 ± 0.01, and F-score = 0.86 ± 0.01), and the Bagging algorithm (sensitivity = 0.84 ± 0.02, specificity = 0.84 ± 0.01, accuracy = 0.84 ± 0.01, and F-score = 0.84 ± 0.02) ranked third and fourth, respectively. Examining the predictive strength of these four ensemble algorithms using the performance indicators of this study showed that all of them were capable of categorizing SA and non-SA cases among the elderly, with all performance criteria above 80%. The NB algorithm, with sensitivity = 0.68 ± 0.04, specificity = 0.65 ± 0.05, accuracy = 0.69 ± 0.045, and F-score = 0.66 ± 0.04, obtained the lowest performance. With the exception of the NB algorithm, all other ML algorithms achieved a performance of more than 0.7. However, the bagging and boosting algorithms showed more predictive strength for SA than the other, simple ML algorithms. The results of comparing the algorithms based on the AUC in train, test, and validation modes are shown in Fig. 1.
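The sensitivity, specificity, accuracy, and F-score reported in Table 2 all derive from the confusion matrix; a minimal pure-Python sketch of those formulas (the counts below are hypothetical, for illustration only):

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and F-score from
    confusion-matrix counts (positive class = SA)."""
    sensitivity = tp / (tp + fn)                  # recall on SA cases
    specificity = tn / (tn + fp)                  # recall on non-SA cases
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f_score

# Hypothetical confusion-matrix counts for illustration
print(classification_metrics(tp=90, tn=85, fp=10, fn=5))
```

In k-fold cross-validation these metrics are computed per fold and reported as mean ± standard deviation, which is how the ± values in Table 2 arise.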
By assessing and comparing the performance of all bagging, boosting, and base algorithms in the train, validation, and test settings, we found that the RF model, as a bagging algorithm with AUC-train = 0.918, AUC-validation = 0.886, and AUC-test = 0.845, gained the best predictive strength for classifying SA and non-SA cases among older adults. The XG-Boost model, with AUC-train = 0.893, AUC-validation = 0.865, and AUC-test = 0.832, obtained the second-best predictive capability as a boosting method. The test results of this algorithm also showed better generalizability in classifying SA and non-SA cases than the RF model (analysis of the ROC showed a smaller reduction in predictive power on the test set than for the RF-trained model). The AdaBoost and Bagging algorithms, with AUC-train = 0.836, AUC-validation = 0.765, AUC-test = 0.715 and AUC-train = 0.819, AUC-validation = 0.743, AUC-test = 0.703, respectively, achieved relatively good performance with AUC > 0.7 in all training, testing, and validation states. In contrast, the J-48 and NB algorithms, as base algorithms with AUC-train = 0.623, AUC-validation = 0.558, AUC-test = 0.531 and AUC-train = 0.569, AUC-validation = 0.526, AUC-test = 0.512, respectively, showed the worst performance in this regard. In general, evaluating the functionality of all three types of algorithms showed that the ROC curves of the bagging and boosting methods were closer to the sensitivity vertex, giving them more favorable predictive strength for identifying SA and non-SA cases among the elderly than the simple ML algorithms.
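The AUC values compared above summarize the ROC curve in a single number: the probability that a randomly chosen positive case is ranked above a randomly chosen negative one. A minimal sketch of that computation (the labels and scores are toy data, not the study's):

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) view:
    the fraction of positive/negative pairs the scorer orders correctly,
    with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of four positive/negative pairs correctly ordered
print(auc_score([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))  # → 0.75
```

Computing this separately on the train, validation, and test splits yields the three AUC values reported per algorithm.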

Overall schema indicating the performance and external testing prediction models
An overview of all data mining algorithms' performance results including bagging, boosting, and simple algorithms based on sensitivity, specificity, accuracy, F-score, and AUC-test is shown in Fig. 2.
Figure 2 shows that RF, Bagging, AdaBoost, and XG-Boost, as ensemble algorithms, obtained better performance than SVM, MLP, J-48, and NB, as simple algorithms, in classifying SA and non-SA cases. RF and XG-Boost both performed well, but RF, as a bagging technique, performed better than the boosting algorithms. In contrast, the NB-trained algorithm showed the worst performance in this respect. Evaluation of the performance criteria on the test set showed that the RF and XG-Boost algorithms had the best generalizability (AUC-test) among the ML-trained algorithms; thus, given their performance on test data, these two algorithms are the most suitable for deployment in external settings. To evaluate the external validity of our best-trained ML algorithms, we used these two models to predict external samples of SA and non-SA cases belonging to one elderly center in Abadan city: 45 SA and 70 non-SA cases, covering all older adults interviewed in that center, were used for external evaluation. We report the external validity results using the confusion matrix and the ROC obtained on the test data. The results of the classification of these external test cases using the confusion matrix are shown in Table 3.
Based on the information given in Table 3, the RF and XG-Boost models achieved sensitivity = 0.84, specificity = 0.88, and accuracy = 0.86, and sensitivity = 0.82, specificity = 0.84, and accuracy = 0.83, respectively. Comparing these external performance criteria with those of the internal validation, we did not observe a large reduction in performance (average reduction < 10%) for these two algorithms. Comparing the classification capability in the test and train states confirms this (the ROC values for RF, XG-Boost, and the external test mode are all close to each other) (Fig. 3).

Feature importance based on RF
Based on the RF algorithm, the features influencing SA are ranked by their importance for prediction according to the Net Importance percentage (NI%) obtained with this algorithm. This result is shown in Fig. 4. Based on Fig. 4, the variables of age (NI = 92.9%), social function (NI = 88.87%), social and interpersonal relationships (NI = 93.83%), depression (NI = 84.97%), and hypertension (NI = 80.56%) scored NI > 80% and were considered the main factors influencing SA according to RF as an ensemble algorithm. The variable of income, with NI = 11.4%, obtained the lowest value. Based on these results, we conclude that social factors, with higher NI than the physical, demographic, and mental variables, can be considered important factors influencing SA. In other words, improving modifiable social factors has a potential role in increasing SA in the elderly.
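The text does not define NI% algebraically; one plausible reading, rescaling each raw RF feature importance (e.g. mean decrease in Gini impurity) against the largest one, can be sketched as follows. The feature names and raw importance values below are hypothetical:

```python
def importance_percent(raw_importances):
    """Scale raw feature-importance scores so the top feature is 100%.

    `raw_importances` maps feature name -> raw score, e.g. the mean
    decrease in Gini impurity reported by a random forest.
    """
    top = max(raw_importances.values())
    return {name: 100.0 * v / top for name, v in raw_importances.items()}

# Hypothetical raw importances, for illustration only
ni = importance_percent({"age": 0.20, "depression": 0.17, "income": 0.02})
print(ni)
```

Under this reading, the ranking of features is unchanged by the rescaling; only the scale (0-100%) differs from the raw importances.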

Discussion
The aim of this study was to predict SA using ML methods. For this purpose, data of persons aged 60 years and older were analyzed. First, the most relevant predictors of SA were selected using BLR at P < 0.05. Then, eight well-known and commonly used algorithms, namely AdaBoost, XG-Boost, Bagging, RF, J-48, MLP, SVM, and NB, were trained. Finally, several evaluation metrics derived from the confusion matrix were calculated to validate the models. Our study applied individual, bagging, and boosting ML techniques to predict SA. The RF achieved the best performance as an ensemble (bagging) algorithm, demonstrating the strong performance of decision trees (DTs) in predicting SA.
To date, little research has been performed on classifying SA using ML models. Kaur et al. assessed the performance of six ML algorithms in predicting national QoL and life satisfaction. In their study, the DT model showed the best performance, with a root mean square error (RMSE) of 0.3. In addition, they recognized that various factors such as income level, underlying conditions, social support and engagement, housing conditions, and access to services contribute highly to the prediction of SA [28]. Lee et al. compared the performance of three common supervised ML algorithms for the health-related quality of life (HRQoL) of elderly people with chronic diseases. Five statistically significant factors were identified for HRQoL: monthly income, chronic disease diagnosis, depression, discomfort, and perceived health status. Finally, the DT algorithm yielded the best performance, with an accuracy of 0.93 and an F-score of 0.49 [29]. Their findings showed that ensemble techniques can improve the performance of models [32]. Lee et al. compared the performance of single and ensemble ML models in predicting depression in elderly people; the results showed that ensemble models increased modeling performance [33]. Lin et al.
also evaluated the prediction performance of the bagging ensemble ML method against other basic ML methods, such as linear regression, SVM, multilayer feedforward neural networks, and RF, in predicting the functional outcomes of schizophrenia; the bagging ensemble algorithm outperformed the other techniques [34]. Ahmadi and Asghari Varzaneh, in separate studies [35,36], developed ML models for the prediction of SA. Comparing the experiments conducted in their studies shows that the present study evaluated a larger number of ML algorithms for predicting SA in older adults. The results of the current study showed that using a larger number of algorithms can lead to higher accuracy and better predictive power. However, it is important to note that the study populations, features, and predictors used in the three studies were different, which may have influenced the results. Nonetheless, our results suggest that a more comprehensive approach to SA prediction can provide valuable insights into the factors contributing to SA and improve outcomes for older adults.
Although the current study achieved optimal performance in predicting SA in older adults, it had several potential limitations and challenges. We applied only eight ML techniques to a small dataset of elderly individuals and did not use complex deep learning (DL) models because of their high data requirements. DL methods can learn complex representations of data but may overfit small datasets due to their large number of parameters and the sensitivity of their optimization algorithms to the available data; ML methods may therefore be more suitable for small datasets. Although DL methods can achieve high performance, they may not be appropriate for small-dataset classification. The accuracy and generalizability of our models would be enhanced by testing other ML techniques, as well as DL models, on a larger, multicenter, prospective dataset containing time-varying covariates to identify a more insightful set of longitudinal factors related to SA. In addition, external validation should be used to confirm the results of the present study. Another possible limitation of this research is that it does not explain how the predictor and outcome variables are related causally. This causal relationship was not the main purpose of this research, but future research is certainly suggested to determine a set of longitudinal features related to SA.
In this study, ML models were developed and evaluated for predicting SA in older adults. These models have the potential to provide valuable tools for improving elderly outcomes and increasing the probability of SA. However, their practical implementation must be carefully considered, and further research is needed to validate and refine the models in different populations and settings. The potential benefits of using these models in clinical practice and policymaking are significant. They can assist geriatricians, senior nurses, healthcare administrators, and policymakers in providing optimal supportive services and customized therapeutic care for elderly persons. Additionally, the models can be used in combination with other tools and interventions to improve outcomes for older adults. However, the limitations of the models must also be acknowledged, and ethical and privacy concerns related to their use must be addressed. In future research, the models developed in our study could be applied and customized to other social problems. This could lead to a better understanding of the factors contributing to SA and help improve health outcomes and QoL for older adults. Overall, our study provides a valuable contribution to the field of SA prediction using ML, and we hope that these models will be used to benefit older adults in the real world.

Conclusions
The main idea of this study was to evaluate several ML models for predicting SA. This study can assist geriatricians and senior nurses in providing optimal supportive services and customized therapeutic care for elderly persons by analyzing their physical, psychological, and particularly social features and extracting the best evidence from the data. Our models also have the potential to provide healthcare administrators and policymakers with a reliable and responsive tool to improve elderly outcomes. These predictive models may also provide an advantage in increasing the probability of SA. In future research, our models are expected to be applied and customized to other social problems.

Study design and setting
This research is a cross-sectional study performed in 2022. We included the data of 1465 elderly people who were referred to health centers in Abadan City, Iran. In our study, people aged 60 years and older were considered elderly. Developed countries consider the age of 65 as the onset of old age, but the United Nations and the World Health Organization (WHO) recognize 60 years and older as elderly [37,38].

Study roadmap
This study comprised three phases: (1) dataset preprocessing, (2) model development, and (3) evaluation of the algorithms' performance. The roadmap of this study is depicted in Fig. 5.

Data preparation
The SA variables are classified into socio-demographic, biomedical, and psychosocial classes. Data preparation was performed as follows:

Primary features selection
SA is a multidimensional concept, so finding predictive factors of SA is difficult. Therefore, a comprehensive literature review was performed to extract the potential features related to SA. The primary feature set was prepared in the form of a checklist, and then the most important features were selected through a Delphi study.

The panel of experts in the Delphi phase
A panel of 20 experts was assembled according to the following criteria: (1) knowledge related to older adults' health; (2) more than 5 years of experience and/or scientific publications; (3) consent to participate in this study and return the checklist. First, the purpose of this study was sent to the experts by email, and informed consent for participation was received from them. Then, the electronic checklist was emailed to them. The experts' panel included 13 gerontologists, two geriatric nurses, two health information management specialists, and three epidemiologists. About 52% of the Delphi participants were female; their mean work experience and mean age were 18 ± 3.2 and 45.6 ± 6.4 years, respectively.

Socio-demographic variables:
This class includes variables such as age, gender, educational level, marital status, occupation, income level, and insurance status.
Biomedical variables: This class covers physiological function, cognitive function, health, and the ability to perform activities of daily living (ADLs). These variables are comorbid diseases (hypertension, cerebrovascular accidents (CVA), osteopathy, eye disease, renal disease, liver disease, muscle disease, diabetes, cancer, convalescence, and other diseases), physical activity (sports activities, exercise time, type of exercise), sexual health (sexual health assessment), general health, pain assessment, fatigue, physical dysfunction, physical function, physical activity and exercise, nutritional status assessment, malnutrition status assessment, performing disease-prevention activities, mental disorders, and physiological disorders.
Psychosocial variables: This class covers active engagement with life and adaptation to life, including life satisfaction, tension management, self-efficacy, self-esteem, hope, futurity, social and interpersonal relationships, satisfaction with social support, and social function.

Some variables were defined as follows:
Ability to perform activities of daily living (ADLs): This variable is measured by the Barthel Index, which has 10 questions measuring physical functioning. The Barthel Index determines one's ability to perform basic ADLs, e.g., dressing, on a scale ranging from 0 to 100. Scores of 0-20 indicate complete dependence, 21-60 severe dependence, 61-90 moderate dependence, 91-99 partial dependence, and 100 complete independence [33]. In this study, an independent person is someone with a score of 100 on the Barthel Index.
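As a sketch, this banding can be encoded as a small lookup function, assuming the standard Barthel Index cut-offs (0-20 complete dependence up through 100 complete independence):

```python
def barthel_category(score):
    """Map a Barthel Index score (0-100) to a dependence category,
    using the standard Barthel banding."""
    if not 0 <= score <= 100:
        raise ValueError("Barthel score must be in [0, 100]")
    if score == 100:
        return "complete independence"
    if score >= 91:
        return "partial dependence"
    if score >= 61:
        return "moderate dependence"
    if score >= 21:
        return "severe dependence"
    return "complete dependence"
```

In this study the variable effectively becomes binary: independent (score = 100) versus not.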
Life satisfaction: This variable was measured by the life satisfaction scale developed by Diener et al. [39]. This scale consists of 5 items measuring the cognitive component of well-being. Each statement has seven options and is scored from 1 to 7 (strongly disagree to strongly agree). The validity of this instrument was confirmed by Bayani et al. [34]. In this study, a person who is satisfied with life is one who scores > 20 on this scale.
QoL: The 36-Item Short Form Survey (SF-36) was administered to measure this variable. This self-report questionnaire consists of 36 items across eight domains: physical function, social function, physical role functioning, emotional role functioning, mental health, vitality, bodily pain, and general health. In addition to these sections, the SF-36 also provides two general measures: physical health [total physical component score (PCS)] and mental and social health [total mental component score (MCS)]. Respondents' scores in each domain vary from 0 to 100, and a higher score means better QoL. The validity and reliability of this questionnaire have been confirmed in the Iranian population [35][36][37].
Physical activity, social, and interpersonal relationships: These factors are SF-36 sub-categories evaluated in the elderly. In addition, the overall score was calculated to measure the QoL of the elderly. In this study, a score of 70 was considered the cut-off point for this variable.
Healthy lifestyle: Lifestyle is determined by the total score obtained: a score of 42-98 indicates an unfavorable lifestyle, 99-155 a medium lifestyle, and 156-216 a desirable lifestyle. The instrument measures physical activity, exercise, recreation, healthy eating, stress management, and social and interpersonal relationships [38].
Nutrition status: The Mini Nutritional Assessment questionnaire was administered to measure the healthy nutritional status of the elderly. In this questionnaire, a score of 12 or greater indicates that the person is well nourished and needs no further intervention; a score of 8-11 indicates a risk of malnutrition; and a score of 7 or less indicates malnutrition [40]. The cut-off point for this variable in our study is 12.
Hope: This factor was measured with the Herth Hope Index. This tool covers three characteristics of hope: temporality and future, positive readiness and expectancy, and interconnectedness. It has 30 items, each scored between 0 and 3, where a score of 3 indicates that the item fully applies to the respondent and a score of 0 indicates that it never applies. Total scores range from 0 to 90; higher scores indicate greater hope [42].
Self-efficacy: Self-efficacy means the effectiveness and ability of a specific performance. The General Self-Efficacy (GSE) scale measured this factor. This tool has 10 items, and the total score ranges between 10 and 40, with a higher score indicating greater self-efficacy [43].
Self-esteem: This factor was measured with the Rosenberg Self-Esteem Scale. This tool has 10 items, each scored from 1 to 4, where a score of 1 indicates Strongly Disagree and a score of 4 indicates Strongly Agree [44,45].
Outcome variable (SA): The outcome variable was categorized into SA (coded 1) or non-SA (coded 0) classes. SA can be operationally defined as the ability of individuals to maintain physical, cognitive, and social functioning as they age while avoiding disease and disability. This can be measured using a variety of indicators, such as physical performance tests, cognitive assessments, and self-reported measures of well-being. To be considered aging successfully, individuals should score well on these indicators and demonstrate a high level of functioning across multiple domains. Importantly, SA is a multidimensional concept that encompasses physical, cognitive, and social domains and is not simply a matter of avoiding disease or disability. One common model of SA is the "three-component model" proposed by Rowe and Kahn. In our study, SA was determined based on Rowe and Kahn's model, which has three principal components: "absence of disease and disease-related disability," "maintenance of high mental and physical function," and "continued engagement with life" [40]. According to this model, the following inclusion criteria for SA were used in our study: (1) absence of disease-related disability (the criteria in this domain are satisfied when adults have no disability, the number of chronic diseases is ≤ 2, and the score on the WHODAS-II is below the median), (2) maintenance of high mental and physical function (in this domain, participants had a normal score on the Mini-Mental State Examination for Dementia Screening (MMSE-DS), a Barthel Index of 100, and no depression in the previous 12 months), and (3) continued engagement with life (in our study, life engagement was measured using the Utrecht General Engagement Scale (UGES), and participants had engaged in three or more different social or religious activities at least once a month) [24,[41][42][43][44]. All predictor variables are shown in Table 4.
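The three Rowe-and-Kahn criteria can be expressed as a simple rule-based labeler; this is a simplified sketch, and the field names and the use of booleans for the screening results are assumptions, not the study's actual data schema:

```python
def label_successful_aging(case):
    """Assign SA (1) vs non-SA (0) from the three Rowe-Kahn criteria.

    `case` is a dict with the fields used below (hypothetical schema):
    screening results as booleans, counts and scores as numbers.
    """
    # (1) absence of disease-related disability
    no_disability = (
        not case["has_disability"]
        and case["n_chronic_diseases"] <= 2
        and case["whodas_below_median"]
    )
    # (2) maintenance of high mental and physical function
    high_function = (
        case["mmse_normal"]
        and case["barthel_score"] == 100
        and not case["depressed_past_12m"]
    )
    # (3) continued engagement with life
    engaged = case["monthly_social_activities"] >= 3
    return 1 if (no_disability and high_function and engaged) else 0
```

All three criteria must hold for the SA label; failing any one yields non-SA.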
reviewed with each participant in a private room at the health center. Each interview took approximately 25 min; see the frequency of elderly participants in Table 4.

Feature selection
In the third step, the feature selection method was used to reduce the dataset dimension and improve data mining performance. Feature selection in a high-dimensional dataset is one of the most important data mining steps, using statistical methods to eliminate redundant and irrelevant features. Concisely, the advantages of this process include improving mining performance, preventing overfitting of the algorithms, increasing computational capability, speeding up the data mining process, and increasing understandability [47][48][49][50][51]. In this study, to obtain the most critical factors affecting SA in the elderly, we used binary logistic regression (BLR) as a multivariable method, with P < 0.05 considered the statistically significant level.

Model implementation
We trained eight ML algorithms from three classes of learning methods: bagging [random forest (RF) and bootstrap aggregating (Bagging)], boosting [adaptive boosting (AdaBoost) and eXtreme Gradient Boosting (XG-Boost)], and simple techniques [J-48, multilayered perceptron (MLP), Naïve Bayes (NB), and support vector machine (SVM)] in the Waikato Environment for Knowledge Analysis (WEKA) and the Python programming language. These algorithms were selected because they have been widely used in recent research and have high-performance capability. Using several algorithms lets us explore the strengths and limitations of each approach and gain insight into the factors most important for predicting SA. Since SA is a complex and multifaceted concept, it requires a multidimensional approach that can capture the diverse range of contributing factors. By comparing a variety of algorithms, researchers can better understand which features and models are most effective for predicting SA and can build a more accurate and robust model; using multiple algorithms also helps reduce the risk of overfitting and increases the generalizability of the model to new data. Overall, using multiple ML algorithms provides a more comprehensive and insightful analysis of the factors contributing to SA. The selected algorithms are described as follows:

RF: As an ensemble technique, RF is a bootstrap bagging method that aggregates several decision trees to enhance performance. At each split, the feature with the lowest Gini index (Eq. 2) is selected as the best feature for data splitting:

$\mathrm{Gini} = 1 - \sum_{i} p_i^{2}, \quad (2)$

where p_i is the proportion of samples of class i in the node. The algorithm aggregates the predictions of its sub-trees by majority voting; its capability reflects the vote of the most similar trees in the forest. RF is suitable for high-dimensional datasets with numerous samples. As an averaging method, RF reduces variance by building deep decision trees on different parts of the training data; this usually slightly increases bias and costs some interpretability, but generally improves the model's performance significantly. In this decision tree type, splitting occurs using the input variables in each bootstrap sub-dataset. Notable strengths of this algorithm include good prediction in the presence of missing data, robustness to imbalanced data for error reduction, and estimation of variable importance in classification [52-54].
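The Gini index used for RF splitting (Eq. 2) is simple to compute from the class labels in a node; a small illustrative function (our own, not from the study):

```python
import numpy as np

def gini_index(labels):
    """Gini impurity (Eq. 2): 1 - sum_i p_i^2 over the class proportions in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_index([1, 1, 0, 0]))  # maximally impure two-class node
print(gini_index([1, 1, 1, 1]))  # pure node, no further split needed
```

A candidate split is preferred when the weighted Gini of its child nodes is lower than that of the parent, which is exactly the "lowest Gini index" criterion described above.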
Bagging: Bagging is another ensemble ML algorithm that uses the bootstrap aggregating method during training. It is designed to improve the classification and prediction capability of the algorithms it wraps, which may be decision trees or other algorithms such as artificial neural networks (ANNs) or logistic regression. In the bootstrapping method (Fig. 6), the various sub-algorithms are trained on samples obtained by sampling with replacement, and the final classification follows the majority vote of the developed sub-algorithms. One notable property of this algorithm is variance reduction, and thus a minimal probability of overfitting during training [55-57].
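A minimal sketch of the bagging workflow with scikit-learn (an assumption on our part; the study used WEKA and Python, but the exact libraries are not stated). `BaggingClassifier` trains each base learner, a decision tree by default, on a bootstrap resample and aggregates by voting; the synthetic dataset stands in for the study data, which are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the elderly-health dataset
X, y = make_classification(n_samples=400, n_features=12, random_state=0)

# Each of the 50 trees sees a bootstrap resample (sampling with replacement)
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=0)
scores = cross_val_score(bag, X, y, cv=5)
print(round(scores.mean(), 3))
```

Swapping the default tree for another estimator (e.g., logistic regression) reproduces the "other algorithms" variant mentioned above.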
AdaBoost: The AdaBoost algorithm combines weak learners to predict the output class. The idea of boosting is to enhance the performance of weaker data-mining algorithms by combining them into one boosted algorithm. AdaBoost weights the votes of the various classifiers according to their classification performance on the dataset, which provides high computational capacity in classifying the classes. Advantages of this algorithm include categorizing samples without any predefined knowledge existing in the data, handling hard-to-classify samples, and minimizing bias and variance through its repeatable and coherent nature [58-60].
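The re-weighting idea behind AdaBoost can be shown in a short sketch (ours, not the study's implementation) using one-feature threshold "stumps" as the weak learners: each round fits the best stump under the current sample weights, then up-weights the samples it misclassified.

```python
import numpy as np

def adaboost_train(X, y, rounds=5):
    """Minimal AdaBoost with threshold stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # uniform sample weights to start
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(X.shape[1]):                # search all (feature, threshold, sign)
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] <= t, 1, -1)
                    err = w[pred != y].sum()       # weighted error of this stump
                    if best is None or err < best[0]:
                        best = (err, j, t, s, pred)
        err, j, t, s, pred = best
        err = max(err, 1e-10)                      # guard the log for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight of this weak learner
        w = w * np.exp(-alpha * y * pred)          # up-weight misclassified samples
        w = w / w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = np.where(X[:, 0] > 0.5, 1, -1)                 # a rule one stump can learn exactly
model = adaboost_train(X, y)
print((adaboost_predict(model, X) == y).mean())
```

Production use would rely on a library implementation (e.g., scikit-learn's `AdaBoostClassifier`); this sketch only exposes the weight-update mechanics described above.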
Fig. 6 The bootstrapping method in bagging techniques

XG-Boost: XG-Boost acts as both a classifier and a regressor in data mining. This boosting algorithm builds the prediction model from several boosted decision trees, trained in a parallel way with the gradient descent method. Improvements in the objective function (Eq. 3) are tracked while training the algorithm and building the boosted trees:

$\mathrm{Obj} = \sum_{j} L\big(y_j, \hat{y}_j\big) + \sum_{k} \Omega(f_k). \quad (3)$

In Eq. 3, L is the loss function used to assess XG-Boost's performance during training, Ω is the regularization term used to evaluate the algorithm's functionality and overfitting situation, and f(k) is the prediction of the kth tree [61-63]. The gradient and Hessian of the loss (Eq. 4), taken with respect to the previous round's prediction, are used to build the algorithm:

$g_j = \partial_{\hat{y}^{(m-1)}} L\big(y_j, \hat{y}^{(m-1)}_j\big), \qquad h_j = \partial^2_{\hat{y}^{(m-1)}} L\big(y_j, \hat{y}^{(m-1)}_j\big). \quad (4)$

Each leaf is scored by its similarity (Eq. 5), in which M is the number of residuals in the leaf and λ is a regularization parameter used during training:

$\mathrm{Similarity} = \frac{\left(\sum \mathrm{residuals}\right)^{2}}{M + \lambda}. \quad (5)$

The gain associated with splitting the root node is calculated as Eq. 6:

$\mathrm{Gain} = \mathrm{Similarity}_{\mathrm{left}} + \mathrm{Similarity}_{\mathrm{right}} - \mathrm{Similarity}_{\mathrm{root}}. \quad (6)$

So, the output value of each leaf is calculated as follows:

$\mathrm{Output} = \frac{\sum \mathrm{residuals}}{M + \lambda}. \quad (7)$
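The similarity score and split gain that XG-Boost uses (Eqs. 5 and 6) can be computed directly from the residuals; a tiny illustration with hypothetical residuals:

```python
def similarity(residuals, lam=1.0):
    """Eq. 5: (sum of residuals)^2 / (M + lambda), with M the residual count."""
    return sum(residuals) ** 2 / (len(residuals) + lam)

def gain(left, right, lam=1.0):
    """Eq. 6: similarity gained by splitting a node into left and right leaves."""
    return similarity(left, lam) + similarity(right, lam) - similarity(left + right, lam)

# Residuals that cancel at the root but agree within each leaf: the split pays off
print(gain([-0.5, -0.5], [0.5, 0.5]))
```

Here the root similarity is 0 (the residuals cancel), while each leaf scores (−1)²/(2+1) = 1/3, so the split's gain is 2/3; larger λ shrinks both leaf scores and discourages splitting.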

SVM: The SVM algorithm is applied in ML as both a classifier and a regressor. When classifying data instances, it uses the hyperplane concept to discriminate data belonging to different class labels. The algorithm uses mathematical (kernel) tricks that map the dataset into a higher dimension in which the classes become separable. Depending on the complexity of the data, SVM uses various kernel functions, namely linear, polynomial, radial basis function (RBF), and so on. The RBF kernel (Eq. 8) is commonly paired with the least squares (LS)-SVM formulation, which achieves speed and efficiency in computation by solving a system of linear equations [64-66]:

$K(x, x') = \exp\!\left(-\frac{\lVert x - x'\rVert^{2}}{2\sigma^{2}}\right). \quad (8)$

In Eq. 8, ‖x − x′‖² is the squared Euclidean distance between two input feature vectors, and σ is a free parameter tuned when training the algorithm.
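The RBF kernel of Eq. 8 is a one-line computation; a small self-contained version (illustrative only):

```python
import numpy as np

def rbf_kernel(x, x_prime, sigma=1.0):
    """Eq. 8: K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(x_prime, float)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Identical points have maximal similarity; distant points decay toward 0
print(rbf_kernel([0, 0], [0, 0]))
print(rbf_kernel([0, 0], [3, 4], sigma=5.0))
```

The kernel value is 1 for identical inputs and decays with squared distance, with σ controlling how quickly similarity falls off.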
MLP: This feedforward, backpropagation-trained type of artificial neural network (ANN) has many applications in different fields. It consists of input, hidden, and output layers. The input layer receives information from the external environment and converts the signals, data, or other input types into the specified calculation format; the number of nodes in this layer equals the number of study inputs. The hidden layer is where most of the calculation process occurs. The output layer produces the calculation results and provides the ANN's prediction results. The algorithm also uses an activation function for data transformation [67-70].

J-48: The J-48 decision tree algorithm, a newer version of ID3, provides more capability with high flexibility. This decision tree type uses the concept of entropy (Eq. 9) to split the tree; in other words, the attribute with the most discriminative entropy, i.e., the best capability to separate the classes, is chosen as the node for splitting the tree. Viewing x as an attribute with class proportions p_j at positions j, the entropy is evaluated as:

$\mathrm{Entropy}(x) = -\sum_{j} p_j \log_{2} p_j. \quad (9)$

The amount of entropy reflects the randomness of the attribute: as entropy increases, the degree of randomness is augmented, while lower entropy corresponds to less randomness, which is suitable for splitting [71-74].
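The entropy criterion of Eq. 9, as used for J-48 splits, can be computed from a node's class labels; a short illustrative function:

```python
import math
from collections import Counter

def entropy(labels):
    """Eq. 9: -sum_j p_j * log2(p_j) over the class proportions in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["yes", "no"]))    # evenly mixed node: 1 bit of uncertainty
print(entropy(["yes", "yes"]))   # pure node: no uncertainty
```

J-48 (WEKA's C4.5 implementation) prefers the split whose children reduce this entropy the most relative to the parent node.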
NB: NB (Eq. 10) is a probabilistic, commonly used supervised ML algorithm known for its high performance. The logic of this algorithm is that each input variable independently predicts the occurrence of the output class; in other words, the input variables are treated as independent, in contrast to LR, where combinational relationships are considered when forecasting the output class. NB can be considered a simple ML algorithm with high accuracy because of this independence assumption. Its advantages include simplicity in classifying samples, best classification when the variables truly are independent, and high performance on classified inputs [75-78]:

$P(C_k \mid x) = \frac{P(C_k)\, P(x \mid C_k)}{P(x)}. \quad (10)$

In Eq. 10, P(C_k|x) is the probability of C_k occurring given the features x with specific values, P(C_k) is the prior probability of class C_k, and P(x|C_k) is the probability of x when the class is determined as C_k.
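Under the naive independence assumption, P(x|C_k) in Eq. 10 factorizes into a product of per-feature likelihoods; a minimal sketch with hypothetical priors and likelihoods:

```python
import math

def nb_posterior(priors, likelihoods):
    """Eq. 10 with naive independence: P(C_k | x) ∝ P(C_k) * Π_i P(x_i | C_k),
    normalized so the posteriors over all classes sum to 1."""
    scores = [p * math.prod(ls) for p, ls in zip(priors, likelihoods)]
    z = sum(scores)  # P(x), the evidence
    return [s / z for s in scores]

# Hypothetical two-class example: class 0 explains the observed features far better
post = nb_posterior(priors=[0.5, 0.5], likelihoods=[[0.8, 0.9], [0.2, 0.3]])
print(post)
```

The class with the largest posterior is the NB prediction; note that the evidence P(x) only rescales, so the argmax can be taken on the unnormalized scores.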

K-fold cross-validation
In ML, we typically split the available data into two sets: a training set and a test set. K-fold cross-validation is a technique used to evaluate the performance of an ML model: the data are randomly shuffled and split into k equally sized subsets, or "folds." The model is then trained and evaluated k times, each time using a different fold as the test set and the remaining (k − 1) folds as the training set. After each training iteration, the model's performance on the test set is recorded using a performance metric (such as accuracy or mean squared error), and the average across all k folds gives an estimate of the model's overall performance. Because the model is tested on a different subset of the data each time, this estimate is more reliable than a single train/test split. The value of k in this research was set to 5.
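The shuffle-and-split mechanics described above can be sketched in a few lines (library routines such as scikit-learn's `KFold` do the same job; this version is only to expose the fold structure):

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices, split into k folds, yield (train, test) index pairs."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# Each sample appears in exactly one test fold across the k iterations
tests = [test for _, test in kfold_indices(100, k=5)]
print([len(t) for t in tests])
```

With n = 100 and k = 5, each iteration trains on 80 samples and tests on the held-out 20, and the union of the five test folds covers every sample exactly once.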

Evaluation of the performance of ML algorithms
In this step, we evaluated and compared the selected ML algorithms using the confusion matrix (Table 5) and calculated different performance criteria, including sensitivity, specificity, accuracy, and F-score, to determine the best algorithm for predicting SA. In Table 5, TP and TN are the successful and unsuccessful cases correctly classified by the algorithm, while FN and FP are the successful and unsuccessful cases incorrectly classified by the model. Based on the confusion matrix, we calculated the sensitivity (Eq. 11), specificity (Eq. 12), accuracy (Eq. 13), and F-score (Eq. 14) of all ML algorithms:

$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad (11)$

$\mathrm{Specificity} = \frac{TN}{TN + FP}, \quad (12)$

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad (13)$

$\mathrm{F\text{-}score} = \frac{2\,TP}{2\,TP + FP + FN}. \quad (14)$

Also, the AUC-ROC diagram of each algorithm was drawn and compared. K-fold cross-validation (k = 10) was considered for measuring errors during the training process. Finally, the best-performing data mining algorithm for determining SA was obtained.
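The four metrics follow directly from the confusion-matrix counts; a small helper with hypothetical counts for a 200-case test set:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Eqs. 11-14 computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                   # Eq. 11: true-positive rate
    specificity = tn / (tn + fp)                   # Eq. 12: true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)     # Eq. 13: overall correctness
    f_score = 2 * tp / (2 * tp + fp + fn)          # Eq. 14: harmonic mean form
    return sensitivity, specificity, accuracy, f_score

# Hypothetical counts: 90 SA cases caught, 10 missed; 85 non-SA correct, 15 false alarms
print(confusion_metrics(tp=90, fp=15, tn=85, fn=10))
```

Sensitivity and specificity separate errors on the SA and non-SA classes, which accuracy alone can hide when the classes are imbalanced; that is why all four are reported per algorithm.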

Fig. 1 The ROC curves of all ML algorithms

Fig. 3 The ROC of internal and external validation

Table 1 Results of correlation of factors affecting SA

Another study, by Abdullah et al., presented a model for identifying QoL predictors based on the RF model; variables such as lifestyle, exercise, social interaction, healthcare accessibility, chronic morbidity, and income were proposed as the most effective predictors of QoL [21]. Sim et al. designed an intelligent clinical decision support system (CDSS) based on ML algorithms to predict HRQoL; the RF algorithm yielded the best performance, with an AUC-ROC of 0.898 [30]. Cai et al. evaluated the performance of selected ML algorithms on a dataset of 3657 community-dwelling adults aged ≥ 60 years to predict SA; the DT model, with an AUC of 0.90, was introduced as the most appropriate algorithm, and age, arm curl, 30-s sit-to-stand, and reaction time were identified as important predictors in all models [11]. Paul et al. trained ensemble ML techniques to recognize ADLs in elderly people with HIV; after execution, the XG-Boost method obtained an average AUC of 83% [31]. Zhou et al. trained ML techniques such as DT, XG-Boost, AdaBoost, bagging, and RF to classify the healthy behaviors of the elderly.