In this study, we introduced a novel HLI to account for the magnitude of the relationships between individual lifestyle components and specific disease outcomes, using data-driven weights. The standard and the outcome-specific versions of the HLI were extensively compared by estimating the HR, the C-index and the PAF in a range of scenarios, involving the risk of cancer, T2D, CVD and premature mortality. Two strategies to operationalize the HLI were also investigated, based in turn on binary indicators and on categorical scores for the five components.
In our study, the discriminatory power was consistently larger for models based on the outcome-specific HLI than the standard HLI, sometimes to a large extent as in the case of T2D. The reason for this limitation of the standard HLI was clearly illustrated in Fig. 2, when considering binary indicators. As the standard HLI assumes that all lifestyle components are equally associated with the risk of disease, different lifestyle patterns with the same number of unhealthy components necessarily lead to the same predicted disease hazard rate in analyses based on the standard HLI. Conversely, our analyses utilizing the outcome-specific HLI reflected the disease hazard rate heterogeneity across these lifestyle patterns with the same number of unhealthy components. This limitation of the standard HLI in terms of discriminatory power highlights that it might be a suboptimal analytical choice for risk stratification and/or risk prediction36, especially in situations where a given lifestyle component is strongly linked to the outcome under consideration, such as BMI in the case of T2D.
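The point about lifestyle patterns sharing the same number of unhealthy components can be illustrated with a minimal sketch. The weights below are hypothetical values chosen only for illustration (they are not estimates from this study), and rescaling the weighted index to the 0 to 5 range is an assumption made to keep the two indices comparable, not necessarily the construction used for the outcome-specific HLI.

```python
from itertools import combinations

COMPONENTS = ("smoking", "alcohol", "diet", "physical_activity", "adiposity")

# HYPOTHETICAL weights (e.g., log hazard ratios for a T2D-like outcome where
# adiposity dominates). Illustrative only; NOT estimates from the study.
HYPOTHETICAL_WEIGHTS = {
    "smoking": 0.30,
    "alcohol": 0.10,
    "diet": 0.10,
    "physical_activity": 0.15,
    "adiposity": 0.90,
}


def standard_hli(pattern):
    """Equal-weight index: the count of healthy components (1 = healthy)."""
    return sum(pattern[c] for c in COMPONENTS)


def weighted_hli(pattern, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted index, rescaled to 0-5 (the rescaling is an assumption)."""
    total = sum(weights[c] for c in COMPONENTS)
    return 5.0 * sum(weights[c] * pattern[c] for c in COMPONENTS) / total


def patterns_with_k_unhealthy(k):
    """Yield every binary lifestyle pattern with exactly k unhealthy components."""
    for unhealthy in combinations(COMPONENTS, k):
        yield {c: 0 if c in unhealthy else 1 for c in COMPONENTS}


# All ten patterns with exactly two unhealthy components.
rows = [
    (standard_hli(p), round(weighted_hli(p), 2))
    for p in patterns_with_k_unhealthy(2)
]
standard_values = {s for s, _ in rows}
weighted_values = {w for _, w in rows}

print("standard HLI values:", standard_values)
print("weighted HLI values:", sorted(weighted_values))
```

Under these assumed weights, every pattern receives the same standard HLI, whereas the weighted index separates patterns in which adiposity is among the unhealthy components from those in which it is not.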
Most previous studies on the HLI used the standard HLI to address etiological questions, specifically to estimate disease-specific HRs to quantify the impact of adhering to healthy lifestyle habits, and disease-specific PAFs to measure the public health burden attributable to unhealthy lifestyles8,18,19,22,24. In our study, we observed consistently weaker HR estimates for the standard HLI than for the outcome-specific HLI, sometimes to a large extent, as for T2D. These results suggest that analyses utilizing outcome-specific HLIs are more likely to detect associations, particularly for diseases weakly associated with lifestyle habits. Conversely, PAF estimates were consistently larger when using the standard HLI. Estimating weaker HRs and larger PAFs with the standard HLI than with the outcome-specific HLI may seem paradoxical. However, our results from the theoretical study of linear causal models and the inspection of the empirical distributions of the standard and the outcome-specific HLIs displayed in Figure S1 might help clarify this apparent paradox. According to the binary version of the standard HLI, 59% of the EPIC study population had a standard HLI lower than or equal to 3 units, i.e., more than 2 standard deviations below the maximum HLI of 5 units. As a result, the health benefits for this large proportion of participants, had they adhered to the healthiest possible lifestyle, led to large PAF estimates. By contrast, according to the death-specific HLI, taken here as an example, 65% of the study population had an HLI value within one standard deviation of the maximum HLI. As a result, the reduction in premature mortality, had they adhered to the healthiest possible lifestyle, was less pronounced, thus explaining the lower PAF estimates. In essence, analyses of the outcome-specific HLI closely mimic an analytical strategy where individual lifestyle components are evaluated jointly within the same model, and therefore yield similar PAF estimates.
Thus, our results highlight that analyses based on the standard HLI could lead to biased assessments of the public health burden attributable to unhealthy lifestyles. As mentioned in our theoretical study of linear causal models, it could be argued that utilizing standard HLIs might produce approximately valid estimates of PAFs of a latent variable, e.g., reflecting health-consciousness. Yet, the validity of this approach, particularly whether the standard HLI is a better proxy than weighted HLIs for this latent variable, would need further assessment.
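The apparent paradox of weaker HRs combined with larger PAFs can be reproduced numerically in a small sketch. It assumes the unadjusted multi-category form of Levin's formula, PAF = 1 - 1/Σ p_d RR_d, and a log-linear increase in risk per unit of shortfall from the maximum index value. This is not necessarily the PAF estimator used in the study, and all prevalences and HRs below are hypothetical.

```python
def paf_levin(prevalence, relative_risk):
    """Unadjusted multi-category PAF: 1 - 1 / sum_d(p_d * RR_d).

    prevalence[d] is the population share at shortfall d from the maximum
    index value; relative_risk[d] is the risk relative to d = 0.
    """
    if abs(sum(prevalence) - 1.0) > 1e-9:
        raise ValueError("prevalences must sum to 1")
    mean_rr = sum(p * rr for p, rr in zip(prevalence, relative_risk))
    return 1.0 - 1.0 / mean_rr


def log_linear_rr(hr_per_unit, n_levels):
    """Relative risk at each shortfall d under a log-linear trend: HR ** d."""
    return [hr_per_unit ** d for d in range(n_levels)]


# HYPOTHETICAL scenario A (standard-like index): weaker per-unit HR, but most
# of the population sits two or more units below the maximum.
hr_a = 1.25
prev_a = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

# HYPOTHETICAL scenario B (outcome-specific-like index): stronger per-unit HR,
# but most of the population sits within one unit of the maximum.
hr_b = 1.40
prev_b = [0.35, 0.40, 0.15, 0.07, 0.02, 0.01]

paf_a = paf_levin(prev_a, log_linear_rr(hr_a, len(prev_a)))
paf_b = paf_levin(prev_b, log_linear_rr(hr_b, len(prev_b)))

print(f"scenario A: HR per unit = {hr_a}, PAF = {paf_a:.3f}")
print(f"scenario B: HR per unit = {hr_b}, PAF = {paf_b:.3f}")
```

With these assumed inputs, scenario A has the weaker per-unit HR yet the larger PAF, because a larger share of its population lies far from the reference level.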
The etiology of chronic diseases is complex, and some level of simplification via summary quantities is welcome in epidemiological research. To paraphrase Box’s aphorism, “all summarizations are wrong but some are useful”37. To be useful, a summarization should produce approximately valid results. The validity of results in analyses based on the standard HLI could be assessed by comparing them to results of an outcome-specific HLI or the individual lifestyle components. If results are similar, the standard HLI could be appropriate as it does not rely on data-driven weights and it could facilitate the comparison of findings across studies and across health-related outcomes. However, the premise that standard HLIs would facilitate comparison across studies might be tempered in view of the myriad of versions of standard HLIs proposed in the literature5,6,9,10,17,18,38.
Multiple lifestyle factors influence an individual’s health, but some are more critical than others, which should be reflected in public health recommendations. Towards this aim, the “healthiest” lifestyle profiles could be defined as the combinations of individual lifestyle behaviors associated with lowest risk of disease, longest life expectancy, or longest life expectancy free of a chronic disease9. The development and validation of an HLI using weights derived from meta-analyzed associations with disease risk, mortality or a composite outcome reflecting mortality and common chronic diseases would help the characterization and promotion of these healthiest profiles.
In line with previous versions of the standard HLI8, the HLI considered in this study was based on five individual components: smoking habits, alcohol intake, diet, physical activity, and adiposity. Refined statistical methods, e.g., using splines, could be used to combine the individual components into an outcome-specific HLI. Also, working with a refined categorization of these five components, including more descriptors, such as more refined information on smoking intensity or adiposity or a broader spectrum of dietary exposures, and/or including information on other lifestyle factors, such as quality of sleep39,40 or stress, might lead to more accurate assessments of the relationship between lifestyle and health-related outcomes. The evaluation conducted in this study relied on the EPIC cohort, where the study populations in the various countries were generally more health-conscious than their source populations. We could not account for other major chronic diseases that could affect observed associations with our outcomes of interest because of a lack of such information in EPIC. For example, chronic obstructive pulmonary disease (COPD) frequently co-occurs with CVD and shares tobacco smoking as a main risk factor41. We acknowledge these potential limitations, yet they are unlikely to affect the main conclusions of the study, which were corroborated by our theoretical results under simple linear causal models.