Development and validation of a Brief Diet Quality Assessment Tool in the French-speaking adults from Quebec

Background The objective of this study was to develop and validate a short, self-administered questionnaire to assess diet quality in clinical settings, using the Alternative Healthy Eating Index (AHEI) as reference. Methods A total of 1040 men and women (aged 44.6 ± 14.4 y) completed a validated web-based food frequency questionnaire (webFFQ) and had their height and weight measured (development sample). Participants were categorized arbitrarily according to diet quality (high: AHEI score ≥ 65/110, low: AHEI score < 65/110) based on dietary intake data from the webFFQ. The Brief Diet Quality Assessment Tool was developed using a classification and regression tree (CART) approach and individual answers to the webFFQ among participants considered to have a plausible energy intake (ratio of reported energy intake to basal metabolic rate ≥ 1.2 and < 2.4; n = 1040). A second sample of 3344 older adults (aged 66.5 ± 6.4 y) was used to test the external validity of the Brief Diet Quality Assessment Tool (external validation sample). Results The decision tree included sequences of 3 to 6 binary questions, yielding 21 different pathways classifying diet quality as being high or low. In the development sample, the area under the receiver operating characteristic (ROC) curve of the predictive model was 0.92, with sensitivity, specificity and agreement values of 89.5, 83.9 and 87.2%. Compared with individuals having a low-quality diet according to the Brief Diet Quality Assessment Tool (mean AHEI 56.7 ± 11.4), individuals classified as having a high-quality diet (mean AHEI 71.3 ± 11.0) were significantly older, and had lower BMI, percent body fat and waist circumference, and had lower blood pressure, triglycerides, cholesterol/HDL ratio and fasting insulin as well as higher HDL-cholesterol concentrations (all P < 0.05). 
Similar results were observed in the external validation sample, although overall performance of the Brief Diet Quality Assessment Tool was slightly lower than in the development sample, with an area under the ROC curve of 0.79 and sensitivity, specificity and agreement values of 73.0, 69.0 and 71.3%, respectively. Conclusion The CART approach yielded a simple and rapid Brief Diet Quality Assessment Tool that identifies individuals at risk of having a low-quality diet. Further studies are needed to test the performance of this tool in primary care settings. Electronic supplementary material The online version of this article (10.1186/s12966-019-0821-6) contains supplementary material, which is available to authorized users.


Background
One of the cornerstones of chronic disease prevention is to persuade the population to adhere to dietary guidelines [1]. For years, clinical guidelines have been largely focused on the concept of primary prevention, which aims to alleviate the impact of risk factors on chronic diseases. More recently, the notion of primordial prevention has emerged as a potentially more efficient public health strategy. Primordial prevention, for which optimizing diet is key, aims to avoid the development of risk factors in the first place [2]. However, physicians rarely inform their patients about the importance of healthy eating. In a Canadian study [3], family practitioners reported discussing diet with only 32% of their patients with type 2 diabetes and with less than 10% of their non-diabetic patients. One of the major challenges to implementing dietary counseling in a primary care setting is the lack of valid tools that assess diet quality rapidly and accurately. In that regard, assessing global diet quality rather than relying on a few single nutrients of concern such as sodium and sugar is essential. A comprehensive approach to assessing diet quality, which takes into consideration food choices as well as interactions among foods and nutrients, is more promising. Several complex dietary scores based on mathematical algorithms have been developed to describe the quality of the diet. The Alternative Healthy Eating Index (AHEI), which has been revised over the years to reflect current scientific literature, is well established [4]. It is based on extensive research on the association between foods and chronic disease risk [4,5]. However, as with many other diet quality scoring systems [6][7][8], computing the AHEI score requires in-depth data collection and analyses of food and nutrient intakes, which is very difficult in clinical settings.
Food frequency questionnaires, which survey a list of foods and beverages consumed over a specific period, hence providing information on habitual food intake, are an important tool in nutrition research [9]. Although most of these questionnaires range from 80 to 120 questions and take up to 60 min to complete [10], shorter versions have been developed to assess diet quality [11][12][13][14][15][16]. Other short diet assessment tools have been developed to identify foods that contribute the most to the intake of specific nutrients such as saturated fat or sodium [13,17]. Previously published data indicated that a diet quality score derived from such short questionnaires is weakly but significantly correlated with a diet quality score assessed using data from full dietary assessment questionnaires [18]. However, to the best of our knowledge, no Brief Diet Quality Assessment Tool has yet been developed specifically to predict a global diet quality score such as the AHEI.
The objective of this study was to develop and validate a short, simple and cost-effective Diet Quality Assessment Tool in French-speaking adults from the Province of Quebec, in Canada. The classification and regression tree (CART) approach was used for that purpose. We hypothesized that the CART approach yields a predictive model of diet quality that is simple and easy to use, and hence potentially transferable and useful in clinical settings.

Participants
This study is based on data from two main samples of participants, from which subsamples have been created for specific analyses, as detailed below. As shown in Fig. 1, the first sample (development sample) included 1643 healthy participants involved in 11 studies previously conducted at the Institute of Nutrition and Functional Food (INAF) over the years. All data were taken at the baseline of each study, prior to initiating any treatment or intervention, hence reflecting usual habits. The external validation sample comprised 3344 participants taking part in a longitudinal occupational study on cardiovascular health [19]. This external validation sample comprised older individuals, which is relevant to the brief assessment of diet quality since behavioural factors including diet are important predictors of morbidity and mortality in aging populations [20]. Moreover, individuals aged 65 and older are the most frequent users of primary care and are therefore a key target population for rapid dietary assessment in such settings [21]. All participants lived in the Province of Quebec at the time of the study and spoke French as their primary language. All participants provided written consent to have their data included in a database for use in research other than the main project in which they participated. The protocol of each of these studies was in accordance with the Declaration of Helsinki. Data used in this project are part of a data management framework approved by the Laval University Ethics Committee (2008-279 CG A-1 R-2).

Assessment of cardiometabolic risk factors
Each participant visited the INAF or one of the affiliated research centers for at least one in-person data collection session. Height and weight were measured by trained staff. A sub-sample of 940 individuals from the development sample, referred to as the predictive validation sample, provided a 12-h fasting blood sample and had their blood pressure, body composition and waist circumference measured. Blood samples were immediately centrifuged at 17°C for 10 min at 1100×g to obtain serum samples, which were stored at −80°C until processed. Serum total cholesterol, triglycerides, and HDL-cholesterol concentrations were assessed with the use of a Roche Modular P system (Roche Diagnostics, Mannheim, Germany). LDL-cholesterol was calculated using the Friedewald equation [22]. Fasting blood glucose concentrations were measured by colorimetry (Hexokinase Method, Roche Modular P System), whereas insulin concentrations were measured with the use of electrochemiluminescence (Cobas 6000, Roche Diagnostics). Systolic and diastolic blood pressures were determined from the means of 3 consecutive measurements that were taken 3 min apart in a sitting position after a 10-min rest with the use of an automated blood pressure monitor (Digital BPM HEM-907XL model; Omron). Percent body fat was determined by the body composition analyzer BC-418 (Tanita, Arlington Heights, IL). Waist circumference measurements were taken at the end of a normal expiration with a tape placed horizontally directly on the skin at mid-distance between the last rib and the top of the iliac crest. Waist circumference was determined as the mean of three measurements to the nearest 0.1 cm.

Dietary assessment
All participants from the development sample and the external validation sample completed the same validated web-based food frequency questionnaire (webFFQ) [23] at home, from which the AHEI was calculated as proposed by Chiuve et al. [4]. The scoring method is presented in Table 1. All questions from the webFFQ are structured similarly. Frequency of consumption is first assessed based on up to 8 predetermined answers. Participants then provide information on portion size using up to 6 image options. This sequence is cognitively easier for respondents [24].
Among participants in the development sample, self-reported energy intake (rEI) was estimated using dietary intake data derived from the webFFQ, and estimated basal metabolic rate (eBMR) was calculated using the Mifflin-St Jeor equation [25]. We considered, based on the Goldberg cut-off [26], that participants with a ratio of rEI:eBMR ranging from 1.2 to 2.4 were plausible reporters. Data from non-plausible reporters in the development sample were excluded from the model development analysis because using potentially invalid data from individuals who over- or under-reported food intake to develop the Brief Diet Quality Assessment Tool may have yielded spurious associations between food intake and diet quality. However, all plausible and non-plausible reporters were included in the external validation sample in order to test the validity of the Brief Diet Quality Assessment Tool in a context that more closely reflects real-life conditions, where the risk of over- or under-reporting is not assessed and therefore unknown.
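The plausibility screen described above can be illustrated with a minimal sketch. It assumes the standard published form of the Mifflin-St Jeor equation (weight in kg, height in cm, age in years) and the bounds stated in the abstract (ratio ≥ 1.2 and < 2.4). The function names and the example participant are hypothetical and are not taken from the study's analysis code.

```python
def mifflin_st_jeor(weight_kg, height_cm, age_y, sex):
    """Estimated basal metabolic rate (kcal/day), Mifflin-St Jeor equation."""
    base = 10.0 * weight_kg + 6.25 * height_cm - 5.0 * age_y
    if sex == "M":
        return base + 5.0
    if sex == "F":
        return base - 161.0
    raise ValueError("sex must be 'M' or 'F'")


def is_plausible_reporter(rei_kcal, ebmr_kcal, lower=1.2, upper=2.4):
    """True when the rEI:eBMR ratio falls in [lower, upper)."""
    ratio = rei_kcal / ebmr_kcal
    return lower <= ratio < upper


# Hypothetical participant: 70 kg, 170 cm, 45-year-old woman.
ebmr = mifflin_st_jeor(70, 170, 45, "F")      # 1376.5 kcal/day
print(is_plausible_reporter(2000.0, ebmr))    # ratio ~1.45, inside the range
print(is_plausible_reporter(1400.0, ebmr))    # ratio ~1.02, below the range
```

Keeping the bounds as parameters makes it easy to check how sensitive the size of the plausible-reporter subsample is to the chosen cut-off.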

Development of the brief diet quality assessment tool
The CART approach was used to develop the Brief Diet Quality Assessment Tool in the development sample. CART is a statistical approach of supervised learning that draws food patterns and identifies best predictors of an outcome among a list of variables [4]. This type of algorithm is used to split a sample of independent variables into mutually exclusive subgroups based on common traits [20]. By design, the tool identifies individuals at risk of having a diet of low quality, so that they may receive adequate guidance. By default, all remaining individuals who do not fall into this category have a high probability of having adequate dietary habits. The AHEI was considered the outcome variable, while answers to individual questions in the webFFQ as well as food groups were used as predictors. Overall diet quality was arbitrarily categorized as high (AHEI ≥ 65/110) or low (AHEI < 65/110) to develop the Brief Diet Quality Assessment Tool. This cut-off was chosen based on the observation that individuals with a score of 65/110 and above are at a lower risk of major chronic disease compared with those with a lower score [4]. Information from the webFFQ was converted into equivalent servings per day for the analysis using standard references in Canada. Of the 136 questions of the webFFQ, 117 were included in the analysis. As the webFFQ measures food intake with a high degree of specificity for some foods, it was decided to exclude from the outset questions that were considered too specific or irrelevant for use in a Brief Diet Quality Assessment Tool. For example, questions from the webFFQ that did not specifically indicate the type of foods consumed (e.g. "How often do you eat other types of bread?") were not considered in developing the CART. A total of 27 categories were created to generate meaningful food groups based on the categorization proposed in Canada's Food Guide [27] as well as through consensus within the research team.
Specifically, a subgroup was created for the different forms of cow milk (low, regular fat), and for all types of milk (including plant-based milks), of yogurt, and of cheese. Subgroups were also created for processed meats (including cold cuts, nuggets, bacon, terrines and sausages), and the different types of fish, breads, cereals, rice, pasta, chocolate and peanut butter were each grouped into single food categories. Other subgroups were created for processed foods such as muffins, pancakes, pizza, sub sandwiches, cookies, cakes and pies, as well as for soft drinks, tea and coffee, desserts and nutritional supplements. Finally, food subgroups reflecting added sugar (in tea or coffee) and added fat were also created. Although most of the questions from the webFFQ are related to specific food items, some refer to a series of foods that have similar nutritional composition (e.g. "How often do you eat broccoli, green and yellow beans, Brussels sprouts, turnips, beets, asparagus, cabbage, mushrooms and mixed vegetables").
Age and sex, which are known to influence diet quality, were also considered as covariates in the models [28,29]. The complete list of variables used to develop the Brief Diet Quality Assessment Tool is available in Additional file 1. Overfitting was controlled using tenfold Monte Carlo cross-validation [30]. The CART modeling was performed using the statistical program R, version 3.3.2 (R Foundation for Statistical Computing, Vienna, Austria), and the rpart package.
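To make the splitting step of a classification tree concrete, the following minimal sketch searches one predictor for the binary cut-off that best separates low-quality from high-quality diets. It assumes the Gini impurity as the splitting criterion, which is a common choice for CART but is not stated in the text, and it uses invented intake values rather than study data. It is written in Python for illustration only; the study itself used R.

```python
def gini(labels):
    """Gini impurity of binary labels (1 = low-quality diet, 0 = high-quality)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best rule 'value < threshold'.

    Returns (None, parent_impurity) when no split improves on the parent node.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical servings/day of one food group for six respondents.
servings = [0.25, 0.5, 0.75, 1.5, 2.0, 2.5]
low_quality = [0, 0, 0, 1, 1, 1]
print(best_split(servings, low_quality))  # (1.125, 0.0): a perfect split
```

A full tree repeats this search over every candidate predictor at each node and then recurses into the two resulting subgroups, which is how sequences of binary questions with data-driven cut-offs arise.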

Statistical analysis
In plausible reporters from the development sample, accuracy of the Brief Diet Quality Assessment Tool was assessed by calculating sensitivity (the probability of being classified by the tool as having a diet of low quality among those with an AHEI < 65/110), specificity (the probability of being classified by the tool as having a diet of high quality among those with an AHEI ≥ 65/110), agreement (proportion of respondents adequately categorized by the tool), positive predictive value (PPV; the probability of an AHEI < 65/110 in those classified by the tool as having a diet of low quality), negative predictive value (NPV; the probability of an AHEI ≥ 65/110 in those classified by the tool as having a diet of high quality) and the area under the receiver operating characteristic (ROC) curve.
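These definitions can be written out from a 2 × 2 classification table. In the sketch below a "positive" is a low-quality diet (AHEI < 65/110), so NPV refers to respondents classified by the tool as having a high-quality diet. The function name and the counts are hypothetical and do not reproduce the study's tables.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy metrics for a binary screening tool.

    tp: AHEI < 65 and classified low quality (true positives)
    fp: AHEI >= 65 but classified low quality (false positives)
    fn: AHEI < 65 but classified high quality (false negatives)
    tn: AHEI >= 65 and classified high quality (true negatives)
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts for 200 respondents.
for name, value in accuracy_metrics(tp=80, fp=10, fn=20, tn=90).items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of respondents with a low-quality diet, so they are not directly comparable between samples with different AHEI distributions.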

Predictive validation
In the predictive validation sample, Student's t-tests were used to compare the cardiometabolic risk profile of participants classified by the Brief Diet Quality Assessment Tool as having a low or high-quality diet and between true positives (individuals with an AHEI < 65 correctly classified as having a low-quality diet) and false negatives (individuals with an AHEI < 65 incorrectly classified as having a high-quality diet).
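Because the groups being compared consist of different participants, the comparison can be illustrated with the independent-samples form of Student's t statistic with pooled variance. This is a minimal sketch under that assumption; the function name and the values are hypothetical, and the p-value lookup against the t distribution is left out.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Independent-samples Student's t statistic and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance assumes both groups share a common variance.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return t, df


# Hypothetical waist circumference (cm) in two small groups.
high_quality = [82.0, 85.5, 79.0, 88.0]
low_quality = [91.0, 95.5, 87.0, 99.0]
t_stat, dof = student_t(high_quality, low_quality)
print(round(t_stat, 2), dof)
```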

External validation
External validation of the Brief Diet Quality Assessment Tool was undertaken using data from the external validation sample and the accuracy metrics described above (i.e. sensitivity, specificity, agreement, PPV, NPV and area under the ROC curve) with the AHEI as reference.
XLSTAT 2017 (Addinsoft, Paris, France) was used to assess accuracy of the Brief Diet Quality Assessment Tool while SAS version 9.4 (SAS Institute Inc., NC, USA) was used for all other statistical analyses.
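For readers unfamiliar with the area under the ROC curve, the sketch below computes it from its rank-based interpretation: the probability that a randomly chosen respondent with a low-quality diet receives a higher predicted risk than a randomly chosen respondent with a high-quality diet, with ties counted as one half. This is an illustration only and is not a description of how XLSTAT computes the statistic. The scores are hypothetical predicted probabilities, such as the proportion of low-quality diets in each terminal leaf of a tree.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = low-quality diet (AHEI < 65), 0 = high-quality diet.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks for four respondents.
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is adequate for illustration; sorting-based implementations give the same value more efficiently on large samples.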

Results
Characteristics of all plausible reporters from the development sample (N = 1040) and all participants from the external validation sample (N = 3344) are presented in Table 2. Participants in the external validation sample were older, had slightly but significantly lower body mass index (BMI, P = 0.01) and higher AHEI score (P < 0.001) compared with those in the development sample. Figure 2 presents the output of the decision tree produced by the CART in the plausible reporters from the development sample. Each split represents the question of the webFFQ that best differentiates the dietary outcome (i.e. diet of low vs. high quality). The cut-offs, expressed in servings per day, are determined by the model itself. The exact same model with the same cut-offs was used for the validation process. Color-coded terminal leaves classify the respondents as having a low or a high-quality diet according to the AHEI cut-off of 65/110. Most of the 16 variables included in the final model are directly related to individual components of the AHEI. The first split corresponds to the intake of processed meat, which comprised questions on cold cuts, nuggets, bacon, terrines and sausages. Questions related to the intake of vegetables (broccoli, onion and salad), fruit (apples, which referred to the question: "how often are you eating apples, tangerines, oranges, pears, nectarines or peaches?"), whole grains (whole-grain bread), sugar-sweetened beverages and fruit juice (soft drinks and fruit juice), nuts and legumes (nuts, hummus and peanut butter), long chain (n-3) fatty acids (fish) and sodium (French fries and processed meat) are also an integral part of the AHEI calculation. Three questions that yielded a decisive split in the CART model were not directly associated with components of the AHEI, namely, 2% M.F. milk, pasta and the grouping of tea and coffee.
Accuracy of the Brief Diet Quality Assessment Tool to identify individuals with a low diet quality (AHEI < 65) in the development sample is presented in Table 3. The Brief Diet Quality Assessment Tool had a high area under the ROC curve (0.92) and PPV (0.90). Other metrics were generally consistent with high accuracy (range 0.84-0.88). Comparative analysis of the cardiometabolic risk profile of individuals with a predicted low and high diet quality in the predictive validation sample is shown in Table 4. As expected, both men and women classified by the Brief Diet Quality Assessment Tool as having a low-quality diet had a significantly lower AHEI score than those classified as having a high-quality diet. Furthermore, individuals classified as having a low-quality diet by the Brief Diet Quality Assessment Tool were younger, had a higher BMI and waist circumference and, globally, a deteriorated cardiometabolic profile compared with those classified as having a high-quality diet. False negatives (individuals with an AHEI < 65 incorrectly classified as having a high-quality diet by the Brief Diet Quality Assessment Tool) had a higher AHEI score and a more favourable cardiometabolic profile than true positives (individuals with an AHEI < 65 correctly classified as having a low-quality diet, Table 5). Finally, Table 6 presents accuracy metrics of the Brief Diet Quality Assessment Tool in the external validation sample. All metrics were lower in the external validation sample than in the development sample (sensitivity, specificity and agreement values of 73.0, 69.0 and 71.3%, respectively).

Discussion
The objective of this study was to develop and validate a short and simple questionnaire to assess diet quality for potential use in a clinical and primary care setting. Using the CART modeling approach, the analysis yielded a Brief Diet Quality Assessment Tool that comprises a maximum of six questions, with acceptable accuracy metrics to identify individuals likely to have a diet of low quality. Predictive validation of the Brief Diet Quality Assessment Tool using cardiometabolic risk factors provided further evidence of adequate performance to identify individuals at risk of having a low diet quality. External validation analyses in a sample of older adults also showed relatively good predictive values, although the model was overall less accurate than in the sample in which it was developed. Taken together, these results suggest that the Brief Diet Quality Assessment Tool has interesting potential for use in a primary care setting, as it identifies individuals at risk of having a low-quality diet and hence with the greatest needs in terms of nutritional support and guidance.
We are unaware of other studies where brief assessment tools of global diet quality have been developed using detailed dietary assessment methods such as the AHEI as reference. Cook et al. [31] have developed three questionnaires of one and five questions to predict fruit and vegetable consumption. Although more than 80% of high fruit consumers were correctly identified by the single-question questionnaire, only 56% of the individuals identified as high fruit consumers were true positives. For vegetables, the sensitivity of different options of the model ranged from 36 to 70% and the PPV from 26 to 39%. Similarly, Teal et al. [14] created a brief assessment tool for excessive fat consumption that could reasonably identify high fat consumers (PPV of 81%) but not those with a lower fat intake (NPV of 39%). Dietary assessment tools developed with supervised learning approaches appear to yield higher accuracy metrics. Indeed, using stepwise multiple logistic regression, Glümer et al. [32] developed and validated a screening tool for type 2 diabetes in the Danish population that demonstrated good sensitivity (73%) and specificity (74%). Using the CART approach, Xie et al. [33] developed a diabetes screening tool that was more sensitive and specific in women than in men (61% vs. 59% and 71% vs. 63%, respectively). In the present study, the Brief Diet Quality Assessment Tool presented adequate accuracy metrics. Sensitivity was high, with 88% of individuals with AHEI < 65 correctly classified as having a low-quality diet. Specificity, reflecting the capacity of the Brief Diet Quality Assessment Tool to correctly identify individuals not at risk of having a poor diet, was also high at 85%. The area under the ROC curve was 0.92, indicating that the AHEI cut-off of 65/110 was optimal to generate a maximal proportion of true positives over false positives.
A higher AHEI score has been associated with higher HDL-cholesterol concentrations [34] and lower waist circumference [35], blood pressure and triglyceride levels [36] in different populations. Our data are consistent with these observations by showing significant differences in cardiometabolic risk between participants categorized with the Brief Diet Quality Assessment Tool as having a high or a low-quality diet for almost all variables tested. Even if the Brief Diet Quality Assessment Tool did not correctly classify all participants, individuals with an AHEI < 65 who were misclassified as having a high-quality diet (false negatives) had a more favorable cardiometabolic risk profile when compared with true positive individuals. Indeed, in addition to presenting a higher AHEI, false negatives had lower BMI, waist circumference, blood pressure, serum TG, insulin and cholesterol/HDL-cholesterol ratio compared with true positives. This observation mitigates the consequences of misclassifying someone with a low diet quality.
In the external validation sample, accuracy metrics of the Brief Diet Quality Assessment Tool were lower than in the development sample. This was anticipated as the CART algorithm was specifically built based on data from the development sample. However, the Brief Diet Quality Assessment Tool performed reasonably well in this independent sample with sensitivity and specificity values of 73 and 69%, respectively. Other investigators have also observed lower metrics of accuracy of the predictive model when testing its external validity [32].
The accuracy metrics yielded by the Brief Diet Quality Assessment Tool need to be contextualized for its potential use in a clinical primary care setting and according to the consequences of false positive or negative classifications. In a clinical setting, individuals classified as having a low-quality diet based on the Brief Diet Quality Assessment Tool may be offered a meeting with a dietician, who will inevitably assess dietary habits using more comprehensive dietary assessment methods and confirm their status. False positives, i.e. those presumably at risk of having a low-quality diet but who in fact have an AHEI ≥ 65, would unnecessarily use dietary counseling resources until their true dietary status is confirmed by more comprehensive assessment methods. Meanwhile, false negatives, i.e. individuals incorrectly classified as having a high-quality diet, will most likely maintain suboptimal dietary habits until further assessment. However, our data indicated that false negative individuals had a less deteriorated cardiometabolic risk profile than true positive individuals. This suggests a higher degree of "tolerance" before actions can be formally implemented to address the issues of diet quality in these patients. Finally, health practitioners need to acknowledge that this Brief Diet Quality Assessment Tool is not intended to be a precise dietary assessment tool. The primary function of this new tool is to bring the topic of diet quality to the discussion and potentially raise the awareness of both patients and physicians about this key aspect of preventive medicine.

Strengths and limitations
The use of the CART approach in this study is highly original and can be considered an important strength.
This type of algorithm is used to split a sample of independent variables into mutually exclusive subgroups based on common traits [37]. The end product is visually meaningful and can be easily interpreted by non-statisticians [38]. Other supervised learning methods and regression analysis have been used in the past with slightly better accuracy, but their translation into visually attractive tools is challenging [39,40]. Each CART is also inherently representative of the population in which it was developed. This ensures generalizability at a local level, which is not guaranteed with tools validated elsewhere [41]. Consequently, this approach has limited reproducibility in populations other than French-speaking adults of Quebec, in which food habits could be different. The CART approach has also been shown to be prone to classification errors [39]. Such errors apparently did not materially affect the performance of the model predicting diet quality in the development sample. The main advantage of this supervised learning strategy is to maximize specificity while limiting the number of questions by grouping the respondents in subgroups. Indeed, unlike calculating the AHEI using detailed and comprehensive dietary assessment methods, the Brief Diet Quality Assessment Tool can be self-administered within a few minutes and interpreted without diet analysis software. As indicated above, brief dietary assessment tools cannot substitute for more comprehensive methods when detailed results are needed for counseling. Finally, even if the final model may be disappointing to some because of the small number of questions it comprises, it is important to highlight that the CART approach identified, from a broad series of foods, those that most closely predict an objective diet quality score, the AHEI, while ignoring other foods that did not statistically contribute to predicting this outcome [42].
There are limitations associated with the use of an FFQ to assess dietary intake, including a certain degree of misreporting [43], despite thorough validation [23]. However, the AHEI has been developed using dietary intake data from FFQs [4]. The Brief Diet Quality Assessment Tool was developed in a sample that excluded individuals with non-plausible energy intake. Meanwhile, external validation was undertaken in a sample of individuals that did not exclude non-plausible reporters. The external validation sample in the present study was also composed of adults older than those included in the development sample. While these differences may have attenuated the external validity of the tool to identify those with a low-quality diet, this approach more closely reflects real-life contexts, in which reporting status is unknown when assessing diet quality. We also acknowledge that the development sample may be biased as it includes participants involved in previous nutrition-related studies who might have a pre-existing interest in

Conclusion
We have developed and validated an easy-to-use Brief Diet Quality Assessment Tool that classifies individuals according to their risk of having a diet of low quality. Individuals classified as having a diet of low quality had a deteriorated cardiometabolic risk profile compared with those classified as having a diet of better quality, which supports the predictive validity of the tool. This Brief Diet Quality Assessment Tool could easily be implemented in a primary care setting, where dietary assessment is highly challenging due to limited resources and expertise. Future research includes extensive testing of a web-based version of the Brief Diet Quality Assessment Tool with different health professionals and populations in primary care settings. Testing the external validity of the tool in other populations is also imperative before it can be recommended for use.