Integrative development of a short screening questionnaire of highly processed food consumption (sQ-HPF)

Background Recent lifestyle changes include increased consumption of highly processed foods (HPF), which has been associated with an increased risk of non-communicable diseases (NCDs). However, nutritional information relies on the estimation of HPF consumption from food-frequency questionnaires (FFQ) that are not explicitly developed for this purpose. We aimed to develop a short screening questionnaire of HPF consumption (sQ-HPF) that integrates criteria from the existing food classification systems. Methods Data from 4400 participants (48.1% female and 51.9% male, 64.9 ± 4.9 years) of the Spanish PREDIMED-Plus (“PREvention with MEDiterranean DIet”) trial were used for this analysis. Items from the FFQ were classified according to four main food processing-based classification systems (NOVA, IARC, IFIC and UNC). Participants were classified into tertiles of HPF consumption according to each system. Using binomial logistic regression, food groups associated with agreement in the highest tertile for at least two classification systems were chosen as items for the questionnaire. ROC analysis was used to determine cut-off points for the frequency of consumption of each item, from which a score was calculated. Internal consistency of the questionnaire was assessed through exploratory factor analysis (EFA) and Cronbach’s analysis, and agreement with the four classifications was assessed with weighted kappa coefficients. Results Regression analysis identified 14 food groups (items) associated with high HPF consumption for at least two classification systems. EFA showed that items were representative contributors of a single underlying factor, the “HPF dietary pattern” (factor loadings around 0.2). We constructed a questionnaire asking about the frequency of consumption of those items. The threshold frequency of consumption was selected using ROC analysis. Comparison of the four classification systems and the sQ-HPF showed a fair to high agreement. 
Significant changes in lifestyle characteristics were detected across tertiles of the sQ-HPF score. Longitudinal changes in HPF consumption were also detected by the sQ-HPF, concordantly with existing classification systems. Conclusions We developed a practical tool to measure HPF consumption, the sQ-HPF. This may be a valuable instrument to study its relationship with NCDs. Trial registration Retrospectively registered at the International Standard Randomized Controlled Trial Registry (ISRCTN89898870) on July 24, 2014. Supplementary Information The online version contains supplementary material available at 10.1186/s12966-021-01240-6.


Background
Changes in eating patterns are occurring worldwide [1]. A common feature of such changes is the transition from minimally processed foods to moderately processed and then to highly processed (HPF) or ultra-processed foods (UPF) [2][3][4][5][6][7][8][9][10]. A widely used, although controversial [11], definition of UPF is that these are "industrial formulations made mostly or entirely from substances derived from foods and additives, with little if any intact food" [12]. According to the NOVA classification system, these foods are highly palatable and habit-forming, microbiologically safe, affordable, strongly marketed and advertised, and sold in convenient and attractive packaging, promoting their overconsumption [12,13]. This, together with the evidence showing their negative impact on health [9,10,14-18], has turned UPF consumption into a potential public health concern [19] in need of more solid research. While the term UPF is mainly attributed to the NOVA system, other food processing-based classification systems have described foods and drinks of similar characteristics under their categories of HPF [20,21], so we will use the term HPF to refer to this type of food.
Although numerous studies have demonstrated a link between the risk of developing non-communicable diseases and HPF consumption [12,17,22-27], epidemiological research faces some hurdles in this field. First, the existence of multiple food processing-based classification systems based on different criteria [21,28] results in heterogeneous conclusions regarding health outcomes depending on the system used, as shown recently for cardiometabolic health markers [20]. Second, the lack of an effective tool to assess HPF consumption in clinical studies severely hinders the downstream analysis of its relationship to disease risk. For instance, in the PREDIMED-Plus (from the Spanish "PREvention with MEDiterranean DIet") trial, dietary intake was assessed through a lengthy food frequency questionnaire (FFQ). Therefore, estimation of HPF consumption requires the classification of FFQ items according to the selected classification system, as previously done [20,24,25,29]. This process is time-consuming and subject to bias on the part of the researcher classifying the items, since FFQs are not specifically designed to include HPF: an FFQ is designed to estimate consumption of commonly consumed foods in general, of which some may fall into the HPF category depending on the classification system used [20]. In addition, calculations of daily consumption of each HPF item and percentage over total intake (in grams or kcal per day) are commonly used to infer their association with health outcomes [9,17,30]. These calculations are indirect and time-consuming when derived from current tools such as FFQs or 24 h recalls. There is, therefore, a need for an easy-to-use and comprehensive measure that assesses HPF consumption in the general population. This paper describes the development of a new screening questionnaire, the sQ-HPF, that allows an easy and quick determination of a subject's HPF consumption. 
We view this as an integrative tool since it incorporates criteria from four food processing-based classification systems. We developed this questionnaire based on available data from an FFQ to create a tool that could replace the estimation of HPF consumption from FFQ in future studies. We hypothesize that the sQ-HPF is comparable in terms of evaluating HPF consumption to other dietary assessment tools that must be used in combination with food processing-based classification systems and can effectively capture longitudinal changes in HPF consumption.
This questionnaire will be of potential interest to the scientific community, especially in the context of clinical nutrition and public health. Its use should minimize the difficulty in comparing results from studies that use different classification systems, allowing a straightforward evaluation of the HPF dietary pattern in large epidemiological studies efficiently and comparably. Altogether, it will enable the development of tailored nutritional interventions and food policies to limit HPF consumption and prevent diet-related diseases.

Study population
Data from the PREDIMED-Plus trial (from the Spanish "PREvention with MEDiterranean DIet") were used. This is an ongoing 6-year, multicenter, randomized, parallel-group clinical trial launched in Spain in 2013. The main aim of the study is to evaluate the effect on primary cardiovascular disease prevention of an intensive weight loss and its long-term maintenance through a lifestyle intervention based on three pillars: energy-restricted Mediterranean diet (er-MedDiet), increased physical activity (PA) and behavioral support. The study protocol, including study design and data collection, can be found at the PREDIMED-Plus website (https://www.predimedplus.com/en/) and was approved according to the ethical standards of the Declaration of Helsinki by the Institutional Review Boards (IRBs) of all participating centers. All participants provided written consent for their participation in the study. The trial is conducted in 23 Spanish centers and involves 6874 participants (48.5% female, 51.5% male) between 55 and 75 years old (mean age and SD 65.0 ± 4.9) who presented with overweight or obesity (BMI ≥ 27 and < 40 kg/m²) and met at least three criteria for metabolic syndrome (MetS) as previously described [31]. Details about the cohort have been described elsewhere [32]. The trial was retrospectively registered at the International Standard Randomized Controlled Trial Registry with number 89898870 on 24th July 2014. The present analysis used baseline and longitudinal (years 1 and 2) data from the PREDIMED-Plus study data set dated 26th June 2020. Participants with implausible energy intakes (< 500 or > 3500 kcal for females and < 800 or > 4000 kcal for males) were excluded. In addition, participants with missing values for dietary, lifestyle and socioeconomic variables were not included in the analysis. 
After the definition of the tertile agreement variable (see Statistical analyses section), participants with (1) no coincidence in extreme tertiles of HPF consumption (either in tertile 1 or tertile 3) according to at least two classification systems and (2) participants that were classified in tertile 1 for two classification systems and tertile 3 for the other two classification systems were excluded. This was done to select extreme HPF consumers to further construct the binomial regression model. The final number of participants included in the analysis for the development of the questionnaire was 4400 (48.1% female and 51.9% male, 64.9 ± 4.9 years) (Fig. 1).
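The tertile agreement variable and the two exclusion rules described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the analyses in this study were run in R), and the function name and the input layout (one tertile per classification system) are assumptions rather than part of the study code.

```python
from typing import Optional, Sequence


def tertile_agreement(tertiles: Sequence[int]) -> Optional[int]:
    """Return 1 (high HPF), 0 (low HPF) or None (excluded).

    `tertiles` holds one tertile (1, 2 or 3) per classification system,
    for example in the order NOVA, IARC, IFIC, UNC (order is illustrative).
    """
    n_low = sum(1 for t in tertiles if t == 1)
    n_high = sum(1 for t in tertiles if t == 3)
    if n_low >= 2 and n_high >= 2:
        # Exclusion rule (2): T1 for two systems and T3 for the other two.
        return None
    if n_high >= 2:
        return 1  # T3 in at least two systems
    if n_low >= 2:
        return 0  # T1 in at least two systems
    # Exclusion rule (1): no coincidence in an extreme tertile.
    return None


print(tertile_agreement([3, 3, 2, 1]))  # high HPF consumer
print(tertile_agreement([1, 1, 3, 3]))  # excluded (contradictory)
```

Participants for whom the function returns None would be left out of the binomial regression model, which is how the sample is restricted to extreme HPF consumers.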
This study adhered to the STROBE-nut reporting guidelines [33].

Blood measurements
Trained nurses collected blood samples after overnight fasting at the recruiting centers or primary health care centers. Plasma glucose, triglycerides and total cholesterol levels were measured following standard enzymatic methods.

Anthropometric measurements
Weight and waist circumference measurements were taken from participants in light clothing with no shoes or accessories, using an electronic calibrated scale and an anthropometric tape, respectively. Waist circumference was measured midway between the lowest rib and the iliac crest. Height measurements were taken using a wall-mounted stadiometer. Body Mass Index (BMI) was calculated as the weight in kilograms divided by the square of height in meters.

Lifestyle measurements
Socioeconomic and PA data were collected through the general PREDIMED-Plus questionnaire. Based on the validated questionnaires, the REGICOR [34] and the Rapid Assessment of Physical Activity (RAPA) [35], participants were asked about the frequency and intensity of physical activities, and three levels of PA were defined as follows: low (frequent sitting and little walking and/or frequent sitting and moderate sustained efforts), medium (frequent walking with no vigorous efforts), high (frequent walking and vigorous efforts and/or frequent vigorous efforts). Sedentary behaviors were assessed through a validated questionnaire, the Spanish version of the Nurses' Health Study (NHS) questionnaire [36]. Data about eating habits (binge eating, snacking) were collected within the multidimensional scale of weight locus control questionnaire [37]. Trained interviewers administered questionnaires in individual face-to-face sessions.

Dietary measurements
To assess the dietary intake of participants over the last year, a validated semi-quantitative 143-item FFQ [38][39][40] that considers variations in dietary patterns among seasons, weekdays and weekends was used. Participants were asked the average frequency of consumption of a commonly used portion size (e.g., glass, cup, slice) for each food or beverage item. Nine options for frequency of consumption are given, ranging from "never or hardly ever" to "more than six times a day." To estimate the daily consumption for each item, the portion size was multiplied by the frequency of consumption and then expressed as grams per day. This calculation was not possible for the fried foods item as the portion size is not specified in the FFQ. To assess adherence to an er-MedDiet, a 17-item questionnaire specially developed and validated for the PREDIMED-Plus trial [41,42] was used. Through this questionnaire, the frequency of consumption of traditional Mediterranean food items is evaluated. One point is scored when the answer meets specific criteria defining er-MedDiet, so the higher the score, the better the adherence to this diet.

HPF consumption and food processing-based classification systems
Four food processing-based classification systems (the NOVA [12,43,44], the International Agency for Research on Cancer (IARC) [7,45], the International Food Information Council Foundation (IFIC) [46,47], and the University of North Carolina (UNC) [48] systems) were used to classify FFQ items into processing categories as previously described [20]. In the present study, HPF refers to the following groups: Group 4 for NOVA, Group 3 for IARC, Groups 4 and 5 for IFIC, and Groups 4.1 and 4.2 for UNC. For each participant, HPF consumption was estimated according to each classification system as the sum of grams per day consumed from foods in the HPF group, divided by the total grams of food consumed per day and multiplied by 100 [20]. The frequency of consumption was directly obtained from FFQ answers in times per day.
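As an illustration of the two calculations described above (grams per day from portion size and frequency, and HPF consumption as a percentage of total grams per day), a minimal Python sketch follows. The item names, portion sizes and function names are hypothetical and serve only as an example; which items count as HPF depends on the classification system applied.

```python
from typing import Dict, Set


def daily_grams(portion_g: float, times_per_day: float) -> float:
    """Daily consumption of one FFQ item: portion size times frequency."""
    return portion_g * times_per_day


def hpf_percentage(grams_per_day: Dict[str, float], hpf_items: Set[str]) -> float:
    """HPF grams per day divided by total grams per day, times 100."""
    total = sum(grams_per_day.values())
    if total <= 0:
        raise ValueError("total intake must be positive")
    hpf = sum(g for item, g in grams_per_day.items() if item in hpf_items)
    return 100.0 * hpf / total


# Hypothetical participant: item -> grams per day.
intake = {
    "soft drink": daily_grams(200.0, 1.0),
    "apple": daily_grams(150.0, 1.0),
    "biscuits": daily_grams(50.0, 1.0),
}
# Hypothetical HPF set under one classification system.
hpf_set = {"soft drink", "biscuits"}
print(hpf_percentage(intake, hpf_set))
```

Running the same intake against a different HPF set (one per classification system) yields the four per-system estimates that are later split into tertiles.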

Statistical analyses
Data analysis was conducted using the R programming language [49] in RStudio [50] with the following statistical packages: "DescTools" [51], "psych" [52], "tableone" [53], "cutpointr" [54], "corrplot" [55], "vcd" [56] and "rstatix" [57]. Participants were classified according to tertiles of HPF consumption for each classification system (T1: low HPF consumption; T3: high HPF consumption). Tertiles were chosen to capture and show the variability in HPF consumption while allowing a straightforward comparison between classification systems. Then, subjects were classified according to tertile agreement for at least two classification systems. Those classified in T3 in at least two classification systems scored "1" for tertile agreement, while those classified in T1 for at least two classification systems scored "0". FFQ items were classified into food groups according to their similar nature, nutritional characteristics, and/or form of consumption (Supplementary Table 1). The association between tertile agreement and frequency of consumption for each food group was analyzed through binomial logistic regression, adjusted for age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status, using the "glm" function from the R base package "stats". These covariates were selected due to their potential effect on HPF consumption. To calculate optimal cut-off points of frequency of consumption for the selected items, receiver operating characteristic (ROC) analysis was performed using the "cutpointr" function from the R package "cutpointr" [54]. The method to estimate the cut-off points was based on the maximization of the Youden index [58]. Estimated cut-off points were used to establish the criteria for scoring 1 point in the sQ-HPF, which indicated high HPF consumption, or 0, indicating low HPF consumption. 
Cut-off points were adapted to the nine possible answers of the FFQ, so the criteria for scoring were based on a threshold frequency of consumption for each item (i.e., food group). In parallel, exploratory factor analysis (EFA) was performed to identify underlying relationship patterns between items included in the sQ-HPF using the "fa" function from the "psych" package [52]. To test for data suitability for the EFA, the Kaiser-Meyer-Olkin criterion [59] and Bartlett's test of sphericity [60] were applied. The EFA was performed without rotation and with a principal factor solution as the factoring method. Factor retention was based on the scree plot and Kaiser's criterion [61]. Cronbach's alpha [62] was calculated as a measure of internal consistency of the questionnaire. Using the criteria for scoring, the sQ-HPF score was calculated for each subject in the PREDIMED-Plus database, and the questionnaire-estimated HPF consumption was calculated through linear regression analysis using the sQ-HPF score as the independent variable and the percentage of HPF over total grams per day as the dependent variable. For descriptive analyses, participants were classified into tertiles of the sQ-HPF score. Data are shown in tables as "mean (standard deviation, SD)" for continuous variables and as "number of subjects (%)" for categorical variables. Differences in dietetic and lifestyle variables among tertiles were tested using a one-way ANOVA test for continuous variables and a Chi-squared test for categorical variables, with statistical significance set at p < 0.05. P-values were adjusted for age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status. To assess the concordance between tertiles of HPF consumption calculated by the four classification systems and by the sQ-HPF, weighted Cohen's kappa (κ) coefficients were calculated with the function "Kappa" from the R package "vcd". 
For longitudinal analysis of HPF consumption, a linear mixed model was fitted using the R packages "lme4" and "emmeans" with the same covariates as in the previous analyses. The number of subjects used for this analysis was 3284 due to longitudinal data loss.
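To make the cut-off selection step concrete, the sketch below shows how a threshold maximizing the Youden index (sensitivity + specificity - 1) can be found for a single item. It is a plain-Python illustration, not the "cutpointr" implementation used in this study; the function name, the rule that a frequency at or above the cut-off counts as positive, and the toy data are all assumptions.

```python
from typing import List, Tuple


def youden_cutoff(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A subject is predicted positive when value >= cutoff (assumed direction).
    `labels` are the tertile agreement scores: 1 = high HPF, 0 = low HPF.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_cut, best_j = values[0], -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cut)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cut)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:  # ties keep the lowest cut-off (an arbitrary choice)
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy frequencies (times/day) for one food group and tertile agreement scores.
freq = [0.1, 0.2, 0.3, 0.5, 0.8, 1.0]
agree = [0, 0, 0, 1, 1, 1]
cutoff, j_stat = youden_cutoff(freq, agree)
print(cutoff, j_stat)
```

In the study, the estimated cut-off for each item was then mapped onto the nine FFQ answer options to obtain the scoring criterion.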

General characteristics of the PREDIMED-Plus cohort according to tertile agreement
Subjects scored 1 in the tertile agreement variable if they were classified in the highest tertile (T3) of HPF consumption for at least two classification systems, while they scored 0 if they were classified in the lowest tertile (T1) of HPF consumption for at least two classification systems. General characteristics of PREDIMED-Plus participants at baseline according to the scores of the tertile agreement variable are shown in Table 1. Subjects who scored 1 (high HPF consumption by at least two classification systems) were mainly men (73.1%), had higher energy intake (2559.84 kcal/day) and lower MedDiet adherence (7.65 points) than subjects who scored 0 (low HPF consumption by at least two classification systems). Among the high HPF subjects, 78.8% were married and 41.8% had a primary education level. Around half of the subjects who scored 1 showed a low level of PA (55.3%) and were taking medication for cholesterol (49.8%).
In addition, 75.7% of them were taking medication for blood pressure and 25.9% were on diabetes medication. According to these results, the variables age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status were selected as covariates for further analysis due to their potential effect on HPF consumption. MedDiet adherence was not selected as a covariate due to the presence of collinearity with HPF consumption.

Development of the sQ-HPF
Food groups were defined based on the similarity in nature, nutritional profile and/or form of consumption of the PREDIMED-Plus FFQ food and beverage items (Supplementary Table 1). Food groups chosen to be included in the sQ-HPF showed a significant positive association (p-value adjusted by Bonferroni < 0.05) between their frequency of consumption and a value of 1 of tertile agreement, i.e., the subject is in tertile 3 of HPF consumption for at least two food processing-based classification systems (Table 2). A final solution of 14 food groups was selected and included the following: fatty dairy products, sugary dairy products, cured meat, fats, fermented alcohol, distilled alcohol, sugary and artificially sweetened drinks, sweets, snacks, ready to eat products, refined cereals, sauces, additives, and fried foods.
EFA revealed that most of the items selected for the sQ-HPF had factor loadings higher than 0.2, indicating that they were representative contributors to the factor (Table 3). The measure of sample adequacy (MSA) was 0.78, considered as "good" for the verification of the proportion of variance in variables that can be caused by factors, according to the Kaiser-Meyer-Olkin criterion. Bartlett's test of sphericity was highly significant (p < 0.001), indicating that variables were correlated in the population. This, together with the MSA value, indicated the adequacy of the data to proceed with EFA. Factor retention applying the Kaiser criterion revealed a single underlying factor being identified by the questionnaire items, namely the HPF dietary pattern. The internal consistency of the questionnaire items was evaluated with Cronbach's alpha, which had a moderate value of 0.67.
Table 1 General characteristics of PREDIMED-Plus participants at baseline according to tertile agreement. Data shown as "mean (standard deviation, SD)" for continuous variables and as "number of subjects (%)" for categorical variables. One-way ANOVA test used for continuous variables and Chi-squared test used for categorical variables. Significant p-values (< 0.05) shown in bold. MedDiet: Mediterranean diet. PA: physical activity. a Tertile agreement variable: score 0 ("low HPF consumer") if a subject is classified in T1 of HPF consumption by at least two classification systems; score 1 ("high HPF consumer") if a subject is classified in T3 by at least two classification systems. HPF: highly processed food. N = 4400
The sQ-HPF was based on the 14 food groups selected previously. Each item asked about the frequency of consumption of a particular food group (Table 4). Examples of representative food and beverage items included in each food group were provided for each item. 
Estimated cut-off points were used to determine the threshold frequency of consumption to consider the respondent as an HPF consumer for the item, as shown in the column "Criteria for 1 point" in Table 4. For the self-reported version of the questionnaire, this column should be removed from the questionnaire form since it is intended for the use of the person assessing the score only. Using baseline data from the PREDIMED-Plus FFQ, the percentage of HPF over total grams per day according to the questionnaire items was calculated for each participant. This was used as the dependent variable in a linear regression with the sQ-HPF score obtained for each participant to establish the following equation for the regression line: "HPF consumption (% g/day) = (3.7 × sQ-HPF score) + 7.6". This equation allows the estimation of the HPF consumption from the sQ-HPF score, as shown at the bottom of Table 4.
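The scoring rule and the regression equation above can be sketched as follows. The Python code is illustrative only: the threshold values and item subset are hypothetical placeholders (the actual criteria are those listed in Table 4), and treating a frequency equal to the threshold as scoring 1 point is an assumption.

```python
from typing import Dict


def sq_hpf_score(frequencies: Dict[str, float], thresholds: Dict[str, float]) -> int:
    """One point per item whose frequency reaches its threshold (assumed >=)."""
    return sum(
        1 for item, thr in thresholds.items()
        if frequencies.get(item, 0.0) >= thr
    )


def estimated_hpf_percent(score: int) -> float:
    """Regression line reported in the text: 3.7 x score + 7.6 (% g/day)."""
    return 3.7 * score + 7.6


# Hypothetical thresholds in times/day; NOT the values from Table 4.
example_thresholds = {"sweets": 0.14, "snacks": 0.07, "sauces": 0.43}
# Hypothetical respondent.
example_answers = {"sweets": 0.5, "snacks": 0.0, "sauces": 0.43}

score = sq_hpf_score(example_answers, example_thresholds)
print(score, estimated_hpf_percent(score))
```

With all 14 items, the score ranges from 0 to 14, so the equation yields estimates between 7.6% and 59.4% of total grams per day.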
Weighted κ coefficients were calculated between HPF consumption tertiles according to the sQ-HPF and the four existing classification systems (Supplementary Table 2). The highest agreement was for the comparison with UNC tertiles (κ = 0.88), followed by IFIC tertiles (κ = 0.65) and IARC tertiles (κ = 0.50). The comparison of tertiles according to the sQ-HPF with NOVA tertiles of HPF showed a fair agreement (κ = 0.36). These comparisons were in accordance with the corresponding agreement plots (Fig. 2).
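For readers unfamiliar with the statistic, a weighted Cohen's κ between two tertile classifications can be computed as in the minimal Python sketch below. This is not the "vcd" implementation used in the study; linear (equal-spacing) agreement weights are assumed here because the weighting scheme is not stated in the text, and the function name and toy data are illustrative.

```python
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], categories=(1, 2, 3)) -> float:
    """Weighted Cohen's kappa with linear agreement weights (assumed)."""
    k = len(categories)
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions of (rating a, rating b).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[index[x]][index[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    p_obs = 0.0
    p_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = 1.0 - abs(i - j) / (k - 1)  # 1 on the diagonal, 0 at the extremes
            p_obs += w * obs[i][j]
            p_exp += w * row[i] * col[j]
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Toy example: sQ-HPF tertiles versus tertiles from one classification system.
sq_tertiles = [1, 1, 2, 2, 3, 3]
system_tertiles = [1, 2, 2, 2, 3, 2]
print(round(weighted_kappa(sq_tertiles, system_tertiles), 3))
```

A value of 1 indicates identical tertile assignment, while values near 0 indicate agreement no better than chance; disagreements between adjacent tertiles are penalized less than disagreements between T1 and T3.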

Dietary, lifestyle and cardiometabolic characteristics of PREDIMED-Plus participants according to the sQ-HPF score
Table 2 Associations between candidate sQ-HPF items and tertile agreement variable by binomial logistic regression. Binomial logistic regression adjusted for age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status. Tertile agreement is the outcome variable (score 0, "low HPF consumer", if a subject is classified in T1 of HPF consumption by at least two classification systems; score 1, "high HPF consumer", if a subject is classified in T3 by at least two classification systems). Food groups expressed in frequency of consumption (times/day). HPF: highly processed food.
We aimed to investigate whether changes in HPF consumption as estimated from the FFQ and the four different systems were also detected when the sQ-HPF was used. In addition, we wanted to analyze the dietary profile across tertiles of the sQ-HPF score, since this is a tool to evaluate a particular dietary pattern. Dietary characteristics of PREDIMED-Plus participants at baseline are shown in Table 5, grouped by HPF consumption tertiles according to the score obtained in the sQ-HPF (T1: lowest score; T3: highest score). Consumption of food groups included in the questionnaire was the highest for subjects in T3 of the sQ-HPF score, which was not the case for vegetables, fruits, and legumes. Associations between lifestyle and cardiometabolic parameters and HPF consumption have been previously reported, so we next aimed to analyze if we could detect changes in these variables across tertiles of the sQ-HPF score. Lifestyle and cardiometabolic characteristics of PREDIMED-Plus participants at baseline grouped by HPF consumption tertiles showed differences according to the sQ-HPF score (Table 6). 
Participants ranked in the highest tertile had higher levels of triglycerides (161.05 ± 90.55 mg/dL), higher weight (90.68 ± 13.10 kg) and waist circumference (109.83 ± 9.45 cm) compared to those in the lowest tertile. No significant changes among tertiles were detected in fasting glucose and total cholesterol levels. Subjects in T3 of the sQ-HPF score spent more time watching TV than T1 subjects (4.05 ± 2.02 h/day), while sleeping hours were similar across tertiles. Around half of the subjects classified in T3 were classified as sedentary (51.5%). Concerning eating habits, 30.5% of the subjects with the highest sQ-HPF score reported snacking.

Assessing longitudinal changes in HPF consumption with the sQ-HPF
Longitudinal analysis of HPF consumption estimated from the FFQ by each classification system revealed significant differences across the first 3 years of the PREDIMED-Plus study. In all cases, mean HPF consumption tended to decrease over time and was lower in year 2 than at baseline (Table 7). This was also the case when HPF consumption was estimated through the sQ-HPF (22.9 ± 0.22% of g/day at baseline and 17.3 ± 0.22% of g/day in year 2).

Discussion
This article presents the sQ-HPF, a short, integrative, and easy-to-use questionnaire to estimate HPF consumption (Table 4). The development process involved a carefully designed statistical analysis and practical considerations for its use in a clinical/epidemiological context. This new tool includes 14 food and beverage items for which the frequency of consumption is recorded, based on a previously validated FFQ from the PREDIMED-Plus trial [38][39][40]. Each item is scored as 1 if the frequency of consumption corresponds to the HPF dietary pattern, according to the calculated thresholds, and as 0 otherwise. Therefore, the higher the score, the higher the consumption of HPF. Moreover, this score can be used to estimate the percentage of HPF consumption over total intake, avoiding the need to administer a lengthy FFQ (Table 4) [63]. The present work demonstrates that statistical approaches such as EFA and Cronbach's analysis are valuable for developing integrative tools related to dietary patterns and eating habits. Indeed, analysis of questionnaire items through EFA demonstrated that they identified one core construct, the HPF dietary pattern (Table 3) [64]. This was expected considering that items were selected based on a positive association with the tertile agreement variable through binomial logistic regression (Table 2). This resulted in 14 questions based on the frequency of consumption of food groups associated with a higher HPF consumption. One of the reasons for this was that, in this way, the questionnaire could identify people with an HPF dietary pattern with a focus on the frequency of consumption and not on the specific HPF items they consumed, so that the questionnaire could detect different HPF dietary patterns. 
Another reason for the selection of items is that the questionnaire can also provide information on how to intervene to improve health in other populations by identifying the food groups most frequently consumed by the respondent and allowing for customized targeting of the dietary intervention to reduce consumption of those food groups. Finally, this selection was performed to simplify the use and interpretation of the questionnaire. In this way, the questionnaire rapidly results in a score that increases with HPF consumption and can be used to categorize the respondent accordingly. Notably, the questionnaire was designed to encompass criteria from four leading food processing-based classification systems (NOVA, IARC, IFIC, and UNC), so the categorization as HPF consumer occurs according to at least two of these classification systems (tertile agreement) [20]. This is a key asset of the sQ-HPF, which makes it more comprehensive than other methods that have been proposed and are limited to one classification system [63]. In this regard, the limitations of the NOVA system, the most used classification system, have been widely discussed [65] and it has been shown that the choice of the classification system can have a significant impact on research outcomes [20]. Therefore, the integrative approach used to create the sQ-HPF is a critical advantage contributing towards an objective measurement of HPF consumption and its association with disease. In addition, a reason for choosing food groups rather than specific food items was that this would allow assigning specific food items to groups in the questionnaire, even if they were not initially considered because they were not present in the FFQ, based on their similarity to one of the 14 food groups. Another significant advantage of the sQ-HPF is its quick and easy administration and interpretation, regardless of the method of administration (i.e., self-reported or interviewer-led). This is particularly useful at the epidemiological level, where a high volume of responses is often collected and the length and subsequent processing of questionnaires is an important aspect of study design [66]. Despite its apparent simplicity, the ability of the sQ-HPF to classify subjects according to their HPF consumption is comparable to other more involved classification systems (Supplementary Table 2). As a reflection of this, longitudinal changes in HPF consumption estimated by the four classification systems were also detected with the same significance by the sQ-HPF estimation (Table 7). Furthermore, significant differences among participants grouped according to sQ-HPF scores could be detected in weight, waist circumference, and hours spent watching TV, among others (Table 6). These results indicate the potential of the sQ-HPF as an initial tool to evaluate the impact of HPF consumption on lifestyle in large-scale intervention trials, such as PREDIMED-Plus. For these reasons, we firmly believe that the consistent use of the sQ-HPF in epidemiological studies could contribute to a robust understanding of the relationship between HPF dietary patterns and health.
Table 5 Dietetic characteristics of PREDIMED-Plus participants at baseline according to tertiles of the sQ-HPF score. Data shown as "mean (standard deviation, SD)". One-way ANOVA test used for continuous variables. Significant p-values (< 0.05) shown in bold. HPF consumption was estimated as the percentage over total grams per day, except for fried foods (data not available). 1 P-values adjusted for age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status. a Excluding potatoes. MedDiet: Mediterranean Diet. N = 4400
Table 6 Lifestyle and cardiometabolic characteristics of PREDIMED-Plus participants at baseline according to tertiles of the sQ-HPF score. Data shown as "mean (standard deviation, SD)" for continuous variables and as "number of subjects (%)" for categorical variables. One-way ANOVA test used for continuous variables and Chi-squared test used for categorical variables. Significant p-values (< 0.05) shown in bold. BMI: Body Mass Index. PA: physical activity. 1 For continuous variables, p-values are adjusted for age, sex, recruitment center, energy intake, physical activity level, medication for blood pressure and diabetes, working status, educational level, and civil status. a Number of binges per week is calculated for those who answered "yes" to the question "binge eating" (overall n = 313, n by tertiles: T1 = 114, T2 = 80, T3 = 119). N = 4400
The applicability of the sQ-HPF spans not only epidemiological research but also public health and clinical nutrition. The use of "a priori" screening tools to define dietary habits and patterns plays an important role in developing personalized nutrition advice [67]. Several epidemiological studies have shown the importance of using dietary patterns to assess the association between dietary exposure and risk of developing chronic diseases [14], giving rise to valuable screening tools aimed at large populations [68]. Most of the developed scores and indices emphasize the importance of healthy dietary habits (Mediterranean diet [41,69], Nordic diet [70], "healthy" diet [71,72]) based on a theoretical rationale. However, some studies have reflected the need to capture deleterious eating habits that present a counteractive distortion over the beneficial effects of these diets [67]. 
In this regard, some approaches focused on assessing the positive health effects of the diet have included potentially harmful items in their composition, such as the Healthy Eating Index (HEI) [73,74] and the Alternate HEI (AHEI) [75], to capture overall adherence to dietary guidelines. The sQ-HPF presented here could thus be a complementary tool that counterbalances analyses based on questionnaires aimed at healthy dietary effects.
In addition, its simplicity makes it easy to implement in clinical practice, for example, at the primary care level, for disease prevention. Furthermore, the sQ-HPF integrates the divergent UPF and HPF definitions through an agglomerative statistical analysis of commonly used criteria, which provides additional consistency to its application. Regarding public health applications, limiting the consumption of so-called "discretionary foods" has been gradually incorporated into health policies [76][77][78]. These foods include confectionery, soft drinks, biscuits, snacks, cakes and pastries, among others [76], most of which are considered HPF [30]. In keeping with this, we propose the use of the sQ-HPF as a screening tool to identify trends in the consumption of these types of foods in large populations [79] and to consistently identify foods that can be targeted in public health campaigns [19]. This would also contribute to harmonizing the methods used to assess government policies addressing the healthiness of food environments, a well-recognized need for the worldwide prevention of diet-related diseases [11,19,80].
One of the main limitations of the ad hoc development of the sQ-HPF is that its reproducibility in other populations remains to be established. The PREDIMED-Plus cohort comprises older adults with metabolic syndrome and overweight or obesity and a limited HPF consumption, so applying the sQ-HPF to populations with different characteristics will require further validation. Another limitation is that the FFQ used for the initial development of the questionnaire was not specifically designed to measure HPF consumption. However, it contains a considerable number of items categorized as HPF and UPF by current food processing-based classifications [20]. These include industrial biscuits, milkshakes, bakery products, cured meats, RTE preparations, snacks, and soft drinks. Therefore, the sQ-HPF was constructed so that specific HPF are very likely to fall into one of the groups defined in the questionnaire because their description and characteristics resemble those of the items already specified. The use of an FFQ has other limitations, such as recall bias, which can affect the accuracy of the data collected [40], but its use to evaluate dietary intake in large epidemiological studies is widely recognized as suitable [39]. We aimed to develop a screener for HPF consumption that could be used for rapid screening of a large population and for identifying consumers with a high HPF dietary profile. For an in-depth study of the specific items that make up the dietary pattern of each consumer, it will still be necessary to use more detailed FFQs or dietary recall interviews. Different research teams have developed FFQs aimed at evaluating dietary intake from HPFs [81,82], so it would be of interest to compare these to the sQ-HPF through intra-class correlation coefficient analyses.
In the case of a good correlation, we think that the sQ-HPF would present the advantage of being shorter and more direct regarding the estimation of HPF consumption, which will facilitate the work in epidemiological research and clinical practice.
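As an illustration of the kind of comparison mentioned above, the following is a minimal sketch of an intra-class correlation coefficient computed between two instruments scored on the same participants. The choice of the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is an assumption made here for illustration, as are the function name `icc_2_1` and the toy scores; it is not the analysis performed in this study, and it presumes both instruments are expressed on a comparable scale.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one per participant; each row holds that
    participant's score from each instrument, in the same column order.
    Assumes at least two participants, two instruments, and some variation
    in the scores (otherwise the denominator is zero).
    """
    n = len(ratings)       # number of participants
    k = len(ratings[0])    # number of instruments being compared
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical scores: column 0 = sQ-HPF, column 1 = another HPF instrument
toy_scores = [[3, 4], [7, 6], [10, 11], [5, 5], [12, 10]]
print(round(icc_2_1(toy_scores), 3))
```

Values close to 1 would indicate that the two instruments rank and scale participants similarly; a different ICC form (for example, a consistency rather than absolute-agreement definition) may be more appropriate depending on the study design.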

Conclusions
The sQ-HPF is an integrative, short, and easy-to-use questionnaire that can be used to screen HPF consumption in large populations for epidemiological or public health purposes. Due to the straightforward score calculation and interpretation, it could aid Personalized Nutrition practice and food policy-makers in the implementation of effective strategies to tackle diet-related disorders associated with HPF consumption.

Funding
The following funding bodies contributed to study concept and design, research, and data collection for the PREDIMED-Plus study. The PREDIMED-

Availability of data and materials
There are restrictions on data availability for the PREDIMED-Plus trial due to the signed consent agreements around data sharing, which only allow access to external researchers for studies following the project purposes. Requestors wishing to access the PREDIMED-Plus trial data used in this study can make a request to the PREDIMED-Plus trial Steering Committee chair: jordi.salas@urv.cat. The request will then be passed to members of the PREDIMED-Plus Steering Committee for deliberation.