A meta-analysis of the reproducibility of food frequency questionnaires in nutritional epidemiological studies

Background The reproducibility of food frequency questionnaires (FFQs) reflects the consistency of measurements in the same subjects at different time points. We performed a meta-analysis to explore the reproducibility of FFQs and the factors related to it. Methods and findings A systematic literature search of the PubMed and Web of Science databases was performed for studies published before July 2020. Pooled intraclass and Spearman correlation coefficients with 95% confidence intervals were calculated to assess the reproducibility of FFQs. Subgroup analyses based on characteristics of the study populations, FFQs, and study design were performed to investigate factors related to FFQ reproducibility. A total of 123 studies comprising 20,542 participants were eligible for the meta-analysis. The pooled crude intraclass correlation coefficients ranged from 0.499 to 0.803 for macronutrients and from 0.499 to 0.723 for micronutrients. Energy-adjusted intraclass correlation coefficients ranged from 0.420 to 0.803 for macronutrients and from 0.507 to 0.712 for micronutrients. The pooled crude and energy-adjusted Spearman correlation coefficients ranged from 0.548 to 0.851 and from 0.441 to 0.793, respectively, for macronutrients, and from 0.573 to 0.828 and from 0.510 to 0.744, respectively, for micronutrients. FFQs with more food items, a 12-month dietary recall interval (compared with less than 12 months), and a shorter time period between repeated FFQs were associated with better FFQ reproducibility. Conclusions In conclusion, FFQs with correlation coefficients greater than 0.5 for most nutrients may be considered a reliable tool to measure dietary intake. To develop FFQs with higher reproducibility, the number of food items and the dietary recall interval should be taken into consideration. Supplementary Information The online version contains supplementary material available at 10.1186/s12966-020-01078-4.


Introduction
The food frequency questionnaire (FFQ) is the most commonly used tool for assessing individuals' usual dietary intake in nutritional epidemiological studies, especially for investigating the relationship between diet and health outcomes [1,2]. FFQs allow researchers to rank subjects according to their dietary and nutrient intake. Obtaining an accurate estimate of long-term habitual food intake is crucial [3] for understanding the relationship between diet and disease. However, assessing dietary habits is complex [4], and FFQ responses are affected both by real changes in habitual dietary intake and by random variation in responses [5,6]. FFQs cover a wide range of foods, including rarely consumed ones, and a single administration can describe usual dietary habits with reasonable reproducibility [7]. If reproducibility is insufficient, dietary intake measured at baseline may substantially misclassify subjects' true exposure during the study period [8]. To enhance the interpretation of estimated diet-disease associations and to improve their translation into dietary recommendations, a reproducibility analysis is required before an FFQ is applied to analyze dietary intake [9].
Reproducibility reflects reliability and refers to the agreement between results obtained with the same method at different time points [10]. It is generally assessed by administering the same FFQ twice to the same group of subjects and analyzing the association between the two sets of responses [11]. In previous studies, the interval between the two FFQs varied from 1 week [12] to 2 years [13]. True changes in habitual dietary intake and random variation in responses to the FFQ have been considered factors affecting FFQ repeatability [14], and they reduce reproducibility when the interval between administrations is long [2,15]. Conversely, when the two FFQs are administered close together, respondents may remember and repeat their previous responses, resulting in high reproducibility [2].
Numerous studies have assessed the reproducibility of FFQs before applying them to different populations. In the Shanghai Diet and Health Study, the Spearman and intraclass correlation coefficients (ICCs) for 25 nutrients from a 134-item FFQ administered approximately 6 months apart ranged from 0.46 to 0.79 and from 0.34 to 0.71, respectively [16]. The reproducibility of a 157-item FFQ administered 3 months apart in the Food4Me study (a randomized controlled trial across seven European countries) was reported to range from 0.62 to 0.89 [17]. A repeatability study of a 135-item interviewer-administered FFQ in the Mexican Women's Bone Health Cohort Study found reproducibility coefficients ranging from 0.186 to 0.810 for energy-unadjusted data and from 0.174 to 0.597 for energy-adjusted data [18]. However, correlation coefficients vary across nutrients and studies, and a widely accepted reference value for FFQ reproducibility is currently lacking.
Furthermore, the characteristics of FFQs may affect their reproducibility. A previous study reported that the ICCs of a 255-item FFQ ranged from 0.69 (fat) to 0.84 (vitamin A) in Moroccan adults [19], whereas a shorter FFQ assessing the average consumption of 57 food items had reliability coefficients ranging from 0.56 to 0.70 [20]. The number of FFQ items may therefore contribute to differences in reproducibility. Another study reported that the median (range) energy-adjusted Spearman correlation coefficients (SCCs) for 30 nutrients between two FFQ measurements were 0.24 (0.04-0.69) for men and 0.50 (0.27-0.60) for women [21], suggesting that FFQ reliability differs between men and women. Differences in FFQ reproducibility may also be caused by other factors, such as real changes in diet over time, individual differences in diet, and differences in study design [22,23]. However, few studies have comprehensively explored the effects of these factors on FFQ reproducibility.
Although the reproducibility of FFQs has been evaluated in many studies, no comprehensive meta-analysis of these reproducibility studies has been conducted, and reference ranges for reproducibility coefficients have not been defined. Moreover, no study has systematically explored the factors related to FFQ reproducibility. Therefore, we conducted a meta-analysis to systematically assess the reproducibility of FFQs and to explore the factors related to it.

Methods
A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline; the completed checklist is provided in the PRISMA Checklist.

Literature search
We conducted a comprehensive literature search of the PubMed and Web of Science databases for studies published before July 2020. The search was conducted by two independent researchers using the terms "FFQ OR food frequency questionnaire" AND "reproducibility OR repeatability OR reliability".

Study identification and selection
Potentially relevant articles were evaluated against the inclusion criteria by two independent reviewers. Records were first retrieved from the databases. After duplicates were removed, studies were screened by title and abstract. Eligible articles were then identified by applying the exclusion criteria after full-text review.
Articles were included if they met the following criteria: (1) FFQs were used to measure nutrient intake; (2) the target population was healthy and aged between 8 and 86 years; (3) the study assessed the reproducibility of FFQs; (4) the study was published in English; and (5) the reproducibility of FFQs was measured with the intraclass correlation coefficient (ICC), Pearson correlation coefficient, or SCC.
The exclusion criteria were: (1) food intake was assessed using FFQs; (2) FFQs were used to assess a specific nutrient; (3) the target population was unhealthy or a specific population, such as individuals who were overweight or malnourished; (4) participants were younger than 8 years; (5) the article investigated diet-disease relationships; and (6) the full text was unavailable through web searches.

Meta-analysis
The pooled correlation coefficients were calculated from the ICC and SCC values reported in each article. When SCCs were not reported, Pearson correlation coefficients were converted into SCCs. Each correlation coefficient was converted to an approximately normally distributed z-value using Fisher's transformation, and the standard error of z was calculated. Random-effects meta-analyses were then used to combine the z-values. Heterogeneity among studies was quantified with the inconsistency index (I²); I² greater than 50% indicated the presence of heterogeneity. Pooled z-values and their 95% CIs were back-transformed with the inverse Fisher transformation to obtain pooled correlation coefficients. Sensitivity analysis was performed to further explore sources of heterogeneity.
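The pooling procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Stata code: it assumes the standard error of z is 1/√(n − 3) and uses the DerSimonian-Laird estimator for the between-study variance, neither of which is specified in the text. The function name `pool_correlations` and the example inputs are hypothetical.

```python
import math


def pool_correlations(rs, ns):
    """Random-effects pooling of correlation coefficients on the Fisher z scale.

    Assumptions (not stated in the text): SE(z) = 1 / sqrt(n - 3) and the
    DerSimonian-Laird estimator of between-study variance tau^2.
    Returns (pooled r, lower 95% limit, upper 95% limit, I^2 in percent).
    """
    # Fisher's transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))
    zs = [math.atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]  # within-study variance of each z
    w = [1.0 / v for v in vs]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    # Cochran's Q and its degrees of freedom
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1
    # DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    lo, hi = z_re - 1.96 * se_re, z_re + 1.96 * se_re
    # Inverse Fisher transformation back to the correlation scale
    return math.tanh(z_re), math.tanh(lo), math.tanh(hi), i2


if __name__ == "__main__":
    # Illustrative inputs only, not data from the included studies
    r, lo, hi, i2 = pool_correlations([0.62, 0.71, 0.55], [112, 80, 250])
    print(f"pooled r = {r:.3f} (95% CI {lo:.3f} to {hi:.3f}), I2 = {i2:.1f}%")
```

Working on the z scale keeps the confidence interval inside (−1, 1) after back-transformation; other between-study variance estimators (for example, restricted maximum likelihood) would slightly change the weights.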
Studies were stratified according to the following characteristics: (1) population characteristics, including age (< 18 years, 18-50 years, and > 50 years), gender, and region; (2) characteristics of the reproducibility studies, including sample size (≤ 112 and > 112; the cutoff point was the median sample size) and the time interval between repeated FFQs (≤ 6 months and > 6 months; the cutoff point was the median time interval); and (3) characteristics of FFQ design, including the number of FFQ items (≤ 120 and > 120; the cutoff point was the median number of items), dietary recall interval (≥ 12 months and < 12 months), and administration mode (interviewer-administered or self-administered). All statistical analyses were performed using Stata software (version 11.0; StataCorp, College Station, TX, USA). A P-value less than 0.05 was considered statistically significant.
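The median-split stratification described above can be illustrated with a short sketch. The study records and the field name `n` are hypothetical; ties at the median are placed in the lower group to match the "≤ cutoff" and "> cutoff" convention used for sample size, time interval, and number of items.

```python
import statistics


def median_split(studies, key):
    """Split studies into (<= median, > median) groups on a numeric attribute.

    `studies` is a list of dicts and `key` names the attribute, e.g. a
    hypothetical "n" field holding each study's sample size.
    """
    cutoff = statistics.median(s[key] for s in studies)
    low = [s for s in studies if s[key] <= cutoff]  # ties go to the lower group
    high = [s for s in studies if s[key] > cutoff]
    return cutoff, low, high
```

Each subgroup would then be pooled separately, for example with a random-effects routine such as the one sketched for the meta-analysis step.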

Results

Literature search and study selection
The flow chart of the study selection is shown in Fig. 1.
We identified 2706 original studies from the databases. After removing 1256 duplicates, 159 articles met the inclusion criteria on title and abstract screening. After full-text review, 35 articles were excluded according to the exclusion criteria. In total, 123 articles were obtained through the procedure described above.

Study characteristics
An overview of the retrieved studies assessing the reproducibility of FFQs is presented in Table 1 (detailed information in Supplemental Table 1). Of the 123 included articles [4, 10-13, 15-18, 20, 21, 23-134], two analyzed differences between age groups [50,113], and five assessed differences in reproducibility according to the time interval between repeated FFQs [39,91,96,113,134]. The median sample size per study was 112, with a total of 20,542 participants. Participants were aged between 8 and 86 years. The studies were divided into three age groups: adults (18-50 years), elderly (> 50 years), and adolescents (< 18 years), comprising 77, 33, and 15 studies, respectively. For studies whose participant age range spanned a cutoff point, the reported mean age was used for grouping; when the mean age was not available, the median age was used instead. For FFQ characteristics, the median number of FFQ items was 120. The numbers of studies that required participants to recall food intake over ≥ 12 months and < 12 months were 80 and 33, respectively. Of all included studies, 44 were interviewer-administered, 63 were self-administered, and for 16 the administration mode was not available. Time intervals between FFQs varied considerably (from 1 week to 2.7 years), and studies were classified as ≤ 6 months (n = 63) or > 6 months (n = 55).
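The age-grouping rule above (reported mean age first, median age as a fallback) can be expressed as a small function. This is an illustrative sketch; the function name and the group labels are hypothetical, and the boundaries follow the strata given in the Methods (< 18, 18-50, and > 50 years).

```python
def age_group(mean_age=None, median_age=None):
    """Assign a study to an age stratum: < 18, 18-50, or > 50 years.

    Uses the reported mean age first and falls back to the median age;
    returns None if neither is reported.
    """
    age = mean_age if mean_age is not None else median_age
    if age is None:
        return None
    if age < 18:
        return "adolescent"
    if age <= 50:  # 18-50 years inclusive
        return "adult"
    return "elderly"
```

For example, `age_group(None, 62)` places a study reporting only a median age of 62 years in the elderly stratum.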

Correlation coefficients for energy and macronutrients
As shown in Table 2, crude ICCs ranged from 0.499 for starch to 0.803 for alcohol (median: 0.667). All values for energy and macronutrients exceeded 0.5. After energy adjustment, ICCs ranged from 0.420 (n-3 PUFA) to 0.803 (alcohol), with a median of 0.630. Energy-adjusted ICCs exceeded 0.5 for most nutrients, except n-3 PUFA, trans-fat, and soluble fiber. Pooled crude SCCs ranged from 0.548 (plant fat) to 0.851 (alcohol), with a median of 0.637, and energy-adjusted SCCs ranged from 0.441 (n-6 PUFA) to 0.793 (alcohol), with a median of 0.580. Most values decreased after energy adjustment, except those for lipid and plant fat. All pooled crude SCCs exceeded 0.5; energy-adjusted SCCs exceeded 0.5 except those for n-3 PUFA and n-6 PUFA. Heterogeneity was high (I² > 50%) for energy and most nutrients in both crude and energy-adjusted ICCs and SCCs. Table 3 shows the pooled ICCs and SCCs for the reproducibility of FFQ measurements of micronutrients. For vitamins, the pooled crude and energy-adjusted …

Subgroup analysis according to age and sex
To assess the impact of age on the reproducibility of repeated FFQ measurements, we performed a subgroup analysis according to age (Fig. 2). As shown in Supplemental …

Based on a subgroup analysis according to sex (Fig. 3), pooled ICCs for estimation of 13 of 28 and pooled SCCs … Table 4). The range of SCCs was between 0.374 and 0.872 for men and between 0.502 and 0.838 for women (Supplemental Table 5).

Factors influencing reproducibility according to study design
Pooled ICCs and SCCs stratified according to sample size are presented in Fig. 4, and the pooled ICCs are listed in Supplemental Table 10. The median … The values for small sample sizes varied from 0.516 to 0.841 and were higher than those for large sample sizes for most nutrients (28/46) (Supplemental Table 11).
Results of the subgroup analysis by the time interval between the two FFQ administrations are presented in Fig. 5. The median (range) of pooled ICCs was 0.643 (0.518-0.822) for short-term reproducibility and 0.652 (0.485-0.788) for long-term reproducibility (Supplemental Table 12). SCCs ranged from 0.532 to 0.860 for short-term and from 0.339 to 0.840 for long-term reproducibility (Supplemental Table 13). When the period between completing the FFQs was shorter (≤ 6 months), pooled ICCs for energy and most nutrients (24/40) were higher than those for longer periods (> 6 months). SCCs were likewise higher for short-term than for long-term reproducibility for most nutrients (42/48).
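Counts such as "24/40" compare pooled coefficients nutrient by nutrient between two subgroups. One plausible reading, sketched below as an assumption rather than the authors' stated procedure, is that only nutrients pooled in both subgroups enter the denominator. The function name and the example values are illustrative.

```python
def count_higher(group_a, group_b):
    """Count nutrients whose pooled coefficient is higher in group_a than in group_b.

    Both arguments map nutrient name -> pooled coefficient. Only nutrients
    pooled in both subgroups are compared. Returns (higher, compared).
    """
    shared = group_a.keys() & group_b.keys()  # nutrients present in both subgroups
    higher = sum(1 for k in shared if group_a[k] > group_b[k])
    return higher, len(shared)


if __name__ == "__main__":
    # Illustrative pooled ICCs, not values from the included studies
    short_interval = {"energy": 0.70, "protein": 0.65, "fat": 0.60}
    long_interval = {"energy": 0.66, "protein": 0.68, "fiber": 0.50}
    print(count_higher(short_interval, long_interval))
```

Such a count ignores the size of each difference and its confidence interval, so it summarizes direction only.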
To assess the influence of seasons on the reproducibility of FFQs, we conducted a subgroup analysis using a 12-month interval as the cutoff point. For long-term and short-term reproducibility, the pooled ICCs ranged from 0.501 to 0.859 (median = 0.676) and from 0.485 to 0.788 (median = 0.643), respectively (Supplemental Table 14). Compared with ICCs at long intervals (≥ 12 months), ICCs at short intervals were higher for most nutrients (28/40). As shown in Supplemental Table 15, SCCs ranged from 0.339 to 0.848 (median = 0.602) at long intervals (≥ 12 months) and from 0.248 to 0.845 (median = 0.632) at short intervals (< 12 months). SCCs for short-term reproducibility were higher than those for long-term reproducibility for energy and most nutrients (34/49).

Factors influencing reproducibility according to FFQ design
Results of the subgroup analysis according to the number of FFQ items are presented in Fig. 6. Pooled ICCs between repeated administrations ranged from 0.512 to 0.825 for FFQs with more items (> 120) and from 0.310 to 0.764 for FFQs with fewer items (≤ 120) (Supplemental Table 16). Pooled SCCs ranged from 0.555 to 0.85 for long FFQs and from 0.469 to 0.851 for short FFQs (Supplemental Table 17). Compared with short FFQs, long FFQs had higher pooled ICCs for 38 of 39 nutrients and higher pooled SCCs for 43 of 49 nutrients.

ICCs and SCCs stratified according to dietary recall interval are presented in Fig. 7. The median ICC was 0.659 (range: 0.557-0.836) for long-term FFQs (≥ 12 months) and 0.622 (range: 0.310-0.854) for short-term FFQs (< 12 months). SCCs ranged from 0.522 to 0.847 for long-term FFQs and from 0.494 to 0.838 for short-term FFQs. Pooled ICCs for 24/38 nutrients and SCCs for 20/42 nutrients were higher for repeated long-term FFQs than for short-term FFQs (Supplemental Tables 18 and 19).

Figure 8 presents the differences in correlations between self-administered and interviewer-administered FFQs. Pooled ICCs ranged from 0.530 to 0.811 for self-administered FFQs and from 0.502 to 0.826 for interviewer-administered FFQs (Supplemental Table 20); values for 17/39 nutrients were higher for self-administered FFQs than for interviewer-administered FFQs. SCCs for self-administered FFQs (range: 0.553-0.874) were higher than those for interviewer-administered FFQs (range: 0.482-0.761) for 37 of 43 nutrients (Supplemental Table 21).

Discussion
In the present study, we conducted a meta-analysis to systematically assess the reproducibility of FFQs and to explore the factors related to it. Pooled ICCs and SCCs exceeded 0.5 for energy and most nutrients in general healthy populations. For the elderly and adolescents, pooled ICCs and SCCs for most nutrients were lower than those for adults (18-50 years old). For energy and 24 macronutrients, all ICC and SCC values exceeded 0.5, except for I, soluble fiber, trans-fat, n-3 PUFA, and n-6 PUFA. Moreover, FFQs with more food items, a 12-month dietary recall interval, and shorter periods between repeated FFQs were associated with better FFQ reproducibility.
To evaluate the ability of FFQs to assess long-term dietary intake in different age groups, we conducted a subgroup analysis according to age. Correlations exceeded 0.5 for most nutrients in the elderly, adolescents, and adults, indicating that FFQ reliability was relatively consistent across age groups. However, reproducibility in adults was higher than that in the elderly and adolescents for most nutrients. A potential reason for the lower correlations in adolescents is that older individuals may have more established dietary habits than younger individuals [55]. Further, adolescents find it more challenging to estimate dietary intake, particularly of cooking-related ingredients such as spices [123], and to understand abstract concepts of average intake, particularly for seasonal foods such as fruits [37], although the ability to self-report food intake improves rapidly from 8 years of age [135]. Compared with that in adults, FFQ reproducibility in the elderly tended to be lower. Although the elderly have relatively stable dietary intake, declines in memory or cognitive function may have contributed to this lower reproducibility [70].
Gender differences in FFQ reproducibility were observed in this study. Reproducibility was generally higher in women than in men for most nutrients, suggesting that women have more stable long-term dietary intake than men [21]. Women generally pay more attention to food intake and cook more often [79], which may contribute to the higher reproducibility of FFQs in women.
In addition, we observed that FFQ reproducibility was lower when the sample size was large. This lower correlation may not reflect the true reproducibility of FFQs but may instead be caused by factors in study conduct unrelated to the questionnaire itself: large samples require researchers to manage more participants; consume more time, resources, and effort; increase loss to follow-up; and place a greater burden on researchers. Conversely, a small sample may limit representativeness, so that large within-person differences in nutrient intake lead to less reliable correlation coefficients and ICCs [122]. Therefore, when conducting FFQ reproducibility studies, a sample size with sufficient statistical power is recommended, rather than increasing the sample size indiscriminately.
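As one way to make "sufficient statistical power" concrete, the sketch below uses the common Fisher z approximation for the sample size needed to detect a correlation r against zero with a two-sided test. This is an assumption for illustration, not a method from the included studies; because reproducibility coefficients are expected to lie well above zero, planning for a target confidence-interval width or using ICC-specific sample-size methods may be more appropriate in practice. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect a population correlation r versus 0 (two-sided).

    Uses the Fisher z approximation:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(r)) ** 2 + 3
    and rounds up to the next whole participant.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)  # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.7):
        print(r, n_for_correlation(r))
```

The required n grows quickly as the expected correlation shrinks, which is one reason a fixed "large" sample is not automatically the right target.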
FFQs with more items showed better reproducibility for most nutrients, indicating that longer FFQs collect more reliable information [136] and enable better estimation of food and nutrient consumption [92]. However, participants need more time to complete a long questionnaire accurately and may lose patience, leading to potential biases and, ultimately, lower-quality data [2]. Therefore, to balance reporting errors against reproducibility, pilot studies should be performed to determine an appropriate number of FFQ items based on the demographic characteristics of participants.
FFQs are used to assess habitual diet over extended periods. Pooled ICCs were higher for most nutrients when FFQs used a dietary recall interval of 12 months or more than when they used a shorter recall interval. These relatively high correlations with a 1-year recall interval indicate that FFQs can provide a reliable estimate of long-term dietary habits. Lower correlations for FFQs with recall intervals shorter than 1 year may be related to the seasonal availability of foods [130]. In addition, using 1 year as the reference period allows researchers to capture participants' complete dietary intake [130].
The combined correlation coefficients were higher when repeated FFQs were administered within a short interval (≤ 6 months) than over a long interval (> 6 months), suggesting that a short interval between repeated administrations is a key factor contributing to high FFQ reproducibility, in accordance with a previous review [2]. A possible explanation is that respondents can more easily remember and replicate their previous responses when the two FFQs are administered close together in time [15]. The difference between the two subgroups may also reflect changes in diet over time [137]; the lower long-term reproducibility suggests that participants' usual food intake may have changed during the study period [134]. Because food intake also exhibits yearly trends [138,139], longer intervals between repeated FFQs have been chosen to avoid the effects of seasonal or yearly variation in diet [82]. Therefore, both memory bias and seasonal changes in diet should be considered when choosing the interval between repeated FFQs.
The main strength of this study is that it is the first meta-analysis to comprehensively analyze FFQ reproducibility. It included a large number of diverse populations with a wide age range and revealed good reproducibility of nutrient intake across subgroups defined by age, sample size, gender, and region, suggesting that FFQs are suitable for assessing dietary intake in these subgroups. We evaluated the reproducibility of FFQs for the intake of 50 nutrients, which strengthens the conclusions of this study.
This study has some limitations. First, our exclusion criteria removed articles in which FFQs were used to assess specific nutrients, which may have influenced our results. Second, learning ability and lifestyle factors, such as education level and body mass index, may have influenced FFQ reliability, but the relevant data were not available in the included articles. Third, we did not evaluate the quality of the included studies because no tools are currently available to assess the quality of FFQ reproducibility studies. Further studies are needed to develop such tools.

Conclusions
In conclusion, FFQs with correlation coefficients greater than 0.5 for most nutrients may be considered a reliable tool to measure dietary intake. In addition, factors related to FFQ design, such as the number of FFQ items and the dietary recall interval, may be associated with FFQ reproducibility. To increase the reproducibility of FFQs, the following points should be considered before developing FFQs. First, pilot studies are warranted to explore the appropriate number of FFQ items based on the characteristics of the study population. Second, 12 months is suggested as the dietary recall interval. Third, when performing reproducibility studies for FFQs, a sample size with sufficient statistical power, but no larger, is recommended.

Availability of data and materials
Not applicable.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.