This study compared estimates of domain-specific PA and sedentary behavior obtained with the FPACQ with those obtained from SWD. Furthermore, it was examined whether the correspondence between the two methods varied with gender and age. All parameters of the FPACQ were significantly and positively correlated with SWD outcomes. Nevertheless, significant differences between the two methods were found. In general, PA was higher and sedentary behavior lower with the FPACQ compared to SWD. These results are similar to those of several other studies, which showed that, when compared to objective data obtained from accelerometers, questionnaires have acceptable validity but generally overestimate PA [13, 20, 21]. However, previous studies have mostly been limited to overall PA or time spent in moderate and vigorous activity, whereas the current study highlights the importance of examining domain-specific activity when investigating agreement between measurement techniques.
Correlations between the two methods ranged from 0.21 to 0.65, which is similar to what is typically reported for PA questionnaires evaluated in adults [4, 17, 22]. An important contribution of this study is the comparison between subjective and objective measures of physical (in)activity in different domains of daily life. Correlations were moderate for job, leisure time, household chores and transport, but low for eating and sleeping. To our knowledge, only two studies have divided accelerometer output into different domains according to information obtained from an activity log, similar to what was done in the current study. In the first, measures of occupational activity from the Tecumseh and Baecke questionnaires were significantly correlated with Tracmor output during work (r = 0.26 to 0.50), but low or no correlations were found for indices of active leisure time. However, active leisure time included a wide range of activities, such as sports, household and garden activities. In the second, Matton et al. showed comparable correlations for active transport (0.49-0.55), but higher correlations for sports (0.47-0.77), TV viewing (0.69-0.83), occupation (0.78-0.88) and eating and sleeping (0.53-0.69). There were, however, subtle differences between that study and the current one in the calculation of the FPACQ parameters.
Despite the significant correlations, PAL and total EE were significantly higher and sedentary time significantly lower with the FPACQ as compared to the SenseWear. These results are consistent with findings from several previous studies [4, 13, 21]. However, it is unclear whether the differences between the two methods are due to errors in the FPACQ or to inherent limitations of the SenseWear. It has been shown that the SenseWear underestimates total EE by 4% compared with doubly labeled water [24, 25]. This could partly explain the observed difference in total EE between FPACQ and SenseWear (9% of the average of FPACQ and SenseWear outcomes).
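The 9% figure expresses the FPACQ minus SenseWear difference relative to the mean of the two methods. As a rough illustration of that arithmetic, and of how far a 4% underestimation by the SenseWear could close the gap, the Python sketch below uses hypothetical group means (not the study's data) and assumes that the underestimation can be corrected by dividing the SenseWear value by 0.96.

```python
def percent_of_mean(a: float, b: float) -> float:
    """Return the difference a - b as a percentage of the mean of a and b."""
    return 100.0 * (a - b) / ((a + b) / 2.0)


def correct_underestimation(value: float, underestimation: float) -> float:
    """Rescale a value assumed to underestimate the truth by a fixed fraction.

    If measured = (1 - underestimation) * true, then true = measured / (1 - underestimation).
    """
    return value / (1.0 - underestimation)


# Hypothetical group means for total EE in kcal/day, for illustration only
fpacq_ee = 2750.0
sensewear_ee = 2513.0

raw_gap = percent_of_mean(fpacq_ee, sensewear_ee)
adjusted_gap = percent_of_mean(fpacq_ee, correct_underestimation(sensewear_ee, 0.04))
print(f"raw: {raw_gap:.1f}%, after 4% correction: {adjusted_gap:.1f}%")
```

With these made-up numbers the gap shrinks from about 9% to about 5%, which is consistent with the idea that the known SenseWear bias would explain only part of the difference.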
With regard to intensity of activity, the reported duration of vigorous PA was higher, whereas that of moderate PA was lower, than directly measured by the SenseWear. Similarly complex patterns have been observed in several previous studies. It has been reported that people overestimate the amount of vigorous activity while underestimating time spent in light and moderate activities, though some studies also found over-reporting of moderate activities [21, 27]. The FPACQ questions on time spent in moderate and vigorous PA ask about overall activity across multiple domains of daily life. These questions are cognitively challenging, because several activities need to be taken into account and summed over the day. When asked about PA behavior, most subjects seem to think of vigorous or organized activities rather than routine activities such as household chores or walking. This underlines the importance of examining domain-specific activity when investigating agreement between measurement techniques. In the current study, time and EE of job activities and active transport were significantly higher, and those of household chores, passive transport, and eating and sleeping significantly lower, with the FPACQ as compared to SWD. Furthermore, the FPACQ yielded a lower duration but a higher EE for sports. Few studies have compared self-reported activity in different domains with similar measures obtained from activity monitors. Matton et al. showed that the duration of eating and sleeping and of watching TV in women was significantly lower, and that time and EE of sport, time of active transport and EE during occupation were significantly higher, when calculated from the FPACQ as compared to an accelerometer plus log. Reported duration of active leisure time was higher in men and slightly, but not significantly, lower in women. Because active leisure time included sports participation and active transport, which were over-reported in that study, as well as house and garden activities, the slightly lower value in women could point to an underreporting of household activities in women, analogous to the current study.
The correspondence between FPACQ and SWD varied with gender and age, but no clear pattern was observed: trends differed according to the specific intensity and domain of activity. Men over-reported higher-intensity activity significantly more than women, whereas women underreported total sedentary time and household chores to a greater extent. Young adults had smaller differences between FPACQ and SWD for PAL and total EE, but greater differences for time spent in moderate activities, than middle-aged and older subjects. Additionally, vigorous activities were over-reported more by young than by middle-aged adults. Evidence on the role of gender in the agreement between self-reported and directly measured PA has been mixed, with some studies demonstrating better agreement in men [4, 30] and others better agreement in women [21, 31]. Calabro et al. found that, for men, the 24-hour recall estimate of total EE was slightly higher than the SenseWear estimate, whereas for women it was slightly lower. Only a few studies have investigated the impact of age on the accuracy of self-reports. It has been reported that PA questionnaires are especially challenging for older adults because of the cognitive processes involved in recall. Furthermore, a substantial component of their PA, namely activities of daily living, is not captured by most self-report instruments. A review by Ferrari et al. showed that the validity of questionnaires varied with age, with lower coefficients observed for subjects older than 50 years. However, results could differ depending on the questionnaire used.
Bland-Altman analyses revealed a relatively small mean difference between FPACQ and SenseWear for total EE. However, the 95% limits of agreement were wide, indicating large individual differences between the estimates of the two methods. Most previous studies have reported agreement at the group level, but not at the individual level [20, 35]. Calabro et al. found a relatively small (38.5 kcal·day⁻¹), non-significant difference between the 24-hour PA recall and SenseWear for group-level EE. However, differences in individual estimates ranged from -663 to 946 kcal·day⁻¹. In the current study, no systematic bias was observed for total EE. For sports, however, a trend towards increased over-reporting by the FPACQ at higher values of EE was found. Other studies also indicated an increasing difference with increasing PA. Good agreement existed between the IPAQ and ActiGraph up to 1000 min of PA per week; however, as activity levels increased beyond 1000 min, the IPAQ tended to overestimate total PA. Bland-Altman plots for the 24-hour recall versus the IDEEA and SenseWear illustrated a tendency of the 24-hour recall to underreport total EE in the least active subjects and to over-report it in the most active subjects.
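The Bland-Altman quantities discussed above (the mean difference or bias, the 95% limits of agreement, and a trend of the difference with the size of the measurement) can be sketched in Python as follows. This assumes the conventional formulation, bias ± 1.96 SD of the paired differences, and uses an ordinary least-squares slope of the differences on the pairwise means as one common check for proportional bias; the text does not state which test was used, and the data below are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    Differences are computed as method_a - method_b; the 95% limits of
    agreement are bias +/- 1.96 times the sample SD of the differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def proportional_bias_slope(method_a, method_b):
    """OLS slope of the differences regressed on the pairwise means.

    A positive slope means method_a exceeds method_b by more as values increase,
    i.e. increasing over-reporting at higher levels.
    """
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mx, my = mean(means), mean(diffs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, diffs))
    sxx = sum((x - mx) ** 2 for x in means)
    return sxy / sxx


# Hypothetical paired EE values (kcal/day) for four subjects
questionnaire = [100, 200, 300, 400]
monitor = [100, 180, 260, 340]

bias, lower, upper = bland_altman(questionnaire, monitor)
slope = proportional_bias_slope(questionnaire, monitor)
print(f"bias={bias:.1f}, LoA=({lower:.1f}, {upper:.1f}), slope={slope:.3f}")
```

Here the bias is small relative to the values, but the limits of agreement span roughly 100 kcal and the positive slope shows the difference growing with the mean, mirroring the pattern described for sports.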
Several reasons could explain the disagreement between the two measurement methods. Social desirability may at least partially explain the over-reporting of PA and underreporting of sedentary pursuits. It has been shown that, over a seven-day period, social desirability bias is associated with over-reporting of PA by approximately 4–11 min·day⁻¹.
A higher perceived than objectively measured intensity may also lead to differences [37, 38]. Some questionnaires, including the FPACQ, ask about activities in which physiological cues such as increased sweating, heart rate or breathlessness mark the intensity. However, the perception of intensity depends on the age, gender and fitness of the person as well as on the duration of the activity [1, 2]. Moderate activities could be perceived as vigorous, which may explain the over-reporting of vigorous and underreporting of moderate PA. Likewise, subjects could have overestimated the intensity of their occupational activities, resulting in a higher EE from the FPACQ.
A third explanation might be the difficulty of recalling light to moderate activities of daily living. It has been shown that accurate measures of light to moderate PA are difficult to obtain with self-reports, probably because such activities are unstructured and intermittent. Aadahl et al. reported that subjects knew quite accurately how much time they slept, worked or watched TV, and how much time they spent on vigorous activities such as sports or heavy gardening, but that the duration of light activities at home was very difficult to remember. This could explain why women in particular underreported the duration of household chores. It is possible that women performed lighter activities, whereas men performed heavier gardening. Additionally, women may have accumulated intermittent household chores over the course of the day, whereas men's chores were more structured, making them easier to recall.
Another source of variability may be the algorithms used to convert activity data into EE [10, 11, 31]. The SenseWear estimates EE based on physiological and movement parameters, whereas the FPACQ relies on MET-values from a published compendium. Reported activities were converted into an estimate of EE by assigning each activity a specific MET-value. Thus, a single estimate of the energy cost of a given activity was used for all subjects, which does not allow for individual differences in EE [1, 2, 18]. Yet evidence suggests that there is considerable inter- and intra-individual variability in the energy cost of activities, depending on the person’s sex, age, body mass and movement efficiency and on the environmental conditions in which the activity is performed [6, 40]. Notably, EE of sports was strongly over-reported by men, but not by women. This could point to an overestimation of the MET-values of certain sports, perhaps those with a higher intensity or those mainly practiced by men. However, the SenseWear is also known to underestimate EE during very vigorous activities [41, 42].
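To make the MET-based conversion concrete, the Python sketch below assumes the widely used approximation that 1 MET corresponds to about 1 kcal per kg of body mass per hour, so that EE ≈ MET × body mass (kg) × duration (h). The exact FPACQ scoring formula is not given here, and the MET values in the table are hypothetical placeholders, not the compendium values used by the FPACQ.

```python
# Hypothetical MET values for illustration only; not taken from the compendium
HYPOTHETICAL_METS = {
    "walking": 3.5,
    "household chores": 3.0,
    "cycling": 6.0,
}


def activity_ee_kcal(activity, minutes, body_mass_kg, mets=HYPOTHETICAL_METS):
    """Approximate EE (kcal) as MET x body mass (kg) x duration (h).

    Assumes 1 MET ~ 1 kcal per kg per hour. The same MET value is applied
    to every person, so differences in movement efficiency, fitness or
    environmental conditions are not reflected in the estimate.
    """
    return mets[activity] * body_mass_kg * (minutes / 60.0)


# Two hypothetical 70 kg subjects reporting 30 min of cycling receive the same
# estimate, even if their true energy cost differs.
subject_1 = activity_ee_kcal("cycling", 30, 70)
subject_2 = activity_ee_kcal("cycling", 30, 70)
print(subject_1, subject_2)
```

The design choice being illustrated is the one criticized in the text: a fixed lookup value per activity makes the estimate depend only on reported duration and body mass.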
It is important to recognize that the disagreement results from limitations in both methods. The disagreement reported in the literature may be related to limitations in the use of accelerometers. Part of the overestimation of PA in self-reports may be explained by activities that are not detected by accelerometers. In addition, the required wear time of accelerometers varies between studies and is generally low, for example a minimum of 10 hours per day [20, 34], whereas adults may be awake for up to 16 hours. Thus, in such studies, activities during part of the waking day were not registered, which likely produced some bias in the data [12, 43]. The SenseWear can address some of these limitations. By combining accelerometry with physiological sensors, it can detect the increased EE associated with cycling, upper body movement, carrying loads and walking on an incline. Moreover, in this study, the wear time was standardized to 24 hours a day. However, the SenseWear is not without limitations. Like other activity monitors, it is known to overestimate the EE of moderate activities and to underestimate the EE of (very) vigorous activities and total EE [25, 41, 42]. Johannsen et al. noted that the SenseWear underestimated PA EE by 12.5% compared to estimates derived from doubly labeled water. This may have contributed to the observed differences between FPACQ and SWD for total EE and the EE of sports and active transport. Furthermore, the Armband cannot be worn during water-based activities; to compensate, a constant MET-value was imputed in this study for swimming and showering or bathing. Because of these limitations, under- or overestimation by the FPACQ can be neither confirmed nor refuted, and real activity levels probably lie between the subjective and objective assessments.
Some results might reflect limitations in the use of the diary. First, participants may forget to record short-duration activities, such as active transport, leading to underreporting of these activities in the diary. Second, contrary to expectations, screen time did not differ between methods. Yet the pattern is complex, as men underreported and women slightly over-reported screen time. This could be explained by a difference in how simultaneous activities were recorded: in the diary, subjects had to choose a single activity when several activities were performed simultaneously, whereas in the FPACQ both activities could be reported. For example, when eating a meal in front of the TV, subjects could have entered eating in the diary, while also counting this period as TV viewing when answering the screen-time question of the FPACQ. Also surprisingly, time spent on sports was lower in the FPACQ than in the diary. The hours of sports participation indicated in the diary might include time devoted to changing, refreshment and socializing. Furthermore, subjects knew they were participating in a PA study and that their activity was being monitored. Thus, because of a possible Hawthorne effect, participants could have performed more sports than usual, resulting in higher values in the diary. This points to a potential limitation of the study: the FPACQ assessed activity during a usual week, whereas SWD measured activity during the actual week of monitoring. This could, at least in part, explain the difference in job time between the two methods, as subjects could have been monitored during a week with some vacation days or less work time than usual. However, the interpretation of a usual week is difficult, and participants sometimes recall the last 7 days as a usual week [17, 28].
Some other limitations should be considered when evaluating the results of this study. Participants volunteered to take part in the study. This may have led to a selection bias, as most participants were highly educated and had white-collar jobs. Accordingly, the generalizability of these findings to the general working population may be restricted. However, a previous study showed that agreement between self-reported and accelerometer-obtained PA did not differ between educational levels.
The current study investigated whether the correspondence between recalled and direct measures of PA varied with gender and age. However, trends in agreement may be influenced by several other characteristics, including BMI and cardiovascular fitness [27, 31, 37]. Additional research is needed to identify whether, and to what extent, these factors are associated with reporting bias.
A major strength of this study is the combination of the SenseWear Armband, a valid activity monitor [24, 25], with the electronic diary. Each minute of SenseWear data was linked to the type of activity reported in the diary. In this way, activity variables from the questionnaire could be compared with an objective measure expressed in the same dimension, thereby moving beyond examinations of overall PA or time spent at moderate and vigorous intensity. In addition, compared to previous studies examining agreement between measurement techniques, this study included a relatively large sample of men and women of diverse ages. Furthermore, compliance with wearing the SenseWear and completing the diary was very high, and only subjects with at least six days with a minimum of 22 hours and 48 min (95% of 24 hours) of data were included in the analyses.
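The minute-level linkage and the inclusion rule described above can be sketched as follows. This is an illustrative Python outline, not the study's processing code: it assumes each valid minute is available as a hypothetical `(domain, kcal)` pair, where the domain comes from the diary and the kcal value from the SenseWear, and it encodes the stated criterion of at least six days with at least 95% of 24 hours (1368 min, i.e. 22 h 48 min) of data.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60
MIN_VALID_MINUTES = MINUTES_PER_DAY * 95 // 100  # 1368 min = 22 h 48 min
MIN_VALID_DAYS = 6


def summarize_by_domain(minutes):
    """Aggregate minute-level records into time and EE per diary domain.

    `minutes` is an iterable of (domain, kcal) tuples, one per valid minute,
    where `domain` is the diary-reported activity and `kcal` is the
    SenseWear EE estimate for that minute.
    Returns a dict mapping domain -> (number of minutes, total kcal).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for domain, kcal in minutes:
        totals[domain][0] += 1
        totals[domain][1] += kcal
    return {d: (m, e) for d, (m, e) in totals.items()}


def is_included(valid_minutes_per_day):
    """True if at least MIN_VALID_DAYS days have >= MIN_VALID_MINUTES of data."""
    valid_days = sum(1 for m in valid_minutes_per_day if m >= MIN_VALID_MINUTES)
    return valid_days >= MIN_VALID_DAYS


# Hypothetical example: three linked minutes and two participants' daily wear time
summary = summarize_by_domain([("job", 1.5), ("job", 2.5), ("eating and sleeping", 1.0)])
print(summary)
print(is_included([1440] * 6 + [1000]), is_included([1400] * 5 + [1367] * 2))
```

Keeping the data at the minute level until the final aggregation is what allows domain-specific time and EE to be compared directly with the corresponding FPACQ variables.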
The current results show that great care must be taken when interpreting self-reported and objectively measured PA. Clearly, the two assessment techniques are not interchangeable. Both instruments capture different aspects of a complex behavior: activity monitors like the SenseWear measure motion or movement, while questionnaires provide a behavioral description of activity patterns. As shown previously, subjective and objective measures are independently associated with health parameters; self-reports should therefore be used in addition to objective indicators of movement. Furthermore, it is important to recognize that the current recommendation to accumulate 30 min of PA on most days is based on associations between self-reported PA and health outcomes. The magnitude of these associations may be severely attenuated by measurement error, and less than 30 min of PA as measured by an accelerometer may provide significant health benefits. Thus, the benefits of PA may be even greater than is typically reported.