To the authors' knowledge, this review represents the most comprehensive attempt to examine the relationship between self-report and directly measured estimates of adult physical activity in the international literature. The risk of bias assessment identified just over one third of the studies as being of lower quality, based on their description of the methods and their external and internal validity. Overall, no clear trends emerged in the over- or underreporting of physical activity by self-report compared to direct methods. However, some results suggest that patterns in the agreement between self-report and direct measures of physical activity may exist, but they are likely to differ depending on the direct methods used for comparison and the sex of the population sampled. Interestingly, among studies that categorized physical activity by level of exertion (e.g. light, moderate, vigorous), the mean percent differences between the self-report and direct measures increased at the higher categories of intensity (i.e. vigorous physical activity). These larger differences may reflect a problem with self-report measures attempting to capture higher levels of physical activity, or problems with participant interpretation and recall.
Many of the studies tested the relationship between self-report and direct measures using a correlation coefficient. This approach is limited because correlation measures only the strength of the relationship between two variables; it cannot assess the level of agreement between them and ignores any bias in the data. A more useful approach, the Bland-Altman method, provides a means for assessing the level of agreement between self-report and direct measures by deriving the mean difference between the two measures and the limits of agreement. If the two measures possess good agreement and measure the same parameter of physical activity, then the cheaper and less invasive self-report methods may be valid substitutes for direct methods.
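The contrast between correlation and agreement can be illustrated with a minimal sketch. The data below are invented purely for illustration (minutes/day of activity for five hypothetical participants), the function names are our own, and the limits of agreement use the conventional mean difference ± 1.96 standard deviations, which is an assumption about how the method would typically be applied rather than a description of any included study.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship only."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(self_report, direct):
    """Return (mean difference, lower limit, upper limit) of agreement.

    Differences are self-report minus direct, so a positive mean
    difference indicates over-reporting relative to the direct measure.
    """
    if len(self_report) != len(direct) or len(self_report) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    diffs = [s - d for s, d in zip(self_report, direct)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired data in the same units (minutes/day).
direct = [20, 30, 40, 50, 60]
self_report = [35, 44, 56, 65, 75]

r = pearson_r(self_report, direct)
bias, lower, upper = bland_altman(self_report, direct)
print(f"r = {r:.3f}; mean difference = {bias:.1f}; "
      f"limits of agreement = ({lower:.1f}, {upper:.1f})")
```

In this invented example the correlation is close to 1 even though self-report exceeds the direct measure by about 15 minutes/day for every participant, which is the kind of systematic bias a correlation coefficient alone would not reveal.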
A meta-analysis would have allowed us to estimate the overall effect sizes for each of the direct measures and undertake a sensitivity analysis to further understand the degree of bias in the studies. Unfortunately, inconsistent methods and reporting among the included studies made such an analysis methodologically inappropriate. Further research in this area would benefit from greater consistency in the units of reporting and the methods used to facilitate comparisons. For instance, many studies did not report results using the same units, so estimates of agreement between the self-report and direct measures could not be computed. There was also inconsistency in the number of days measured and the time lag between the self-report and direct measures. It is recommended that authors present their results using the same units for both measures (e.g. minutes/day, kcal/day), that the two measurements assess physical activity over the same time period and for the same duration, and that all relevant data, including a mean and a measure of variance (e.g. standard deviation, standard error), be included in all reports.
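To show why common units matter, the following sketch computes a mean percent difference between a self-report mean and a direct-measure mean after converting both to minutes/day. The formula (self-report minus direct, divided by direct, times 100) is an assumed definition for illustration, and the function names and values are hypothetical rather than drawn from any included study.

```python
def minutes_per_week_to_per_day(minutes_per_week):
    """Convert a weekly total to a daily average so both measures share units."""
    return minutes_per_week / 7.0


def mean_percent_difference(self_report_mean, direct_mean):
    """Percent difference of self-report relative to the direct measure.

    Assumed definition: (self-report - direct) / direct * 100.
    Positive values indicate over-reporting by self-report.
    """
    if direct_mean == 0:
        raise ValueError("direct-measure mean must be non-zero")
    return (self_report_mean - direct_mean) / direct_mean * 100.0


# Hypothetical study: self-report given in minutes/week,
# accelerometer result given in minutes/day.
self_report_per_day = minutes_per_week_to_per_day(210)  # 30.0 minutes/day
accelerometer_per_day = 24.0

print(mean_percent_difference(self_report_per_day, accelerometer_per_day))
```

Without the conversion step the two means would not be comparable, which is the situation that prevented agreement estimates from being computed for many of the studies.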
Adhering to consistent reporting criteria would increase the comparability of results across studies and enable the calculation of overall effect sizes. At the population level, over- or underestimation of physical activity prevalence has important implications, as these data are used to monitor physical activity trends, determine spending for research and physical activity interventions and programming, and estimate physical inactivity-related risks of disease. Future studies may wish to refer to the updated Compendium of Physical Activities, which provides a coding scheme to classify physical activity by rate of energy expenditure. The Compendium offers a means to increase the comparability of results between self-report and direct measures, as well as across studies.
The lack of a clear trend amongst the differences between the self-report methods for assessing physical activity and the more robust direct methods is of concern, especially when trying to establish whether the measures could be used interchangeably. There are several possible explanations for the lack of a clear trend in the data. Many self-report instruments (such as the 7-day PAR) may not have the ability to account for activities of less than 10 minutes in duration or those with a level of exertion lower than brisk walking, whereas some of the direct methods (such as DLW) may capture all forms of physical movement. However, it is important to recognize that other direct measures such as accelerometers are unable to capture certain types of activities, such as swimming and activities involving the use of the upper extremities. Our findings demonstrate the inherent difficulty self-report measures have in accurately capturing data at various levels of exertion. Compared to direct measures, self-report methods appear to estimate greater amounts of physical activity at higher intensity (i.e. vigorous) levels than at low-to-moderate levels.
Just as some self-report measures are unable to capture all forms of activity, some direct measures may capture non-physical activity. For instance, the DLW technique is an accurate assessment of total energy expenditure, but it captures not only physical activity but all forms of energy expenditure, including resting energy expenditure and the thermogenic effect of food. DLW is therefore expected to overestimate physical activity unless corrections are made. These and other measurement errors may inflate the between-individual variability in the energy expended in physical activity. Finally, direct methods may be too sensitive to small errors derived from the various calibration methods employed and the equations used to define and categorize physical activity.
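The correction described for DLW can be written as a simple decomposition. The symbols below are introduced here for illustration only: TEE is total energy expenditure, REE is resting energy expenditure, TEF is the thermogenic effect of food, and PAEE is physical activity energy expenditure. The final line relies on the commonly used approximation that TEF is about 10% of TEE, which is an assumption and not a value taken from this review.

```latex
\begin{align}
\mathrm{TEE}  &= \mathrm{REE} + \mathrm{TEF} + \mathrm{PAEE} \\
\mathrm{PAEE} &= \mathrm{TEE} - \mathrm{REE} - \mathrm{TEF} \\
\mathrm{PAEE} &\approx 0.9\,\mathrm{TEE} - \mathrm{REE}
  \quad \text{(assuming } \mathrm{TEF} \approx 0.1\,\mathrm{TEE}\text{)}
\end{align}
```

Comparing uncorrected TEE with self-reported physical activity would therefore attribute the REE and TEF components to activity.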
It is important to take into account all of these factors when comparing self-report and direct measures of physical activity. In specific circumstances (e.g. at different levels of activity) these two methods may not be comparable, as they are not able to capture the same parameters of physical activity. Self-report measures may not be able to accurately capture all levels of activity, but they may be able to capture how difficult an individual perceives an activity to be and the type of activity that is undertaken (e.g. leisure, work, transportation). Direct measures, on the other hand, may be better able to capture some of the information not captured by self-report methods (e.g. incidental daily movement and lower intensity activities), but they also possess their own limitations, such as the inability to capture arm movements and various types of physical activity (e.g. swimming).
Concerns regarding the discrepancy between self-reported and directly measured physical activity were recently reported by Troiano and colleagues, who examined data from the 2003–2004 National Health and Nutrition Examination Survey (NHANES), which contained the first direct measurements of physical activity in a nationally representative U.S. sample. They compared self-reported adherence estimates of physical activity recommendations with those directly measured by accelerometer and found that self-reported adherence estimates were much higher than those measured by accelerometer. The authors hypothesize that the overestimation may be a result of respondents misclassifying sedentary or light activity as moderate, or of underestimations of activity duration by the accelerometers.
Other factors, such as those related to the population under study, may influence the ability of self-report and direct methods to capture the same measurement. For example, our findings show that in studies with a focus on overweight/obese individuals, self-reported physical activity was overestimated in all cases except for DLW studies involving combined male/female and male-only data. Our results differed from those reported by Irwin, Ainsworth and Conway (2001). Their study consisted of 24 males and used DLW to compare energy expenditure estimates with those obtained by physical activity record and the 7-day PAR. The investigators observed an overestimation of energy expenditure in participants with higher body fat using the physical activity record, but not the 7-day PAR. A comparison of the same sample by body mass index (BMI) identified that those with a BMI ≥ 25 kg/m² overestimated energy expenditure from physical activity records and the 7-day PAR. Consistent with the trends within our accelerometer data, a recent study (published after our search) of 154 subjects compared a physical activity questionnaire to accelerometry data and identified that the accuracy of the physical activity questionnaire was higher for males than females and for those with a lower BMI. It is likely that a response bias exists due to social desirability and influences the degree of over-reporting of physical activity by overweight/obese individuals. Future research and synthesis are needed to identify whether a bias does in fact exist and, if so, whether it differs by gender and to what extent.
This review had limitations that should be considered when examining the results. First, the sample was limited to studies that included directly comparable data between self-report and direct measures (same units for both measures) or a comparison by way of correlation. Access to primary data from each study was not feasible; therefore, we relied upon reported comparisons and the means of measured physical activity. This reduced the number of studies with reported measures of physical activity by self-report and direct methods and limited our ability to accurately assess the degree of agreement between the two measures. However, when possible we converted non-comparable units to increase the number of studies used. Second, the review did not assess the agreement between proxy-reported physical activity and direct measures. Proxy-report data are less prevalent but are an important means for assessing physical activity in sub-populations such as those who are chronically ill, disabled, or elderly, and who are unable to self-report on their own physical activity levels. Further research is required to assess the validity of proxy-report measures of physical activity when compared to direct methods. Finally, this review did not distinguish between study protocols that differed in calibration, cut-points, collection of the measurements, or other population-specific characteristics.