A systematic review of reliability and objective criterion-related validity of physical activity questionnaires

Helmerhorst, Hendrik JF; Brage, Søren; Warren, Janet; Besson, Herve; Ekelund, Ulf

doi:10.1186/1479-5868-9-103

Review
Open access
Published: 31 August 2012


Hendrik JF Helmerhorst^1,2,
Søren Brage¹,
Janet Warren^3,4,
Herve Besson¹ &
…
Ulf Ekelund^1,5

International Journal of Behavioral Nutrition and Physical Activity volume 9, Article number: 103 (2012)


Abstract

Physical inactivity is one of the four leading risk factors for global mortality. Accurate measurement of physical activity (PA), in particular by physical activity questionnaires (PAQs), remains a challenge. The aim of this paper is to provide an updated systematic review of the reliability and validity characteristics of existing and more recently developed PAQs and to quantitatively compare the performance of existing and newly developed PAQs.

A literature search of electronic databases was performed for studies published between January 1997 and December 2011 that assessed the reliability and validity of PAQs using an objective criterion measurement of PA. Articles meeting the inclusion criteria were screened and data were extracted to provide a systematic overview of measurement properties. Due to differences in reported outcomes and criterion methods, a quantitative meta-analysis was not possible.

In total, 31 studies testing 34 newly developed PAQs, and 65 studies examining 96 existing PAQs were included. Very few PAQs showed good results on both reliability and validity. Median reliability correlation coefficients were 0.62–0.71 for existing, and 0.74–0.76 for new PAQs. Median validity coefficients ranged from 0.30–0.39 for existing, and from 0.25–0.41 for new PAQs.

Although the majority of PAQs appear to have acceptable reliability, the validity is moderate at best. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity. Future PAQ studies should include measures of absolute validity and the error structure of the instrument.

Background

Physical inactivity is considered to be one of the four leading risk factors for global mortality [1]. The measurement of physical activity is a challenging and complex procedure. Valid and reliable measures of physical activity (PA) are required to: document the frequency, duration and distribution of PA in defined populations; evaluate the prevalence of individuals meeting health recommendations; examine the effect of various intensities of physical activity on specific health parameters; make cross-cultural comparisons and evaluate the effects of interventions [2].

Physical activity questionnaires (PAQs) are often the most feasible method when assessing PA in large-scale studies, likely because of their low cost and convenience, but these instruments have limitations and should be selected and used judiciously. PAQs are prone to measurement error and bias due to misreporting, either deliberate (social desirability bias) or because of cognitive limitations related to recall or comprehension [3, 4]. Cognitive immaturity or degeneration can make self-report of physical activity particularly difficult in the young and elderly [5, 6]. Despite more frequent use of objective assessment methods to measure physical activity, PAQs still provide a practical method for PA assessment in surveillance systems, for risk stratification and when examining etiology of disease in large observational studies. Most PAQs are designed to measure multiple dimensions of PA by reporting type, location, domain and context of the activity, provide estimates of time spent in activities of various levels of intensity, and may be able to rank individuals according to intensity levels of reported activity [7, 8]. However, results from studies evaluating the validity of PAQs in one population cannot be systematically extrapolated to other populations, ethnic groups, or geographical regions. Consequently, a great variety of PAQs have been developed and tested for reliability and validity in recent years.

A comprehensive review of PAQs for use in adults was published in 1997 [9]. Since then, reviews summarizing the validity and reliability of PAQs have been carried out in children [10–12] and preschoolers [13]. Recently, specific reviews were published assessing the quality of PAQs available for children [11], adults [14] and the elderly [15]. The aim of the present study was to systematically review the literature, published between January 1997 and December 2011, on the reliability of PAQs for use in all age groups and on their validity evaluated against objective criterion methods, and to quantitatively compare the performance of existing and newly developed PAQs.

Methods

Inclusion criteria

Studies meeting all of the following inclusion criteria were included: (i) published in the English language between January 1997 and December 2011; (ii) self- or interviewer-administered PAQs or parental proxy reports reporting both reliability and validity results; (iii) PAQs reporting validity results only, when the reliability data had been published previously; (iv) PAQs developed for a healthy general population and for observational surveillance studies; (v) PAQs tested in their original form or in an adapted version if results were reported for validity and reliability, or for validity only when reliability results had been published before; (vi) validity tested against an objective criterion measure of PA (i.e. accelerometry, heart rate, combined heart rate and accelerometry, doubly labeled water (DLW)); (vii) results on validity obtained by pedometer where the questionnaire was specifically developed to assess walking only.

Exclusion criteria

We excluded studies that reported: (i) reliability and validity results in groups with specific clinical or medical conditions (except pregnancy); (ii) results from PAQs that were designed for specific intervention studies; (iii) results where the validity of the PAQ was tested against another self-report method (i.e. diaries, logs); (iv) results on validity using pedometers (except if walking only was tested) and indirect measures of physical activity (e.g. VO2max and body composition); (v) results on essential adaptations of original PAQs, without any published results on both reliability and validity.

Literature search

The PubMed, Medline and Web of Science databases were systematically searched using the following lists and terms:

List A: (physical activity AND health survey OR population survey OR question*)

List B: measure* (i.e. measures, measurement), assess* (i.e. assessment, assessed), self-report, exercise, valid* (i.e. valid, validation, validity), reliab* (i.e. reliable, reliability), reproducible, accelerometer, heart rate, doubly labelled water, doubly labeled water. The search included titles, abstracts, key words and full texts.

Key search terms in List A were combined with each of the terms in List B.
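The pairing of List A with each List B term can be sketched as follows. This is only an illustration: the text does not say which Boolean operator joined the two lists, so the use of AND here is an assumption, and the function name build_queries is hypothetical.

```python
# List A and List B as given in the text above.
LIST_A = "(physical activity AND health survey OR population survey OR question*)"
LIST_B = [
    "measure*", "assess*", "self-report", "exercise", "valid*", "reliab*",
    "reproducible", "accelerometer", "heart rate",
    "doubly labelled water", "doubly labeled water",
]


def build_queries(list_a, list_b, operator="AND"):
    """Combine the List A expression with each List B term in turn.

    The joining operator is an assumption; the review only states that the
    lists were combined.
    """
    return [f"{list_a} {operator} {term}" for term in list_b]


queries = build_queries(LIST_A, LIST_B)
for query in queries:
    print(query)
```

Each List B term yields one search string, so the eleven terms above give eleven queries per database.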

The literature search was undertaken in two stages. The original literature search (1997–2008) was undertaken by two of the authors (JW, HB) independently and search results were compared and verified. The literature search was then updated to include studies up to December 2011 using exactly the same search criteria (HH). A second search strategy included screening reference lists of publications that matched the inclusion criteria and any other publications of which the authors were aware but which did not show up during the original literature search. Figure 1 displays an overview of the literature search.

Data collection and extraction

Data were extracted using a standardized pro-forma which included sample characteristics, questionnaire details, methods of validity and reliability testing, test results and authors’ conclusions. We retrieved full text of articles of all abstracts that met our inclusion criteria. Any queries about the inclusion of papers were resolved by one of the authors (UE).

Reliability

Reliability in all studies was tested through a test-retest procedure to measure consistency of the PAQs. Reliability results from included studies were reported as: intraclass correlation coefficients (ICC); Pearson and Spearman correlation coefficients; and agreement measures using Cohen’s weighted kappa (κ) and mean differences. Reliability was considered poor, moderate (acceptable), or strong when correlation coefficients or kappa statistics were <0.4, 0.4–0.8 or >0.8, respectively [16]. Similarly, an ICC > 0.70 or >0.90 was considered as acceptable and strong, respectively, in those studies reporting this measure [17].
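The thresholds above can be written as a small helper. This is a minimal sketch; the function names are hypothetical, and the label used for an ICC of 0.70 or below is an assumption, since the text only defines the acceptable and strong bands for ICC.

```python
def classify_correlation(value):
    """Classify a correlation coefficient or kappa: <0.4 poor, 0.4-0.8 moderate, >0.8 strong."""
    if value < 0.4:
        return "poor"
    if value <= 0.8:
        return "moderate"
    return "strong"


def classify_icc(icc):
    """Classify an ICC: >0.90 strong, >0.70 acceptable.

    The text does not name the band at or below 0.70; "below acceptable"
    is a placeholder label chosen for this sketch.
    """
    if icc > 0.90:
        return "strong"
    if icc > 0.70:
        return "acceptable"
    return "below acceptable"


print(classify_correlation(0.62))  # moderate
print(classify_icc(0.76))          # acceptable
```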

Medians of reliability correlation coefficients across studies were calculated and included in the tables when possible.

Validity

Correlation coefficients were the most commonly used measures of validity, although the Bland-Altman technique [18], which determines absolute agreement between two measures expressed in the same units, was also frequently used. The Bland-Altman method estimates the mean bias and the 95 % limits of agreement (± 2SD of the difference) and is usually plotted as the difference between the methods against the mean of the methods for visual inspection of the error pattern throughout the measurement range. The dependence of the error on the underlying level can be summarised in the error correlation coefficient, but this was seldom reported.
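The Bland-Altman quantities described above can be computed in a few lines. The sketch below assumes paired measurements in the same units; the data are invented purely for illustration and do not come from any reviewed study. The error correlation is taken here as the Pearson correlation between the differences and the pairwise means, which is one common way to summarise it.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(questionnaire, criterion, k=2.0):
    """Return mean bias, lower and upper limits of agreement, and error correlation.

    Limits are bias +/- k * SD of the differences (k = 2 as in the text).
    """
    diffs = [q - c for q, c in zip(questionnaire, criterion)]
    means = [(q + c) / 2 for q, c in zip(questionnaire, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - k * sd, bias + k * sd, pearson(diffs, means)


# Invented example: minutes/day of activity from a PAQ and from a criterion method.
paq = [30, 45, 60, 20, 90]
ref = [25, 40, 50, 30, 70]
bias, lower, upper, error_r = bland_altman(paq, ref)
print(f"bias={bias:.1f}, limits=({lower:.1f}, {upper:.1f}), error r={error_r:.2f}")
```

A positive error correlation, as in this invented data set, would indicate that over-reporting grows with the underlying activity level.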

Medians of included validity correlation coefficients were calculated and included in the tables when possible. When calculating the medians, we excluded those studies reporting correlation coefficients for the associations of self-reported sedentary time. The medians for sedentary time are reported separately and associations of sedentary time with measures of total physical activity (i.e. total energy expenditure [TEE], physical activity level [PAL] and total activity from accelerometry [mean counts]) from the criterion method were excluded in these analyses as these measures are expected to be inversely related.

Classification

Questionnaires were classified as new or existing (i.e. previously published test results) PAQs. Existing questionnaires were subdivided into those which reported new reliability and validity results, and those which reported new results on validity only but had previously reported results on reliability. Questionnaires were classified as new when the study in question was the first to publish reliability and objective validity data on the PAQ. Studies were then further stratified by age group of the sample. Study populations with a mean age lower than 18 years were categorised as youth, those with a mean age of 18–65 years as adults, and those with a mean age above 65 years as elderly.
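The age stratification can be expressed as a simple rule on the mean age of the study sample. This is a sketch only; the function name is hypothetical, and the handling of a mean age of exactly 18 or 65 (both treated as adults) follows a literal reading of the text.

```python
def age_group(mean_age):
    """Map the mean age of a study sample to the strata used in this review."""
    if mean_age < 18:
        return "youth"
    if mean_age <= 65:
        return "adults"
    return "elderly"


print(age_group(11.5))  # youth
print(age_group(42.0))  # adults
print(age_group(72.3))  # elderly
```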

PAQs included

PAQ abbreviations are listed in Table 1, with their respective timeframe. The details of these studies are shown in Tables 2 (new PAQs) and 5 (existing PAQs). A range of tests were used to assess reliability and validity with some studies reporting results for a total questionnaire summary score, and others assessing reliability and validity for various aspects, intensities, or domains of the questionnaire and/or by subgroups within the test population. The total score or index for the PAQ was reported, if available. In the absence of a total score, correlation coefficients by intensity category or group are reported. Where multiple results were reported, a decision was made about the data that constituted the main results based on the stated objectives for the study or questionnaire. Several studies compared results to another questionnaire concurrently but if this was a secondary aim of the specific study, the results were not included.

Table 1 List of questionnaire abbreviations and the corresponding definitions


Table 2 Descriptive characteristics of new PAQs


Results were reported for both total score and other aspects (e.g. domain, intensity) when this substantially added to the information for the specific study, for example when total PA was tested against a different validation method than PA intensities [31]. Some questionnaires assessed sedentary behaviour and these results are specifically reported in the tables or text. It has recently been suggested that sedentary behaviour should be considered distinct from physical activity in associations with health outcomes [50].

Results

The search string (JW and HH) resulted in a total of 11098 hits. The first literature search resulted in 125 papers being retrieved for data extraction. The update of the literature review to December 2011 resulted in a further 75 papers being retrieved for data extraction (Figure 1). More than half of the papers retrieved were excluded (n = 104). The main reasons for exclusion were inappropriate criterion measures, generally a measure of aerobic fitness (n = 48), and lack of information on reliability (n = 26) or validity (n = 17) (Figure 1).
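The study flow reported above can be checked with simple arithmetic. The sketch below uses only counts stated in this review; the final figure for exclusions on other grounds is derived here by subtraction and is not reported in the text.

```python
# Papers retrieved for data extraction in each search stage.
retrieved_first_search = 125
retrieved_update = 75
retrieved_total = retrieved_first_search + retrieved_update

# Exclusions and the main reasons given in the text.
excluded = 104
main_reasons = {
    "inappropriate criterion measure": 48,
    "no reliability information": 26,
    "no validity information": 17,
}

included = retrieved_total - excluded

# Included studies as reported elsewhere in the review.
studies_new_paqs = 31
studies_existing_paqs = 65

# Derived, not stated in the text: exclusions not covered by the main reasons.
excluded_other = excluded - sum(main_reasons.values())

print(retrieved_total, included, excluded_other)
```

The 96 included papers match the 31 studies of new PAQs plus the 65 studies of existing PAQs, and 104 of 200 retrieved papers is indeed more than half.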

New PAQs

The description of newly developed PAQs is summarized in Table 2. The literature search found 31 articles, reporting results from 34 newly developed PAQs, of which 10 were from the United States, 10 from Europe, six from Australia, two from Canada, and one each from Japan and Sub-Saharan Africa. Of note was a 12-country international study testing the International Physical Activity Questionnaire (IPAQ) [34]. This questionnaire is available in a short form for surveillance and in a longer form when more detailed physical activity information is collected. Both forms are available in a number of languages. IPAQ has been rigorously tested for reliability and validity and this has been replicated in a number of countries.

Nineteen studies tested the reliability and validity in adults, an additional 11 studies focused on youth [19–29] and one study was performed in Japanese elderly [49]. Most studies (n = 25) included men and women, four studies [26, 30, 32, 35] reported data in women only and two studies [37, 38] in men only. The number of participants varied from 30 to 2271, and several studies [19, 20, 29, 31, 33–35, 39–41, 43–47] performed reliability testing in a larger sample than their test of criterion validity. The most common response timeframe was the last seven days, with seven studies [27, 30, 36, 37, 44, 46, 47] using a timeframe covering the last year (Table 1). All PAQs captured some elements of leisure time and recreational activity, although most questionnaires also addressed multiple domains of activity. Sedentary time is also commonly captured by the newly developed questionnaires and has received extra attention in recent publications and in the current results. Several recent PAQs, such as the EPIC Physical Activity Questionnaire (EPAQ2) and the Recent Physical Activity Questionnaire (RPAQ), aim to measure the totality of physical activity by domains [31, 46, 47, 51]. The final outcome of the majority of PAQs was reported as time-integrated MET values, e.g. MET-min/week.

Reliability

All reliability results for new PAQs are listed in Table 3.

Table 3 Reliability results of new PAQs


Reliability was usually reported as ICC (n = 13), Pearson/Spearman correlation (n = 6), kappa statistic (n = 3) or a combination of these statistics (n = 9). Higher reliability coefficients were more often seen in association with shorter periods between test and retest. Poor correlation (ICC or r <0.4) was found only in subcategories of a few PAQs. Median correlations from reported data for recall of sedentary behaviours across all PAQs were acceptable: ICC = 0.68, Spearman r = 0.60, Pearson r = 0.475, kappa = 0.66.

Youth

Median reliability correlations for the youth were as follows: ICC = 0.69, Spearman r = 0.71, Pearson r = 0.80, kappa = 0.53. The Activitygram (ICC = 0.24) [26] and the self-reported CLASS questionnaire (frequency: ICC = 0.36, duration ICC = 0.24) [25] showed fairly low reliability correlations, whereas the MARCA (ICC = 0.93) [52] and both computer and paper versions of the CDPAQ (ICC = 0.91–0.98) [23] demonstrated high reliability.

Adults

Median reliability correlations for adults were as follows: ICC = 0.765, Spearman r = 0.75, Pearson r = 0.74, kappa = 0.655. Reliability was poor for the AQuAA score for adults (ICC = 0.22) [53]. Similarly, reliability coefficients were poor for the HUNT2 [37] components of light (r = 0.17, κ = 0.20) and hard activity (r = 0.17, κ = 0.41). The original version of this questionnaire (HUNT1), which was designed a decade earlier, nevertheless demonstrated high reliability (r = 0.76–0.87, κ = 0.69–0.82) [54]. The majority of the questionnaires showed acceptable to good reliability: KPAS (ICC = 0.82–0.83) [30], RPAQ (ICC = 0.76) [31], PPAQ (ICC = 0.78) [32], IPAQ short (r = 0.76) and long version (r = 0.81) [34], AWAS (ICC = 0.73–0.80) [35], FPACQ (ICC = 0.68–0.80) [22], OPAQ (ICC = 0.78) [42], SBQ (ICC = 0.77–0.85, r = 0.74–0.79) [43], SPAQ (r = 0.998) [39] and SSAAQ (r = 0.95) [44].

Elderly

Median Pearson reliability correlation for the elderly was r = 0.70. The PAQ-EJ was the only new PAQ designed for (Japanese) elderly that reported reliability results and has acceptable recall properties (r = 0.70) [49].

Validity

All validity results for new PAQs are listed in Table 4.

Table 4 Validity results of new PAQs


Accelerometry and in particular the ActiGraph accelerometer was the most commonly used criterion method (n = 19), followed by the Caltrac accelerometer (n = 4) and the Polar heart rate monitor (n = 4). DLW was used in one study, where absolute validity was moderate to high for PAEE (r = 0.39) and TEE (r = 0.67) [31]. In general, validity coefficients were considerably lower than reliability coefficients. Median correlations across all PAQs between reported sedentary behaviours and calculated inactivity from objective measures were low: Spearman r = 0.12.

Youth

Median validity correlations for the youth were as follows: Spearman r = 0.22, Pearson r = 0.41. CLASS self- and parent-reported physical activity (r = −0.04–0.11) [25] was among the least valid questionnaires for children, although several other PAQs also showed low correlations with objective measures: Pre-PAQ (r = −0.07–0.17) [19], BONES PAS (r = 0.23–0.27) [20], GAQ (r = 0.27–0.29) [26], Fels PAQ (r = 0.11–0.34) [27]. None of the newly developed PAQs for children demonstrated high validity.

Adults

Median validity correlations for adults were as follows: Spearman r = 0.27, Pearson r = 0.28. The highest validity in adults was demonstrated for the SSAAQ when tested against the Caltrac accelerometer (r = 0.60–0.74) [44]. Low validity correlations for total activity or for all subcategories were observed for the HUNT1 (r = 0.03–0.07) [54] and the short EPIC PAQ (r = 0.04), although the main outcome derived from this instrument, a four-category physical activity index, was significantly associated with objectively measured physical activity energy expenditure (p for trend = 0.003) [47]. A follow-up study in 1941 adults from 10 European countries suggested moderate validity (r = 0.33) of this instrument using physical activity energy expenditure from combined heart rate and movement sensing as the criterion [51].

Rosenberg et al. assessed the validity of sedentary behaviour only, and demonstrated low correlations (partial r = −0.01–0.10) with objectively measured sedentary time (<100 counts/min) by the ActiGraph accelerometer [43].

Elderly

Median Spearman validity correlation for the elderly was r = 0.41. The PAQ-EJ was tested by correlating a total score with MET-min/day calculated from the Kenz Lifecorder accelerometer-based pedometer (r = 0.41) [49].

Existing PAQs

New validity and reliability results for existing PAQs were reported in 35 studies, and 30 studies reported new results on validity only (Table 5). One study is classified as a study testing an existing PAQ, but also reports both validity and reliability data for a new PAQ (SP2PAQ) [55]. Twenty-six of the 65 studies were undertaken in the US with the remainder coming from Australia (n = 5), Sweden (n = 5), China (n = 4), Belgium (n = 3), Spain (n = 3), Canada (n = 2), France (n = 2), Norway (n = 2), Japan (n = 2), Brazil, Portugal, Singapore, South Africa, Turkey, United Kingdom and Vietnam. There were four multi-country studies; three testing the IPAQ modified for adolescents [56, 57] and the EPAQ-s in 9–10 European cities [51]. The GPAQ was tested in a diverse sample from nine countries [58]. Eighteen studies were undertaken in youth [57, 59–74], 12 in elderly [75–86], and 35 in adults, with a few studies including both older adolescents and adults. In 48 studies men and women were combined, 10 studies examined women only [70, 72, 87–93], and seven studies included only men [54, 75, 78, 94–97]. All authors concluded that the questionnaires had shown at least satisfactory results for reliability and validity (see results below); seven studies noted considerable limitations in aspects of their questionnaires [56, 59, 63, 90, 98–100].

Table 5 Descriptive characteristics of existing PAQs


Reliability

All reliability results for existing PAQs are listed in Table 6.

Table 6 Reliability results of existing PAQs


Most studies examining the reliability of existing PAQs reported reliability as ICC (n = 20) or as Pearson/Spearman correlation coefficients (n = 8); some studies used a combination of correlation statistics (n = 7). Similar to the new PAQs, the existing PAQs demonstrated moderate correlations for reliability. Median correlations from reported data for recall of sedentary behaviours were divergent: ICC = 0.76, Spearman r = 0.725, Pearson r = 0.305, kappa = 0.645.

Youth

Median reliability correlations for the youth were as follows: ICC = 0.64, Pearson r = 0.605. The CHASE (ICC = 0.02) and the CPAQ (ICC = 0.25) showed poor test-retest reliability, whereas the reliability was strong for YPAQ (ICC = 0.79–0.86) in the same study [61]. Previous day physical activity recall instruments proved to be highly reliable in children (ICC = 0.98 [60], r = 0.98 [74]).

Adults

Median reliability correlations for adults were as follows: ICC = 0.79, Spearman r = 0.64, kappa = 0.655. The IPAQ-SALVCF (ICC = 0.929) [105], IPAQ long version (r = 0.87–0.90 [108], ICC = 0.93 [110]), IPAQ short version (ICC = 0.79) [99], FPACQ (ICC = 0.77–0.96) [111], KPAS-mod (ICC = 0.76–0.84) [92] and the JPAC (ICC = 0.99) [113] showed acceptable or strong reliability. Notably, the IPAQ-s showed a wide range of results for reliability, with ICCs ranging from 0.27–0.97 for sitting [54, 69, 83, 85, 99, 103, 112], 0.10–0.42 for walking [54, 69], 0.30–0.34 for MPA [54, 69], 0.30–0.62 for VPA [54, 69], and 0.33–0.79 for total PA [83, 85, 99, 103, 112]. For sedentary time the short IPAQ appeared to be the most reliable questionnaire when the test-retest interval was short (i.e. 3 days [ICC = 0.97]) [99]. Overall, all existing PAQs for adults showed acceptable to high reliability.

Elderly

Median reliability correlations for the elderly were as follows: ICC = 0.65, Spearman r = 0.60, Pearson r = 0.62. Similarly, all existing PAQs for elderly also showed overall acceptable to high reliability, with the PASE (ICC = 0.91) [77], 7DPAR (ICC = 0.89) [78] and CHAMPS-MMSCV (ICC = 0.81–0.89) [79] performing best.

Validity

All validity results for existing PAQs are listed in Table 7.

Table 7 Validity results of existing PAQs


Of the 65 studies that report new results for the validity of existing questionnaires, 14 studies [55, 61, 69, 75, 81, 83, 84, 87, 89, 91, 94, 96, 97, 103] tested two or more questionnaires. Forty-five studies used accelerometry as the criterion, and the remainder used DLW (n = 8) [71, 75, 84, 89, 93, 94, 96, 116], pedometry (n = 3) [79, 101, 105], HR monitoring (n = 1) [104], MiniLogger (n = 1) [81] or a combination of methods (n = 5) [51, 60, 61, 74, 114]. Spearman and Pearson correlations were the most commonly used statistical measures for assessing validity; four studies reported 95 % confidence intervals with these correlations [51, 102, 103, 112] and three studies solely reported results using the Bland-Altman limits of agreement method [84, 94, 104]. Median correlations between reported sedentary behaviours and inactivity from objective measures were calculated: Spearman r = 0.23, Pearson r = 0.435.

Youth

Median validity correlations for the youth were as follows: Spearman r = 0.25, Pearson r = 0.38. Many PAQs (SAPAC [59], HBSC [54], IPAQ-s [54], GSQ [70] and GAQ [118]) demonstrated low validity coefficients (r < 0.2) in youth and only one instrument (PDPAR [60]) was regarded as highly valid (r = 0.76) when compared with physical activity assessed by the Caltrac accelerometer.

Adults

Median validity correlations for adults were as follows: Spearman r = 0.30, Pearson r = 0.46. Validity correlations were generally low for most PAQs, except for the FPACQ [111] compared with accelerometry in multiple subcategories (r = 0.39–0.85) and the BAQ (r = 0.68–0.69), FCPQ (r = 0.34–0.61) and TCQ (r = 0.63–0.64) for estimated TEE compared with TEE measured with the DLW method [96]. Pettee-Gabriel et al. compared five different PAQs with accelerometry from the ActiGraph accelerometer and showed acceptable validity for all instruments: PMMAQ (r = 0.59–0.60), PWMAQ (r = 0.56–0.60), NHS-PAQ (r = 0.42–0.46), AAS (r = 0.46–0.50), WHI-PAQ (r = 0.45–0.47) [91]. Several PAQs, including the 7DR-O [87], MAQ [109], CAPS [89], IPAQ [55, 90] and the IPAQ-s [54, 98, 99], demonstrated poor validity.

Elderly

Median validity correlations for the elderly were as follows: Spearman r = 0.40, Pearson r = 0.345. Bonnefoy et al. tested the validity of 10 previously developed, well-known PAQs using DLW as the criterion measure [75]. The results of this study suggested that the Stanford Usual Activity questionnaire performed best (r = 0.63–0.65). Other studies in the elderly generally found low correlations between self-reported PA and objective measures, as also demonstrated by the generally weak performance of the YPAS in several studies (r = 0.11–0.61) [75, 76, 81, 83, 84], and of the PASE in one of the studies (r = 0.16–0.17) [80].

Discussion

This systematic review covered the most recent 15-year period. We identified 31 studies that adequately tested newly developed PAQs for both validity and reliability during this period. This suggests that, although objective monitoring of physical activity has become widespread, including for examining population levels of activity [119–121], PAQs remain an active area of research and are now generally considered complementary to any objective measure. Several previous reviews have assessed the reliability and validity of PAQs with a special focus on their overall performance [9], or performance in specific age groups [11, 14, 15]. In contrast, we examined whether newly developed PAQs performed better than older PAQs, as this will inform researchers and practitioners when choosing an existing PAQ or developing a new instrument for assessing physical activity. We therefore comprehensively summarized the results to allow an adequate appraisal of the performance of existing PAQs across domains and physical activity intensities.

In concordance with previous reviews [11, 14, 15], very few questionnaires showed acceptable reliability and validity across age groups. Developing new PAQs requires careful consideration of the study design in terms of target population, sample size, age group, recall period, dimension and intensity of PA, relative and absolute validity, standardized quality criteria and appropriate comparison measures. The failure to formulate a priori hypotheses was recently highlighted as a limitation in most studies examining the validity of PAQs [11], and comprehensive key criteria for physical activity and sedentary behaviour validation studies have been proposed [122, 123].

Since the comprehensive review by Kriska and Caspersen [9], it is apparent that more appropriate criterion methods, in particular accelerometry, have been used to test the validity of PAQs. Yet, a considerable number of studies were excluded from the present review due to an inappropriate criterion method (e.g. aerobic fitness). Many studies reported reliability and validity results for existing and well established questionnaires, which suggests that these instruments are still frequently used. Importantly, newly developed PAQs do not seem to perform any better than existing instruments in terms of reliability and validity. Unfortunately, we were not able to conduct a formal meta-analysis due to differences in reported outcomes, different criterion measures and different time frames between questionnaires.

Total energy expenditure (TEE) was frequently used as the outcome measure of the PAQ and the validity scores from these types of instruments are usually high. However, the results from many of these studies should be interpreted carefully. This is because TEE from any self-report incorporates an estimate of resting energy expenditure (REE) generally calculated from body weight, sex and age. REE explains most of the variation in TEE and, consequently, high correlations may be generated when comparing TEE from self-report with measured or estimated TEE from the criterion method. This is particularly problematic when those same predictions of REE are used by both the criterion method and the self-reported calculation of energy expenditure. Therefore, other outputs from the criterion method (e.g. time spent in different intensity levels, physical activity energy expenditure normalised for body size) appear more appropriate to serve as criterion measures. In these studies correlations between the criterion measure and self-reported PA are considerably weaker than those for TEE, although the PAQs concerned may still be considered valid, as demonstrated in some studies [31, 116]. The notion of validity, however, is a matter of degree, rather than an all-or-nothing determination.
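The inflation of TEE correlations by a shared REE term can be illustrated numerically. The sketch below uses invented values for six hypothetical participants and, as a simplifying assumption, treats TEE as the sum of REE and physical activity energy expenditure (PAEE), with the same REE estimate entering both the self-reported and the criterion TEE.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data in kcal/day; not taken from any reviewed study.
ree = [1200, 1400, 1600, 1800, 2000, 2200]     # same REE prediction used by both methods
paee_criterion = [600, 400, 700, 500, 800, 450]
paee_reported = [600, 500, 550, 650, 600, 550]

# Simplification: TEE = REE + PAEE for both methods.
tee_criterion = [r + p for r, p in zip(ree, paee_criterion)]
tee_reported = [r + p for r, p in zip(ree, paee_reported)]

r_paee = pearson(paee_reported, paee_criterion)
r_tee = pearson(tee_reported, tee_criterion)
print(f"PAEE r = {r_paee:.2f}, TEE r = {r_tee:.2f}")
```

In this invented example the questionnaire tracks the activity component only weakly, yet the TEE correlation is high because the shared REE term dominates the between-person variance.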

The validity correlation coefficients from the vast majority of existing and newly developed PAQs were considered poor to moderate and usually only acceptable when results were presented as Pearson or Spearman correlation coefficients. This suggests that most PAQs may be valid for ranking individuals’ behaviour, whereas their absolute validity for quantifying PA is limited. Although our summary of the correlations in a single median value should be interpreted with caution, we did not observe any substantial difference between newly developed and existing PAQs. This may suggest that, despite considerable effort, accurate and precise self-report physical activity instruments are still scarce [124]. Many of the newly developed instruments collected information in various domains of physical activity including transportation and housework. Despite this, it appears almost impossible to obtain a valid estimate of a highly variable behaviour such as free-living physical activity by self-report. While results from large-scale observational cohort studies have convincingly demonstrated the beneficial effects of self-reported physical activity on various health outcomes including all-cause mortality, coronary and cardiovascular disease morbidity and mortality, some types of cancer, and type 2 diabetes, the detailed dose–response associations are still unknown [125]. Increased sample size is usually considered to improve precision but may not overcome issues of accuracy. Further, a large sample size does not overcome misclassification due to differential measurement error. Therefore, future studies should consider including an objective measure of physical activity in addition to self-report or consider recommendations to reduce self-report error [126].

With few exceptions, most PAQs reviewed showed acceptable to good reliability with only minor differences between existing and newly developed PAQs. The median reliability correlations were acceptable to good in youth (0.64 – 0.65), adults (0.64 – 0.79), and the elderly (0.60 – 0.65) for existing PAQs; and marginally higher for newly developed PAQs in youth (0.69 – 0.80), adults (0.74 – 0.765), and the elderly (0.70). However, only 3 of 11 newly developed PAQs [21, 23, 24] showed consistently good reliability.

For existing PAQs, median validity correlations were poor to acceptable in youth (0.25 – 0.38), adults (0.30 – 0.46), and elderly (0.345 – 0.40); and essentially similar for newly developed PAQs in youth (0.22 – 0.41), adults (0.27 – 0.28), and the elderly (0.41).

Only four of the reviewed questionnaires, the IPAQ-s (existing) [85], the FPACQ (existing) [111], the PDPAR (existing) [60] and the RPAR (new) [21], showed acceptable to good results for both reliability and validity. Sedentary behaviour appeared to be one of the most difficult domains to assess with questionnaires, as demonstrated by the poor correlations with objectively measured sedentary time, although the criterion measures arguably also have limitations that contribute to poorer agreement between methods. About one third (n = 11) of the studies reporting data on newly developed PAQs assessed both validity and reliability for sedentary behaviour. For existing PAQs, 17 studies reported data on validity and 15 on reliability for sedentary behaviour.

Accuracy of PA recall may be increased at the retest administration by heightened awareness of physical activity as a result of having completed the questionnaire previously [105]. Many of the reviewed studies did not specify details of their reliability testing, making it difficult to distinguish test-retest reliability of the instrument from the stability of physical activity itself. It is therefore difficult to attribute the correlations to either the reliability of the instrument or the stability of the participant’s behaviour. Assessing test-retest reliability for a last-seven-days PAQ is generally more straightforward than for a PAQ assessing usual or last-year physical activity, because respondents can be prompted to report their PA during exactly the same week on two different occasions separated in time. However, this must be weighed against administering the test and retest so close in time that the respondent remembers the answers given at the first administration, which inflates reliability estimates through correlated error. Several study details other than the recall timeframe can have a marked influence on the study results, such as socio-cultural background, sex, age, literacy, and cognitive abilities.
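
Test-retest agreement of this kind is often summarised with an intraclass correlation coefficient. The sketch below computes a one-way random-effects, single-measure ICC from a one-way analysis of variance; this is only one of several ICC forms, the reviewed studies did not necessarily use it, and the test and retest scores are invented for illustration.

```python
def icc_oneway(scores):
    """One-way random-effects, single-measure ICC.

    `scores` is a list of per-participant lists, each holding the same
    number k of repeated administrations (e.g. [test, retest]).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 administrations")
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number of scores")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]

    # Between-participant and within-participant mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical PAQ scores (min/day) at test and retest for five participants.
test_retest = [[30, 35], [45, 40], [60, 65], [20, 25], [50, 45]]
print(f"ICC: {icc_oneway(test_retest):.3f}")
```

A high value here cannot by itself separate instrument reliability from stable behaviour or from respondents remembering their earlier answers, which is the interpretive problem described above.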

The DLW method is usually considered the most accurate criterion method available for measuring TEE and PAEE. However, as discussed above, when using the DLW method and other objective methods which provide outputs in TEE as the criterion instrument, individual variability in body weight needs to be considered. It is therefore recommended that, in subsequent validation studies, data from these methods should be expressed as PAEE, with and without normalisation for body weight. Combined heart rate and movement sensing may be more accurate than either method used alone for measuring time spent at different intensity levels [31]. However, most of the newly developed PAQs used a single accelerometer mounted at the hip as the criterion method, possibly due to its reasonable cost and feasibility in large study groups. Accelerometry also has some inherent limitations, including its inability to accurately assess the intensity of specific types of activity such as weight-bearing activities, cycling, and swimming [33]. Further, the somewhat arbitrary choice of cut-off points [127–129] to classify intensities of activity when using accelerometry as a criterion method has been documented before. The use of accelerometers is especially problematic for validating time spent in different intensities of physical activity from PAQs, and this also hampers comparison between studies [33]. Criterion measures usually assess overall PA (e.g. time in MVPA, PAEE), which precludes a direct test of the validity of self-reported domain-specific activity (e.g. occupation). It is therefore not surprising that some PAQs [e.g. 86] which only assess a specific domain of activity demonstrate low validity when compared with overall physical activity from the criterion instrument. More research is therefore needed to compare time-stamped criterion data with domain-specific self-reported activity and to develop criterion instruments which can accurately categorise types of activities. Adopting a conceptual framework for physical activity [130] in combination with standardized procedures when developing and validating PAQs [122, 123] is highly recommended.

Pearson and Spearman correlations may not be the most appropriate statistical methods to use for reporting results on the validity of PAQs. ICC is considered a more appropriate method for continuous measures on the same scale, whereas weighted kappa is a better choice of method for categorical measures [131, 132]. When reporting validation results, researchers are encouraged to report absolute validity in terms of mean bias with limits of agreement as well as the error structure of the instrument across the measurement range. We noted that many of the newly developed instruments reported results on absolute validity by means of the Bland-Altman method, which is a simple, intuitive and easy-to-interpret method to assess measurement error [133]. Descriptive details of the study population may be helpful to explain any heterogeneity in the findings from different studies. Researchers can individually interpret all data for quality and applicability.
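
As a minimal sketch of the reporting recommended above, the following computes the mean bias and 95% limits of agreement (bias ± 1.96 SD of the differences) in the manner of a Bland-Altman analysis, and adds a simple least-squares slope of the differences on the pairwise means as one possible check of the error structure across the measurement range. The slope check is an illustrative choice rather than a prescribed method, and the PAEE values are invented.

```python
from statistics import mean, stdev


def bland_altman(self_report, criterion):
    """Return (mean bias, lower limit, upper limit) of agreement."""
    diffs = [s - c for s, c in zip(self_report, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def error_slope(self_report, criterion):
    """Least-squares slope of (self-report - criterion) on the pair means.

    A slope far from zero suggests the error changes with activity level.
    """
    diffs = [s - c for s, c in zip(self_report, criterion)]
    avgs = [(s + c) / 2 for s, c in zip(self_report, criterion)]
    mean_a, mean_d = mean(avgs), mean(diffs)
    sxx = sum((a - mean_a) ** 2 for a in avgs)
    sxy = sum((a - mean_a) * (d - mean_d) for a, d in zip(avgs, diffs))
    return sxy / sxx


# Hypothetical PAEE values in kJ/kg/day for five participants.
criterion = [20, 30, 40, 50, 60]
self_report = [25, 33, 47, 52, 68]

bias, lower, upper = bland_altman(self_report, criterion)
print(f"Mean bias: {bias:.1f}; limits of agreement: {lower:.1f} to {upper:.1f}")
print(f"Slope of error on mean: {error_slope(self_report, criterion):.3f}")
```

Unlike a correlation coefficient, these quantities are expressed in the units of the measurement, so they show directly how far a self-reported value may lie from the criterion value.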

In summary, we systematically reviewed studies assessing both reliability and validity of PAQs in various domains, across age groups, and with a focus on total PA and sedentary time. PAQs are inherently subject to many limitations and the choice of PAQs should be dictated by the research question and the population under study. Considerations for researchers when using PAQs in practice have been identified and new research should consider including an objective method for assessing physical activity in addition to any self-report [134]. This review has identified a limited number of PAQs that appear to have both acceptable reliability and validity. Newly developed PAQs do not appear to perform substantially better than existing PAQs in terms of reliability and validity.

References

World Health Organization: Global health risks : mortality and burden of disease attributable to selected major risks. 2009, World Health Organization, Geneva
Wareham NJ, Rennie KL: The assessment of physical activity in individuals and populations: Why try to be more precise about how physical activity is assessed? Int J Obes. 1998, 22: S30-S38.
Jobe JB, Mingay DJ: Cognitive research improves questionnaires. Am J Public Health. 1989, 79: 1053-1055. 10.2105/AJPH.79.8.1053.
Durante R, Ainsworth BE: The recall of physical activity: using a cognitive model of the question-answering process. Med Sci Sports Exerc. 1996, 28: 1282-1291. 10.1097/00005768-199610000-00012.
Sallis JF: Self-report measures of children's physical activity. J Sch Health. 1991, 61: 215-219. 10.1111/j.1746-1561.1991.tb06017.x.
Washburn RA: Assessment of physical activity in older adults. Res Q Exerc Sport. 2000, 71: S79-S88.
Warren JM, Ekelund U, Besson H, Mezzani A, Geladas N, Vanhees L: Assessment of physical activity - a review of methodologies with reference to epidemiological research: a report of the exercise physiology section of the European Association of Cardiovascular Prevention and Rehabilitation. Eur J Cardiovasc Prev Rehabil. 2010, 17: 127-139. 10.1097/HJR.0b013e32832ed875.
Diet and physical activity measurement toolkit. http://www.dapa-toolkit.mrc.ac.uk/ .
Pereira MA, FitzGerald SJ, Gregg EW, Joswiak ML, Ryan WJ, Suminski RR, Utter AC, Zmuda JM: A collection of Physical Activity Questionnaires for health-related research. Med Sci Sports Exerc. 1997, 29: S1-S205.
Kohl HW, Fulton JE, Caspersen CJ: Assessment of physical activity among children and adolescents: A review and synthesis. Prev Med. 2000, 31: S54-S76. 10.1006/pmed.1999.0542.
Chinapaw MJ, Mokkink LB, van Poppel MN, van Mechelen W, Terwee CB: Physical activity questionnaires for youth: a systematic review of measurement properties. Sports Med. 2010, 40: 539-563. 10.2165/11530770-000000000-00000.
Adamo KB, Prince SA, Tricco AC, Connor-Gorber S, Tremblay M: A comparison of indirect versus direct measures for assessing physical activity in the pediatric population: a systematic review. Int J Pediatr Obes. 2009, 4: 2-27. 10.1080/17477160802315010.
Oliver M, Schofield GM, Kolt GS: Physical activity in preschoolers: understanding prevalence and measurement issues. Sports Med. 2007, 37: 1045-1070. 10.2165/00007256-200737120-00004.
van Poppel MN, Chinapaw MJ, Mokkink LB, van Mechelen W, Terwee CB: Physical activity questionnaires for adults: a systematic review of measurement properties. Sports Med. 2010, 40: 565-600. 10.2165/11531930-000000000-00000.
Forsen L, Loland NW, Vuillemin A, Chinapaw MJ, van Poppel MN, Mokkink LB, van Mechelen W, Terwee CB: Self-administered physical activity questionnaires for the elderly: a systematic review of measurement properties. Sports Med. 2010, 40: 601-623. 10.2165/11531350-000000000-00000.
Streiner DL, Norman GR: Health measurement scales : a practical guide to their development and use. 2003, Oxford University Press, Oxford; New York, 3
Scientific Advisory Committee of the Medical Outcomes Trust: Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002, 11: 193-205. 10.1023/A:1015291021312.
Bland JM, Altman DG: Measuring agreement in method comparison studies. Stat Methods Med Res. 1999, 8: 135-160. 10.1191/096228099673819272.
Dwyer GM, Hardy LL, Peat JK, Baur LA: The validity and reliability of a home environment preschool-age physical activity questionnaire (Pre-PAQ). Int J Behav Nutr Phys Act. 2011, 8: 86-10.1186/1479-5868-8-86.
Economos CD, Hennessy E, Sacheck JM, Shea MK, Naumova EN: Development and testing of the BONES physical activity survey for young children. BMC Musculoskelet Disord. 2010, 11: 195-10.1186/1471-2474-11-195.
Martinez-Gomez D, Calabro MA, Welk GJ, Marcos A, Veiga OL: Reliability and validity of a school recess physical activity recall in Spanish youth. Pediatr Exerc Sci. 2010, 22: 218-230.
Philippaerts RM, Matton L, Wijndaele K, Balduck AL, De Bourdeaudhuij I, Lefevre J: Validity of a physical activity computer questionnaire in 12-to 18-year-old boys and girls. Int J Sports Med. 2006, 27: 131-136. 10.1055/s-2005-837619.
Ridley K, Dollman J, Olds T: Development and Validation of a Computer Delivered Physical Activity Questionnaire (CDPAQ) for Children. Pediatr Exerc Sci. 2001, 13: 35-46.
Ridley K, Olds TS, Hill A: The Multimedia Activity Recall for Children and Adolescents (MARCA): development and evaluation. Int J Behav Nutr Phys Act. 2006, 3: 10-10.1186/1479-5868-3-10.
Telford A, Salmon J, Jolley D, Crawford D: Reliability and validity of physical activity questionnaires for children: The Children's Leisure Activities Study Survey (CLASS). Pediatr Exerc Sci. 2004, 16: 64-78.
Treuth MS, Sherwood NE, Butte NF, McClanahan B, Obarzanek E, Zhou A, Ayers C, Adolph A, Jordan J, Jacobs DR, Rochon J: Validity and reliability of activity measures in African-American girls for GEMS. Med Sci Sports Exerc. 2003, 35: 532-539. 10.1249/01.MSS.0000053702.03884.3F.
Treuth MS, Hou N, Young DR, Maynard LM: Validity and reliability of the Fels physical activity questionnaire for children. Med Sci Sports Exerc. 2005, 37: 488-495. 10.1249/01.MSS.0000155392.75790.83.
Welk GJ, Wickel E, Peterson M, Heitzler CD, Fulton JE, Potter LD: Reliability and validity of questions on the youth media campaign longitudinal survey. Med Sci Sports Exerc. 2007, 39: 612-621. 10.1249/mss.0b013e3180305c59.
Wong SL, Leatherdale ST, Manske SR: Reliability and validity of a school-based physical activity questionnaire. Med Sci Sports Exerc. 2006, 38: 1593-1600.
Ainsworth BE, Sternfeld B, Richardson MT, Jackson K: Evaluation of the Kaiser physical activity survey in women. Med Sci Sports Exerc. 2000, 32: 1327-1338. 10.1097/00005768-200007000-00022.
Besson H, Brage S, Jakes RW, Ekelund U, Wareham NJ: Estimating physical activity energy expenditure, sedentary time, and physical activity intensity by self-report in adults. Am J Clin Nutr. 2010, 91: 106-114. 10.3945/ajcn.2009.28432.
Chasan-Taber L, Schmidt MD, Roberts DE, Hosmer D, Markenson G, Freedson PS: Development and validation of a Pregnancy Physical Activity Questionnaire. Med Sci Sports Exerc. 2004, 36: 1750-1760. 10.1249/01.MSS.0000142303.49306.0D.
Chinapaw MJ, Slootmaker SM, Schuit AJ, van Zuidam M, van Mechelen W: Reliability and validity of the Activity Questionnaire for Adults and Adolescents (AQuAA). BMC Med Res Methodol. 2009, 9: 58-10.1186/1471-2288-9-58.
Craig CL, Marshall AL, Sjostrom M, Bauman AE, Booth ML, Ainsworth BE, Pratt M, Ekelund U, Yngve A, Sallis JF, Oja P: International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003, 35: 1381-1395. 10.1249/01.MSS.0000078924.61453.FB.
Fjeldsoe BS, Marshall AL, Miller YD: Measurement properties of the Australian Women's Activity Survey. Med Sci Sports Exerc. 2009, 41: 1020-1033. 10.1249/MSS.0b013e31819461c2.
Friedenreich CM, Courneya KS, Neilson HK, Matthews CE, Willis G, Irwin M, Troiano R, Ballard-Barbash R: Reliability and validity of the Past Year Total Physical Activity Questionnaire. Am J Epidemiol. 2006, 163: 959-970. 10.1093/aje/kwj112.
Kurtze N, Rangul V, Hustvedt BE, Flanders WD: Reliability and validity of self-reported physical activity in the Nord-Trondelag Health Study (HUNT 2). Eur J Epidemiol. 2007, 22: 379-387. 10.1007/s10654-007-9110-9.
Kurtze N, Rangul V, Hustvedt BE, Flanders WD: Reliability and validity of self-reported physical activity in the Nord-Trondelag Health Study: HUNT 1. Scand J Public Health. 2008, 36: 52-61. 10.1177/1403494807085373.
Lowther M, Mutrie N, Loughlan C, McFarlane C: Development of a Scottish physical activity questionnaire: a tool for use in physical activity interventions. Br J Sports Med. 1999, 33: 244-249. 10.1136/bjsm.33.4.244.
Mader U, Martin BW, Schutz Y, Marti B: Validity of four short physical activity questionnaires in middle-aged persons. Med Sci Sports Exerc. 2006, 38: 1255-1266. 10.1249/01.mss.0000227310.18902.28.
Meriwether RA, McMahon PM, Islam N, Steinmann WC: Physical activity assessment: validation of a clinical assessment tool. Am J Prev Med. 2006, 31: 484-491. 10.1016/j.amepre.2006.08.021.
Reis JP, Dubose KD, Ainsworth BE, Macera CA, Yore MM: Reliability and validity of the occupational physical activity questionnaire. Med Sci Sports Exerc. 2005, 37: 2075-2083. 10.1249/01.mss.0000179103.20821.00.
Rosenberg DE, Norman GJ, Wagner N, Patrick K, Calfas KJ, Sallis JF: Reliability and validity of the Sedentary Behavior Questionnaire (SBQ) for adults. J Phys Act Health. 2010, 7: 697-705.
Sobngwi E, Mbanya JC, Unwin NC, Aspray TJ, Alberti KG: Development and validation of a questionnaire for the assessment of physical activity in epidemiological studies in Sub-Saharan Africa. Int J Epidemiol. 2001, 30: 1361-1368. 10.1093/ije/30.6.1361.
Timperio A, Salmon J, Crawford D: Validity and reliability of a physical activity recall instrument among overweight and non-overweight men and women. J Sci Med Sport. 2003, 6: 477-491. 10.1016/S1440-2440(03)80273-6.
Wareham NJ, Jakes RW, Rennie KL, Mitchell J, Hennings S, Day NE: Validity and repeatability of the EPIC-Norfolk Physical Activity Questionnaire. Int J Epidemiol. 2002, 31: 168-174. 10.1093/ije/31.1.168.
Wareham NJ, Jakes RW, Rennie KL, Schuit J, Mitchell J, Hennings S, Day NE: Validity and repeatability of a simple index derived from the short physical activity questionnaire used in the European Prospective Investigation into Cancer and Nutrition (EPIC) study. Public Health Nutr. 2003, 6: 407-413.
Yore MM, Ham SA, Ainsworth BE, Kruger J, Reis JP, Kohl HW, Macera CA: Reliability and validity of the instrument used in BRFSS to assess physical activity. Med Sci Sports Exerc. 2007, 39: 1267-1274. 10.1249/mss.0b013e3180618bbe.
Yasunaga A, Park H, Watanabe E, Togo F, Park S, Shephard RJ, Aoyagi Y: Development and evaluation of the physical activity questionnaire for elderly Japanese: the Nakanojo study. J Aging Phys Act. 2007, 15: 398-411.
Owen N, Leslie E, Salmon J, Fotheringham MJ: Environmental determinants of physical activity and sedentary behavior. Exerc Sport Sci Rev. 2000, 28: 153-158.
The InterAct Consortium: Validity of a short questionnaire to assess physical activity in 10 European countries. Eur J Epidemiol. 2011, 27: 15-25.
Olds T, Ridley K, Dollman J: Screenieboppers and extreme screenies: the place of screen time in the time budgets of 10–13 year-old Australian children. Aust N Z J Public Health. 2006, 30: 137-142. 10.1111/j.1467-842X.2006.tb00106.x.
Brown H, Hume C, Chin AM: Validity and reliability of instruments to assess potential mediators of children's physical activity: A systematic review. J Sci Med Sport. 2009, 12: 539-548. 10.1016/j.jsams.2009.01.002.
Kurtze N, Rangul V, Hustvedt BE: Reliability and validity of the international physical activity questionnaire in the Nord-Trondelag health study (HUNT) population of men. BMC Med Res Methodol. 2008, 8: 63-10.1186/1471-2288-8-63.
Nang EE, Gitau Ngunjiri SA, Wu Y, Salim A, Tai ES, Lee J, Van Dam RM: Validity of the International Physical Activity Questionnaire and the Singapore Prospective Study Program physical activity questionnaire in a multiethnic urban Asian population. BMC Med Res Methodol. 2011, 11: 141-10.1186/1471-2288-11-141.
Hagstromer M, Bergman P, De Bourdeaudhuij I, Ortega FB, Ruiz JR, Manios Y, Rey-Lopez JP, Phillipp K, von Berlepsch J, Sjostrom M: Concurrent validity of a modified version of the International Physical Activity Questionnaire (IPAQ-A) in European adolescents: The HELENA Study. Int J Obes. 2008, 32 (Suppl 5): S42-S48.
Ottevaere C, Huybrechts I, De Bourdeaudhuij I, Sjostrom M, Ruiz JR, Ortega FB, Hagstromer M, Widhalm K, Molnar D, Moreno LA, et al: Comparison of the IPAQ-A and actigraph in relation to VO2max among European adolescents: the HELENA study. J Sci Med Sport. 2011, 14: 317-324. 10.1016/j.jsams.2011.02.008.
Bull FC, Maslin TS, Armstrong T: Global physical activity questionnaire (GPAQ): nine country reliability and validity study. J Phys Act Health. 2009, 6: 790-804.
Affuso O, Stevens J, Catellier D, McMurray RG, Ward DS, Lytle L, Sothern MS, Young DR: Validity of self-reported leisure-time sedentary behavior in adolescents. J Negat Results Biomed. 2011, 10: 2-10.1186/1477-5751-10-2.
Allor KM, Pivarnik JM: Stability and convergent validity of three physical activity assessments. Med Sci Sports Exerc. 2001, 33: 671-676. 10.1097/00005768-200107001-00005.
Corder K, van Sluijs EM, Wright A, Whincup P, Wareham NJ, Ekelund U: Is it possible to assess free-living physical activity and energy expenditure in young people by self-report? Am J Clin Nutr. 2009, 89: 862-870. 10.3945/ajcn.2008.26739.
Eisenmann JC, Milburn N, Jacobsen L, Moore SJ: Reliability and convergent validity of the Godin Leisure-Time Exercise Questionnaire in rural 5th-grade school-children. J Human Movement Studies. 2002, 43: 135-149.
Gwynn JD, Hardy LL, Wiggers JH, Smith WT, D'Este CA, Turner N, Cochrane J, Barker DJ, Attia JR: The validation of a self-report measure and physical activity of Australian Aboriginal and Torres Strait Islander and non-Indigenous rural children. Aust N Z J Public Health. 2010, 34 (Suppl 1): S57-S65.
Huang YJ, Wong SH, Salmon J: Reliability and validity of the modified Chinese version of the Children's Leisure Activities Study Survey (CLASS) questionnaire in assessing physical activity among Hong Kong children. Pediatr Exerc Sci. 2009, 21: 339-353.
Kowalski KC, Crocker PRE, Faulkner RA: Validation of the physical activity questionnaire for older children. Pediatr Exerc Sci. 1997, 9: 174-186.
Martinez-Gomez D, Warnberg J, Welk GJ, Sjostrom M, Veiga OL, Marcos A: Validity of the Bouchard activity diary in Spanish adolescents. Public Health Nutr. 2010, 13: 261-268. 10.1017/S1368980009990681.
Martinez-Gomez D, Gomez-Martinez S, Warnberg J, Welk GJ, Marcos A, Veiga OL: Convergent validity of a questionnaire for assessing physical activity in Spanish adolescents with overweight. Med Clin (Barc). 2011, 136: 13-15. 10.1016/j.medcli.2010.05.013.
Mota J, Santos P, Guerra S, Ribeiro JC, Duarte JA, Sallis JF: Validation of a physical activity self-report questionnaire in a Portuguese pediatric population. Pediatr Exerc Sci. 2002, 14: 269-276.
Rangul V, Holmen TL, Kurtze N, Cuypers K, Midthjell K: Reliability and validity of two frequently used self-administered physical activity questionnaires in adolescents. BMC Med Res Methodol. 2008, 8: 47-10.1186/1471-2288-8-47.
Scerpella TA, Tuladhar P, Kanaley JA: Validation of the Godin-Shephard questionnaire in prepubertal girls. Med Sci Sports Exerc. 2002, 34: 845-850. 10.1097/00005768-200205000-00018.
Slinde F, Arvidsson D, Sjoberg A, Rossander-Hulthen L: Minnesota leisure time activity questionnaire and doubly labeled water in adolescents. Med Sci Sports Exerc. 2003, 35: 1923-1928. 10.1249/01.MSS.0000093608.95629.85.
Treuth MS, Sherwood NE, Baranowski T, Butte NF, Jacobs DR, McClanahan B, Gao S, Rochon J, Zhou A, Robinson TN, et al: Physical activity self-report and accelerometry measures from the Girls health Enrichment Multi-site Studies. Prev Med. 2004, 38 (Suppl): S43-S49.
Troped PJ, Wiecha JL, Fragala MS, Matthews CE, Finkelstein DM, Kim J, Peterson KE: Reliability and validity of YRBS physical activity items among middle school students. Med Sci Sports Exerc. 2007, 39: 416-425. 10.1249/mss.0b013e31802d97af.
Weston AT, Petosa R, Pate RR: Validation of an instrument for measurement of physical activity in youth. Med Sci Sports Exerc. 1997, 29: 138-143.
Bonnefoy M, Normand S, Pachiaudi C, Lacour JR, Laville M, Kostka T: Simultaneous validation of ten physical activity questionnaires in older men: a doubly labeled water study. J Am Geriatr Soc. 2001, 49: 28-35. 10.1046/j.1532-5415.2001.49006.x.
De Abajo S, Larriba R, Marquez S: Validity and reliability of the Yale Physical Activity Survey in Spanish elderly. J Sports Med Phys Fitness. 2001, 41: 479-485.
Dinger MK, Oman RF, Taylor EL, Vesely SK, Able J: Stability and convergent validity of the Physical Activity Scale for the Elderly (PASE). J Sports Med Phys Fitness. 2004, 44: 186-192.
Dubbert PM, Vander Weg MW, Kirchner KA, Shaw B: Evaluation of the 7-day physical activity recall in urban and rural men. Med Sci Sports Exerc. 2004, 36: 1646-1654. 10.1249/01.MSS.0000139893.65189.F2.
Giles K, Marshall AL: Repeatability and accuracy of CHAMPS as a measure of physical activity in a community sample of older Australian adults. J Phys Act Health. 2009, 6: 221-229.
Hagiwara A, Ito N, Sawai K, Kazuma K: Validity and reliability of the Physical Activity Scale for the Elderly (PASE) in Japanese elderly people. Geriatr Gerontol Int. 2008, 8: 143-151. 10.1111/j.1447-0594.2008.00463.x.
Harada ND, Chiu V, King AC, Stewart AL: An evaluation of three self-report physical activity instruments for older adults. Med Sci Sports Exerc. 2001, 33: 962-970. 10.1097/00005768-200106000-00016.
Hurtig-Wennlof A, Hagstromer M, Olsson LA: The International Physical Activity Questionnaire modified for the elderly: aspects of validity and feasibility. Public Health Nutr. 2010, 13: 1847-1854. 10.1017/S1368980010000157.
Kolbe-Alexander TL, Lambert EV, Harkins JB, Ekelund U: Comparison of two methods of measuring physical activity in South African older adults. J Aging Phys Act. 2006, 14: 98-114.
Starling RD, Matthews DE, Ades PA, Poehlman ET: Assessment of physical activity in older individuals: a doubly labeled water study. J Appl Physiol. 1999, 86: 2090-2096.
Tomioka K, Iwamoto J, Saeki K, Okamoto N: Reliability and validity of the International Physical Activity Questionnaire (IPAQ) in elderly adults: the Fujiwara-kyo Study. J Epidemiol. 2011, 21: 459-465. 10.2188/jea.JE20110003.
Washburn RA, Ficker JL: Physical Activity Scale for the Elderly (PASE): the relationship with activity measured by a portable accelerometer. J Sports Med Phys Fitness. 1999, 39: 336-340.
Ainsworth BE, Richardson MT, Jacobs DR, Leon AS, Sternfeld B: Accuracy of recall of occupational physical activity by questionnaire. J Clin Epidemiol. 1999, 52: 219-227. 10.1016/S0895-4356(98)00158-9.
Brown WJ, Burton NW, Marshall AL, Miller YD: Reliability and validity of a modified self-administered version of the Active Australia physical activity survey in a sample of mid-age women. Aust N Z J Public Health. 2008, 32: 535-541. 10.1111/j.1753-6405.2008.00305.x.
Mahabir S, Baer DJ, Giffen C, Clevidence BA, Campbell WS, Taylor PR, Hartman TJ: Comparison of energy expenditure estimates from 4 physical activity questionnaires with doubly labeled water estimates in postmenopausal women. Am J Clin Nutr. 2006, 84: 230-236.
Nicaise V, Marshall S, Ainsworth BE: Domain-specific physical activity and self-report bias among low-income Latinas living in San Diego County. J Phys Act Health. 2011, 8: 881-890.
Pettee Gabriel K, McClain JJ, Lee CD, Swan PD, Alvar BA, Mitros MR, Ainsworth BE: Evaluation of physical activity measures used in middle-aged women. Med Sci Sports Exerc. 2009, 41: 1403-1412. 10.1249/MSS.0b013e31819b2482.
Schmidt MD, Freedson PS, Pekow P, Roberts D, Sternfeld B, Chasan-Taber L: Validation of the Kaiser Physical Activity Survey in pregnant women. Med Sci Sports Exerc. 2006, 38: 42-50.
Staten LK, Taren DL, Howell WH, Tobar M, Poehlman ET, Hill A, Reid PM, Ritenbaugh C: Validation of the Arizona Activity Frequency Questionnaire using doubly labeled water. Med Sci Sports Exerc. 2001, 33: 1959-1967. 10.1097/00005768-200111000-00024.
Conway JM, Seale JL, Jacobs DR, Irwin ML, Ainsworth BE: Comparison of energy expenditure estimates from doubly labeled water, a physical activity questionnaire, and physical activity records. Am J Clin Nutr. 2002, 75: 519-525.
Ekelund U, Sepp H, Brage S, Becker W, Jakes R, Hennings M, Wareham NJ: Criterion-related validity of the last 7-day, short form of the International Physical Activity Questionnaire in Swedish adults. Public Health Nutr. 2006, 9: 258-265.
Philippaerts RM, Westerterp KR, Lefevre J: Doubly labelled water validation of three physical activity questionnaires. Int J Sports Med. 1999, 20: 284-289. 10.1055/s-2007-971132.
Philippaerts RM, Westerterp KR, Lefevre J: Comparison of two questionnaires with a tri-axial accelerometer to assess physical activity patterns. Int J Sports Med. 2001, 22: 34-39. 10.1055/s-2001-11359.
Lee PH, Yu YY, McDowell I, Leung GM, Lam TH, Stewart SM: Performance of the international physical activity questionnaire (short form) in subgroups of the Hong Kong Chinese population. Int J Behav Nutr Phys Act. 2011, 8: 81-10.1186/1479-5868-8-81.
Macfarlane DJ, Lee CC, Ho EY, Chan KL, Chan DT: Reliability and validity of the Chinese version of IPAQ (short, last 7 days). J Sci Med Sport. 2007, 10: 45-51.
Richardson MT, Ainsworth BE, Jacobs DR, Leon AS: Validation of the Stanford 7-Day Recall to assess habitual physical activity. Ann Epidemiol. 2001, 11: 145-153. 10.1016/S1047-2797(00)00190-3.
Bassett DR, Cureton AL, Ainsworth BE: Measurement of daily walking distance-questionnaire versus pedometer. Med Sci Sports Exerc. 2000, 32: 1018-1023.
Cust AE, Smith BJ, Chau J, van der Ploeg HP, Friedenreich CM, Armstrong BK, Bauman A: Validity and repeatability of the EPIC physical activity questionnaire: a validation study using accelerometers as an objective measure. Int J Behav Nutr Phys Act. 2008, 5: 33-10.1186/1479-5868-5-33.
Cust AE, Armstrong BK, Smith BJ, Chau J, van der Ploeg HP, Bauman A: Self-reported confidence in recall as a predictor of validity and repeatability of physical activity questionnaire data. Epidemiology. 2009, 20: 433-441. 10.1097/EDE.0b013e3181931539.
Duncan GE, Sydeman SJ, Perri MG, Limacher MC, Martin AD: Can sedentary adults accurately recall the intensity of their physical activity? Prev Med. 2001, 33: 18-26. 10.1006/pmed.2001.0847.
Gauthier AP, Lariviere M, Young N: Psychometric properties of the IPAQ: a validation study in a sample of northern Franco-Ontarians. J Phys Act Health. 2009, 6 (Suppl 1): S54-S60.
Hagstromer M, Oja P, Sjostrom M: The International Physical Activity Questionnaire (IPAQ): a study of concurrent and construct validity. Public Health Nutr. 2006, 9: 755-762.
Hagstromer M, Ainsworth BE, Oja P, Sjostrom M: Comparison of a subjective and an objective measure of physical activity in a population sample. J Phys Act Health. 2010, 7: 541-550.
Hallal PC, Simoes E, Reichert FF, Azevedo MR, Ramos LR, Pratt M, Brownson RC: Validity and reliability of the telephone-administered international physical activity questionnaire in Brazil. J Phys Act Health. 2010, 7: 402-409.
Jacobi D, Charles MA, Tafflet M, Lommez A, Borys JM, Oppert JM: Relationships of self-reported physical activity domains with accelerometry recordings in French adults. Eur J Epidemiol. 2009, 24: 171-179. 10.1007/s10654-009-9329-8.
Macfarlane D, Chan A, Cerin E: Examining the validity and reliability of the Chinese version of the International Physical Activity Questionnaire, long form (IPAQ-LC). Public Health Nutr. 2011, 14: 443-450. 10.1017/S1368980010002806.
Matton L, Wijndaele K, Duvigneaud N, Duquet W, Philippaerts R, Thomis M, Lefevre J: Reliability and validity of the Flemish physical activity computerized questionnaire in adults. Res Q Exerc Sport. 2007, 78: 293-306. 10.5641/193250307X13082505157968.
Saglam M, Arikan H, Savci S, Inal-Ince D, Bosnak-Guclu M, Karabulut E, Tokgozoglu L: International physical activity questionnaire: reliability and validity of the Turkish version. Percept Mot Skills. 2010, 111: 278-284. 10.2466/06.08.PMS.111.4.278-284.
Smitherman TA, Dubbert PM, Grothe KB, Sung JH, Kendzor DE, Reis JP, Ainsworth BE, Newton RL, Lesniak KT, Taylor HA: Validation of the Jackson Heart Study Physical Activity Survey in African Americans. J Phys Act Health. 2009, 6 (Suppl 1): S124-S132.
Strath SJ, Bassett DR, Swartz AM: Comparison of the college alumnus questionnaire physical activity index with objective monitoring. Ann Epidemiol. 2004, 14: 409-415. 10.1016/j.annepidem.2003.07.001.
Trinh OT, Nguyen ND, van der Ploeg HP, Dibley MJ, Bauman A: Test-retest repeatability and relative validity of the Global Physical Activity Questionnaire in a developing country context. J Phys Act Health. 2009, 6 (Suppl 1): S46-S53.
Washburn RA, Jacobsen DJ, Sonko BJ, Hill JO, Donnelly JE: The validity of the Stanford seven-day physical activity recall in young adults. Med Sci Sports Exerc. 2003, 35: 1374-1380. 10.1249/01.MSS.0000079081.08476.EA.
Wolin KY, Heil DP, Askew S, Matthews CE, Bennett GG: Validation of the International Physical Activity Questionnaire-Short among Blacks. J Phys Act Health. 2008, 5: 746-760.
Masse LC, Fuemmeler BF, Anderson CB, Matthews CE, Trost SG, Catellier DJ, Treuth M: Accelerometer data reduction: A comparison of four reduction algorithms on select outcome variables. Med Sci Sports Exerc. 2005, 37: S544-S554. 10.1249/01.mss.0000185674.09066.8a.
Troiano RP, Berrigan D, Dodd KW, Masse LC, Tilert T, McDowell M: Physical activity in the United States measured by accelerometer. Med Sci Sports Exerc. 2008, 40: 181-188.
Baptista F, Santos DA, Silva AM, Mota J, Santos R, Vale S, Ferreira JP, Raimundo AM, Moreira H, Sardinha LB: Prevalence of the Portuguese Population Attaining Sufficient Physical Activity. Med Sci Sports Exerc. 2012, 44: 466-473. 10.1249/MSS.0b013e318230e441.
Hansen BH, Kolle E, Dyrstad SM, Holme I, Anderssen SA: Accelerometer-determined physical activity in adults and older people. Med Sci Sports Exerc. 2012, 44: 266-272. 10.1249/MSS.0b013e31822cb354.
Hagstromer M, Ainsworth BE, Kwak L, Bowles HR: A checklist for evaluating the methodological quality of validation studies on self-report instruments for physical activity and sedentary behavior. J Phys Act Health. 2012, 9 (Suppl 1): S29-S36.
Sternfeld B, Goldman-Rosas L: A systematic approach to selecting an appropriate measure of self-reported physical activity or sedentary behavior. J Phys Act Health. 2012, 9 (Suppl 1): S19-S28.
Masse LC, de Niet JE: Sources of validity evidence needed with self-report measures of physical activity. J Phys Act Health. 2012, 9 (Suppl 1): S44-S55.
Kokkinos P, Myers J: Exercise and physical activity: clinical outcomes and applications. Circulation. 2010, 122: 1637-1648. 10.1161/CIRCULATIONAHA.110.948349.
Ainsworth BE, Caspersen CJ, Matthews CE, Masse LC, Baranowski T, Zhu W: Recommendations to improve the accuracy of estimates of physical activity derived from self report. J Phys Act Health. 2012, 9 (Suppl 1): S76-S84.
Freedson PS, Melanson E, Sirard J: Calibration of the Computer Science and Applications, Inc. accelerometer. Med Sci Sports Exerc. 1998, 30: 777-781. 10.1097/00005768-199805000-00021.
Hendelman D, Miller K, Bagget C, Debold E, Freedson P: Validity of accelerometry for the assessment of moderate intensity physical activity in the field. Med Sci Sports Exerc. 2000, 32: S442-S449. 10.1097/00005768-200009001-00002.
Swartz AM, Strath SJ, Bassett DR, O'Brien WL, King GA, Ainsworth BE: Estimation of energy expenditure using CSA accelerometers at hip and wrist sites. Med Sci Sports Exerc. 2000, 32: S450-S456. 10.1097/00005768-200009001-00003.
Pettee Gabriel KK, Morrow JR, Woolsey AL: Framework for physical activity as a complex and multidimensional behavior. J Phys Act Health. 2012, 9 (Suppl 1): S11-S18.
Streiner DL, Norman GR: Health measurement scales: a practical guide to their development and use. 2006, Oxford University Press, Oxford; New York, 3rd edition.
Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC: Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007, 60: 34-42. 10.1016/j.jclinepi.2006.03.012.
Bland JM, Altman DG: Agreed statistics: measurement method comparison. Anesthesiology. 2012, 116: 182-185. 10.1097/ALN.0b013e31823d7784.
Vanhees L, Lefevre J, Philippaerts R, Martens M, Huygens W, Troosters T, Beunen G: How to assess physical activity? How to assess physical fitness? Eur J Cardiovasc Prev Rehabil. 2005, 12: 102-114.

Acknowledgments

Some of the questionnaires in this review were made available to the authors and can be accessed through the recently launched UK Medical Research Council Toolkit of Diet and Physical Activity Measurement [8].

Author information

Authors and Affiliations

Medical Research Council Epidemiology Unit, Cambridge, UK
Hendrik Hendrik JF Helmerhorst, Søren Brage, Herve Besson & Ulf Ekelund
Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
Hendrik Hendrik JF Helmerhorst
Medical Research Council Human Nutrition Resource centre, Cambridge, UK
Janet Warren
Danone Baby Nutrition (Nutricia Ltd), Trowbridge, UK
Janet Warren
Department of Sports Medicine, Norwegian School of Sport Sciences, Oslo, Norway
Ulf Ekelund

Corresponding author

Correspondence to Ulf Ekelund.

Additional information

Competing interest

The authors declare that they have no competing interests.

Authors’ contribution

HH performed an updated literature search and drafted the manuscript. SB contributed to the design of the study and critically revised the manuscript. JW and HB contributed to the design of the study and performed the original literature search. UE contributed to the design of the study and to the literature search, resolved questions about the inclusion of manuscripts, and critically revised the manuscript. All authors approved the final version of the manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Helmerhorst, H.H.J., Brage, S., Warren, J. et al. A systematic review of reliability and objective criterion-related validity of physical activity questionnaires. Int J Behav Nutr Phys Act 9, 103 (2012). https://doi.org/10.1186/1479-5868-9-103

Received: 19 February 2012
Accepted: 15 August 2012
Published: 31 August 2012
DOI: https://doi.org/10.1186/1479-5868-9-103
