Direct and indirect measurement of physical activity in older adults: a systematic review of the literature

Background Due to physiological and cognitive changes that occur with aging, accurate physical activity (PA) measurement in older adults represents a unique challenge. The primary purpose of this study was to systematically review measures of PA and their use and appropriateness with older adults. A secondary aim was to determine the level of agreement between PA measures in older adults. Methods Literature was identified through electronic databases. Studies were eligible if they examined the correlation and/or agreement between at least 2 measures of PA, direct and/or indirect, in older adults (≥ 65 years of age). Results Thirty-six studies met eligibility criteria. The indirect and direct measures of PA across the studies differed widely in their ability to address the key dimensions (i.e., frequency, intensity, time, type) of PA in older adults. The average correlation between indirect and direct measures was moderate (r = 0.38). The correlation between indirect and other indirect measures was weak (r = 0.29), while correlations between direct measures and other direct measures were high (real world: r = 0.84; controlled settings: r = 0.92). Agreement was strongest between direct PA measures and other direct measures in both real world and laboratory settings. No clear trend emerged regarding agreement (mean differences) between other combinations of PA measures (i.e., direct with indirect, indirect with indirect), but only a limited number of studies reported comparable units. Conclusions Despite the lack of a clear trend regarding the agreement between PA measures in older adults, the findings underscore the importance of valid, accurate and reliable measurement. To advance this field, researchers will need to approach the assessment of PA in older adults in a more standardized way (i.e., consistent reporting of results, consensus over cut-points and epoch lengths, use of appropriate validation tools).
Until then, researchers should be cautious when choosing PA measures that are appropriate for their research questions and when comparing PA levels across studies.


Background
Older adults represent one of the fastest growing segments of our population. Worldwide, the proportion of adults aged 65 years and over was about 8 percent (521 million) in 2011, and it is anticipated that they could account for about 11 percent (939 million) of the total population by 2030 [1]. At this rate, older adults are expected in the near future to outnumber children for the first time in history [2,3]. Not only is the proportion of older adults increasing, but average life expectancy also continues to climb [2,3].
Current physical activity (PA) guidelines encourage older adults to engage in at least 150 minutes of moderate- to vigorous-intensity aerobic PA per week and to engage in muscle and bone strengthening activities at least 2 days per week [4,5]. Despite these recommendations and the many well-known benefits of PA, studies have demonstrated that the vast majority of older adults are physically inactive and that the prevalence of inactivity increases with advancing age [6][7][8]. With the growth in the number and the life expectancy of older adults, the numerous risks associated with inactivity in older age (e.g., disability, chronic disease, reduced functional abilities, increased falls) [9][10][11][12][13] have the potential to be an enormous burden not only to the older adult, but also to society as a whole.
Interventions aimed at improving levels of PA (i.e., "any bodily movement produced by the skeletal muscles that results in energy expenditure" [14], p. 126) in older adults can have far-reaching impacts on the aging population. Developing and evaluating interventions to meet this aim requires reliable, valid, cost-effective, practical and unobtrusive means of measuring PA. However, measurement of PA, especially in older adults, is fraught with challenges. For example, changes in cognitive abilities and memory may lead to difficulties understanding instructions on self-report measures and to challenges recalling PA behaviours, especially over longer recall periods. Aging and disability change the metabolic costs of activities, so standard tables and equations for determining the energy expenditure of activities that were developed in younger populations may be inappropriate for older adults [12,15-19]. Existing indirect and direct measures of PA differ in their intended purpose, their appropriateness for different populations, and their ability to assess the key dimensions of PA in older adults: frequency, intensity, time, and type (FITT).

Indirect PA measurement
Indirect measures rely on self-report [15,20-22] and are practical, easy to administer to large groups, and cost-efficient. They are also generally well accepted, place a relatively low burden on the individual, and interfere little with the individual's usual habits. However, they are prone to over- or underestimation due to inaccurate recall, social desirability and misinterpretation [15,18,23]. In addition, many existing indirect tools fail to measure the lower end of the PA continuum [24] and are susceptible to fluctuations in health status, medical conditions and medications, fatigue, pain, concentration and distractibility, changes in mood, depression, and anxiety, and problems with memory and cognition [12,19,25].

Direct PA measurement
Direct measures of PA assess energy expenditure [15] or actual movement [26]; they are generally considered more accurate, are not prone to response and recall biases [15,20,22,27], and are often used to validate indirect measures of PA [20,22]. However, direct measures are typically more expensive, intrusive and time-consuming, and place a higher burden on both the participant and the researcher than indirect measures [20,22]. Also, individuals may alter their behavior because they know it is being measured [15]. Some measures (e.g., accelerometers, pedometers) provide very limited information about type of activity [19] and are not suitable for measuring certain types of PA (e.g., swimming, resistance exercise, upper body movements, cycling, complex movements [15,21]). Although direct measures do not rely on self-report, there is a subjective element in data analysis and interpretation (e.g., the researcher chooses epoch lengths and cut-points/thresholds for intensity groupings).

Study purpose
Recent systematic reviews have compared direct and indirect measures of PA in adult [22] and pediatric [20] populations and found low to moderate correlations and poor agreement between direct and indirect measures of PA. Although assessment of PA in older adults represents a unique challenge, to the best of the authors' knowledge, a similar review of PA measures in older adults has not been conducted. Thus, the primary purpose of this paper is to provide a systematic review and critique of direct and indirect measures of PA and their use and appropriateness with older adults. To this end, tools were evaluated on their ability to assess the key dimensions of PA in older adults, and the association and agreement between indirect and direct PA measures were examined. Secondary objectives of this paper were to determine the relationship and agreement between: a) indirect measures and other indirect measures; and b) direct measures and other direct measures of PA in older adults.

Search strategy and selection
Literature searches of direct and indirect PA measures in older adults were conducted using ISI Web of Knowledge, AgeLine, PsycINFO, Medline, and SPORTDiscus (see Additional file 1. Search Strategy). The search strategy was developed by two of the authors (KK and RR) and was based on systematic reviews comparing direct and indirect measures of PA in adult [22] and pediatric [20] populations. Combinations of the following key terms were used to search the above databases: PA level terms (PA, exercise); older adult terms (older adults, aging, aged, seniors, elders, elderly, 65 years and over); general measurement terms (measures, measurement, instruments, tools, tests, assessment, testing); indirect measurement terms (indirect, subjective, self-report, diaries, logs, questionnaires, surveys, interviews); and direct measurement terms (direct, objective, physical, doubly labeled water, indirect/direct calorimetry, accelerometry, pedometry, heart rate monitoring, GPS, direct observation).
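One way to express "combinations of key terms" is to join synonyms within each concept group with OR and join the groups with AND. The sketch below assumes exactly that grouping; the `build_query` helper is hypothetical, the term lists are abbreviated from the search description, and real databases each use their own field tags and truncation syntax, which are not modelled here.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(groups: list[list[str]]) -> str:
    """Hypothetical helper: OR the synonyms within each concept group,
    then AND the groups together."""
    clauses = ["(" + " OR ".join(quote(t) for t in group) + ")" for group in groups]
    return " AND ".join(clauses)


# Abbreviated term lists taken from the search description above.
pa_terms = ["physical activity", "exercise"]
age_terms = ["older adults", "aged", "seniors", "elderly"]
measure_terms = ["measurement", "instruments", "assessment"]
method_terms = ["self-report", "questionnaires", "accelerometry", "pedometry"]

query = build_query([pa_terms, age_terms, measure_terms, method_terms])
print(query)
```

In practice the indirect and direct term groups could also be searched separately and the result sets merged; the single combined method group here is a simplification.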
Selected studies were peer-reviewed journal articles examining the agreement between at least two measures of PA, direct or indirect, in adults aged 65 years and over. Studies were excluded if they: (1) did not compare at least two measures of PA, (2) involved a target population that included any participants less than 65 years of age, (3) were not written in English, or (4) were dissertations or conference presentations. Eligible direct measures included pedometry, accelerometry, heart rate monitoring, direct and indirect calorimetry, doubly labeled water, and direct observation. Eligible indirect measures included questionnaires, surveys, interviews, and activity records/logs/diaries. Other report (i.e., having significant others report on the individual's PA) was considered an ineligible measure due to the possible heterogeneity of reporters (e.g., spouse, personal trainer, sibling, children, caregiver).
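The four exclusion rules can be read as a single screening predicate. The sketch below assumes a hypothetical `Study` record; its field names and measure labels are illustrative only and are not taken from the review's extraction form.

```python
from dataclasses import dataclass

# Eligible measure categories as listed in the criteria above; labels are illustrative.
DIRECT = {"pedometry", "accelerometry", "heart rate monitoring", "direct calorimetry",
          "indirect calorimetry", "doubly labeled water", "direct observation"}
INDIRECT = {"questionnaire", "survey", "interview", "activity log"}


@dataclass
class Study:
    """Hypothetical screening record; field names are not from the review."""
    measures: list[str]     # PA measures compared (repeats allowed, e.g. two pedometers)
    min_age: int            # age in years of the youngest participant
    language: str           # publication language
    publication_type: str   # e.g. "journal article", "dissertation"


def is_eligible(study: Study) -> bool:
    # "Other report" and any unlisted measure are ignored when counting measures.
    eligible = [m for m in study.measures if m in DIRECT | INDIRECT]
    return (
        len(eligible) >= 2                               # (1) two or more eligible measures
        and study.min_age >= 65                          # (2) no participants under 65
        and study.language == "English"                  # (3) English-language only
        and study.publication_type == "journal article"  # (4) no dissertations/conference items
    )
```

Counting a list rather than a set keeps comparisons of two devices of the same type (e.g., pedometer with pedometer) eligible, as in several of the reviewed studies.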

Screening
The primary author initially screened identified studies based on the study title and abstract. Duplicates, articles that were not published in English, and irrelevant studies were manually removed. Potential studies were briefly scanned to see if they met eligibility criteria. Manual cross-referencing of bibliographies of the selected articles was also completed (See Figure 1. Screening Procedures).

Quality assessment
The quality of studies was assessed using a recently developed checklist for evaluating the validity and suitability of existing physical activity and sedentary behavior instruments [28]. The checklist is an adaptation of the checklist created by Downs & Black [29]; it adds criteria for questionnaire design and PA measurement and comprises nine quality-of-reporting criteria, three external validity criteria, and ten internal validity criteria. Two reviewers, including one of the authors (KK), independently rated the quality of the individual papers, and consensus was achieved through discussion.

Data extraction and synthesis
Each study was reviewed, and the following information was extracted by one author (KK) and checked by a second author (RR): study design, sample characteristics, sample size (total, men, women), direct measures, units of measurement for the direct measure, duration of direct measurement, indirect measures of PA, units of measurement for the indirect measure, length of recall, time between PA measures, correlations, and mean differences. If level of agreement was not reported but the units of measurement were comparable, the percent mean difference ((indirect mean − direct mean) / direct mean × 100) or the absolute mean difference (direct mean − direct mean, or indirect mean − indirect mean) was calculated. Units were converted to comparable units whenever possible. In many cases, however, units across the various measures of PA were not comparable, so agreement between these instruments could not be examined. Average correlations, weighted by sample size, were therefore also computed between: 1) direct and indirect measures; 2) indirect and indirect measures; and 3) direct and direct measures. Because the correlation coefficient is not a linear function of the magnitude of the relation between two variables and cannot simply be averaged, all correlations (Spearman rank and Pearson correlation coefficients) were first converted to Fisher's Z; means and 95% confidence intervals (CIs) were computed on the Z scale and then transformed back to correlation coefficients [30] using Comprehensive Meta-Analysis [31].
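The averaging procedure can be sketched as follows: map each r to Fisher's Z (z = atanh r), take a weighted mean, build the 95% CI on the Z scale, and back-transform with tanh. This is a minimal fixed-effect sketch that assumes inverse-variance weights of n − 3 (the sampling variance of Z is approximately 1/(n − 3)); Comprehensive Meta-Analysis offers other weighting and random-effects options, so this is not a reproduction of the review's exact computation, and the example correlations and sample sizes are hypothetical.

```python
import math


def pooled_correlation(rs: list[float], ns: list[int], z_crit: float = 1.96):
    """Fixed-effect pooled correlation via Fisher's Z.

    Each r is mapped to z = atanh(r); weights are n - 3, the inverse of
    the approximate sampling variance of z. Returns (r_mean, ci_low, ci_high).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("need matching, non-empty lists of r and n")
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_mean = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = 1 / math.sqrt(sum(ws))
    # Back-transform the mean and both CI limits to the r scale.
    return (math.tanh(z_mean),
            math.tanh(z_mean - z_crit * se),
            math.tanh(z_mean + z_crit * se))


# Hypothetical studies: three correlations with their sample sizes.
r, lo, hi = pooled_correlation([0.30, 0.45, 0.20], [103, 53, 203])
print(f"r = {r:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Because the CI is symmetric on the Z scale, it becomes asymmetric around r after back-transformation, which is expected behaviour rather than an error.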

Description of study samples
Participants in eligible studies ranged from 65 to 99 years of age. Sample sizes ranged from eight [61] to more than 5000 [54]. The latter study reported on the results of a subset of questions from an annual US survey but the total sample of older adults was not reported. The majority of studies reported on combined samples of both genders, one reported on men only [32], four on women only [33,34,43,52] and in one study gender was unclear [61].

Quality of studies
The quality of all 36 included studies was assessed using the tool described above. Total scores ranged from 9 to 19 out of a maximum of 22 points, with a mean (SD) of 14.7 (2.3). Of the 36 studies, 29 were considered modest quality (score of 6 to 16) and 7 were considered high quality (17 to 22). None were categorized as poor (0 to 5). Scores on the reporting criteria ranged from 3 to 9 (maximum 9 points), with a mean (SD) of 6.4 (1.3). The mean (SD) score on the external validity scale (maximum 3 points) was 1.3 (0.5). Internal validity scores (maximum 10 points) ranged from 2 to 10, with a mean (SD) of 6.9 (1.6). For more information about the quality ratings, see Additional file 2.
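The total score is the sum of the three subscales (reporting, 0 to 9; external validity, 0 to 3; internal validity, 0 to 10), banded as poor, modest, or high. A small sketch of that arithmetic follows; the function names are hypothetical, and it assumes each criterion contributes one point, as the 22-point maximum implies.

```python
MAX_SCORES = {"reporting": 9, "external": 3, "internal": 10}  # sums to 22


def total_score(reporting: int, external: int, internal: int) -> int:
    """Sum the three subscale scores after checking each is in range."""
    for name, value in (("reporting", reporting), ("external", external),
                        ("internal", internal)):
        if not 0 <= value <= MAX_SCORES[name]:
            raise ValueError(f"{name} score {value} out of range")
    return reporting + external + internal


def quality_band(score: int) -> str:
    """Map a total score (0 to 22) onto the bands used in the review."""
    if score <= 5:
        return "poor"
    if score <= 16:
        return "modest"
    return "high"


print(quality_band(total_score(7, 1, 8)))  # modest
```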

Data synthesis
Brief overview of the indirect measures and their assessment of PA dimensions in older adults
Thirty-two different indirect measures were used to assess PA in the identified studies. These can be divided into two broad groups: PA questionnaires (i.e., self- or interviewer-administered questionnaires/surveys) and activity logs (i.e., records kept for a specified timeframe) [21,23,66]. The most frequently used self-report measures were the Physical Activity Scale for the Elderly (PASE; n=8, including 1 translation [16,26,32,33,37,46,47,50]), activity diaries/logs (n=5 [34,41,45,47,48]), and the Community Healthy Activities Model Program for Seniors Activities Questionnaire for Older Adults (CHAMPS; n=4 [16,35,42,53]).
Self-report measures were classified by type according to the system described by Neilson and colleagues [67]. In this system, PA questionnaires (PAQs) that derive a score and contain fewer than 10 items are classified as global; PAQs that derive scores, activity duration, or estimates of energy expenditure and contain 10 to 20 items are classified as recall; and PAQs that derive an estimate of energy expenditure and contain more than 20 items are classified as quantitative. Of the 32 self-report measures identified in this review, 8 were classified as quantitative, 10 as recall, and 9 as global. The remaining five questionnaires could not be located and were not evaluated (i.e., Physical Activity Index [33], Japan Arteriosclerosis Longitudinal Study Physical Activity Questionnaire [37], the Modified Dallosso [32], Older Adult Exercise Status Inventory [52], and an unspecified global PA item [47]).
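The classification rule depends mainly on item count, with the derived output as a secondary condition. A minimal sketch, assuming each questionnaire is described only by its item count and whether it yields an energy expenditure estimate; the function name and labels are illustrative, and instruments the rule as summarized does not cover (more than 20 items without an energy expenditure estimate) are returned as unclassified rather than guessed.

```python
def classify_paq(n_items: int, derives_energy_expenditure: bool) -> str:
    """Classify a PA questionnaire by item count, following the system
    as summarized above (an interpretation, not the original scheme's code)."""
    if n_items < 10:
        return "global"          # derives a score, fewer than 10 items
    if n_items <= 20:
        return "recall"          # scores, duration, or energy expenditure; 10 to 20 items
    if derives_energy_expenditure:
        return "quantitative"    # energy expenditure estimate; more than 20 items
    return "unclassified"        # >20 items without energy expenditure: not covered


print(classify_paq(12, True))  # recall
```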
As can be seen from Table 1, the self-report methods in this review varied in their ability to address the four PA dimensions. A majority of measures (21/27) asked about frequency of activities, but some asked only about the frequency of a subset of the activities listed in the measure. Although the scoring systems of thirteen measures assigned intensity codes or metabolic equivalents to activities endorsed by the older adult, only 6 measures required participants to rate the intensity of their activities (e.g., pace of walking, rate of exertion, rating scales). Most measures (22/27) evaluated at least one major type of PA (leisure, household, occupational), three categorized PA by intensity (e.g., light, moderate, vigorous PA), and two did not measure type of activity. Several of the tools (i.e., the CHAMPS, YPAS, PASE, Modified Baecke, Phone FITT, Physical Activity Questionnaire for the Elderly Japanese, Questionnaire d'Activité Physique Saint-Etienne, Older Adult Exercise Status Inventory, and LASA Physical Activity Questionnaire) were designed specifically for older adults and address physical activities, including lower intensity activities, in which older adults are more likely to engage. All but seven measures asked about duration of activity, in hours/week, hours or minutes/day, or minutes/occasion. A substantial portion of the measures asked about duration of activity per occasion or total volume across the day, or assessed duration on rating scales.

Brief overview of the direct measures and their assessment of PA dimensions in older adults
Six different types of direct measures were employed: accelerometers, pedometers, doubly labeled water, indirect calorimetry, heart rate monitoring, and direct observation. A detailed description of these instruments is outside the scope of this review; for more information, please refer to existing review chapters and websites on PA measurement (e.g., [15,21,23,66]). In the reviewed studies, accelerometry (21 studies) and pedometry (11 studies) were the most frequently used measures, while doubly labeled water (3 studies) and heart rate monitoring (2 studies) were the least frequently used. Accelerometry, pedometry, indirect calorimetry and heart rate monitoring allow intensity of activity to be quantified, but in very different ways (e.g., counts/min, steps per unit time, oxygen consumption, changes in heart rate). As can be seen in Table 2, accelerometry, indirect calorimetry, heart rate monitoring and direct observation all provide some information about frequency and duration of PA, although to varying degrees.
Four of the six measures permit classification of activity by intensity, but provide very little or no information about the major types of PA [19]. For example, accelerometers cannot capture activities that involve little or no change in acceleration (e.g., upper body movements, cycling) or that take place in water (e.g., swimming) [15,21]. Direct observation allows for detailed accounts of the type, time and intensity of PA, but is highly time-consuming and places a heavy burden on the assessor. Pedometry is limited to monitoring acceleration/deceleration in the vertical plane. Doubly labeled water permits measurement of energy expenditure, as do accelerometry, indirect calorimetry, and heart rate monitoring, but provides no information about the frequency, intensity, type or duration of PA. In addition, direct observation provides a means of examining important variables that influence PA behaviors in older adults, including the presence of others, behavioral cues, and barriers to PA.

Agreement between indirect and direct measures
The results of studies of direct and indirect PA measures containing comparable units (n=6) are summarized in Table 3. For more information about the key characteristics of these studies, see Additional file 3. Three of these studies examined the agreement between energy expenditure obtained from self-report measures and from doubly labeled water. In these studies, self-reported daily energy expenditure both under- and overestimated energy expenditure from doubly labeled water, with differences ranging from −14% to 37% [32,41,44]. Likewise, PA energy expenditure was both under- and overestimated by self-report compared with doubly labeled water, with differences ranging from −39% to 11% [32]. In two of the studies, differences between heart rate monitoring and self-report PA measures ranged from −14% to 6%. Last, PA from self-report both underestimated and [...] 0.37 to 0.42) were also moderate, while the average correlation in samples of women only was weak (r = 0.252, 95% CI of 0.19 to 0.31).

Agreement between indirect measures
As can be seen in Table 3, agreement between indirect PA measures varied considerably across PA constructs (e.g., time, energy expenditure) and measures. For more information about the key characteristics of these studies, see Additional file 4. For the purposes of comparison, all energy expenditure scores were converted to kcal/week and all duration scores to hours/week. Across the studies in Table 3, absolute differences between indirect measures of PA energy expenditure ranged from 504 kcal/week between the YPAS and the College Alumni Questionnaire [32] to 7931 kcal/week between the YPAS and the CHAMPS [53]. In both studies comparing the YPAS with the CHAMPS, the YPAS produced higher estimates of energy expenditure and of time spent in PA [16,53]. In the reviewed studies, differences between measures of time spent in PA ranged from 0.4 hours per day (2.8 hours per week) to 21.7 hours per week [48,53].
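Converting to common units is simple arithmetic, but it is where cross-instrument comparisons most easily go wrong. A minimal sketch, assuming values are reported per day or per week; the multiplier tables and helper names are hypothetical, and each instrument's own scoring rules determine which unit it actually reports.

```python
DAYS_PER_WEEK = 7

# Multipliers that convert each reported unit to the common unit.
TO_KCAL_PER_WEEK = {"kcal/day": DAYS_PER_WEEK, "kcal/week": 1}
TO_HOURS_PER_WEEK = {
    "min/day": DAYS_PER_WEEK / 60,
    "h/day": DAYS_PER_WEEK,
    "min/week": 1 / 60,
    "h/week": 1,
}


def convert(value: float, unit: str, table: dict[str, float]) -> float:
    """Convert a reported value using one of the multiplier tables."""
    if unit not in table:
        raise ValueError(f"no conversion defined for {unit!r}")
    return value * table[unit]


def mean_difference(a_mean: float, b_mean: float) -> float:
    """Signed difference between two group means already in common units."""
    return a_mean - b_mean


# The 0.4 hours/day difference reported above, expressed per week.
print(round(convert(0.4, "h/day", TO_HOURS_PER_WEEK), 2))  # 2.8
```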

Association between indirect measures
In contrast to the few studies evaluating agreement between indirect PA measures, many studies examined the association of indirect measures with other indirect measures. Correlations between indirect measures of total PA ranged from weak to high (0.15 to 0.85). When all studies (r = 0.29, 95% CI = 0.28 to 0.30) and studies of mixed samples of men and women (r = 0.28, 95% CI = 0.27 to 0.29) were considered, average correlations between indirect measures of total PA were weak. In samples of women only, the average correlation was moderate (r = 0.46, 95% CI = 0.41 to 0.50).

Agreement between direct measures
The findings regarding agreement between direct measures are grouped into those that measured PA levels in the real world (n=5; Table 3) and those that measured PA in controlled or laboratory settings (n=10; Table 4).
[Table fragments: T, total volume of activity (min) or time spent in activities above a predetermined intensity threshold; Other, energy expenditure can be calculated from a calibration equation; Pedometry, I: step counts per unit time. Table note: one study also examined step counts in younger adults (N=17); only results specific to the older adult sample (N=28) are presented.]
For more information about the key characteristics of these studies, see Additional file 5. Among the 2 studies examining real world patterns of PA using pedometers, differences in daily step counts from pedometers worn simultaneously over 7 days varied considerably, ranging from 1562 to 5385 steps per day [55,56]. In contrast, in another study, step count differences between pedometers and accelerometers worn simultaneously over 7 days were small, ranging from 27 steps per day for women to 60 steps per day for men [38]. Compared with group-calibrated heart rate monitoring [43] and doubly labeled water [41], individually calibrated heart rate monitoring provided higher estimates of daily energy expenditure in older adults.
Among studies of agreement between direct measures in controlled situations (i.e., treadmill tests, step tests, walking fixed distances), the most common comparisons were between pedometer or accelerometer step counts and observed step counts (counted manually or from camera recordings). With self-paced walking, pedometers (Accusplit Eagle 120 mechanical pedometer, NL-2000 electronic pedometer, Step Activity Monitor, YAMAX DigiWalker) generally underestimated observed step counts, with percent agreement ranging from −13% to +2% [63][64][65]. Likewise, accelerometers tended to underestimate observed step counts, with percent agreement ranging from −7% to −3% [59,63,65]. In studies where walking speed was considered, the accuracy of pedometers and accelerometers tended to decrease as walking speed decreased [57,59,65]. Walking speed was manipulated by changing treadmill speed, by having participants walk a set distance at self-selected speeds (slow, normal, fast), or by dividing participants into groups based on their gait speed.
In particular, the ActivPal accelerometer stood out because it measured total steps and steps per minute with a high degree of accuracy (i.e., errors of less than 1% for both treadmill and outdoor walking [59]).
Two studies examined the agreement between indirect calorimetry and accelerometry in estimating energy expenditure during exercise (treadmill, stepping test) [58,62]. Accelerometers tended to underestimate energy expenditure, by 2% to 60%. However, this was not a uniform finding: in one study, accelerometers both overestimated (by 10% to 52%) and underestimated (by 12% to 60%) energy expenditure from indirect calorimetry [58], while in the other they underestimated it by 2% [62].
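The percent agreement values above appear to apply the percent mean difference formula from the Methods with the criterion measure (e.g., observed steps, indirect calorimetry) in the denominator, so negative values mean the device under-counts. A minimal sketch with hypothetical step counts, not data from any reviewed study:

```python
def percent_difference(measure_mean: float, criterion_mean: float) -> float:
    """Percent mean difference relative to the criterion measure.

    Negative values mean the measure underestimates the criterion.
    """
    if criterion_mean == 0:
        raise ValueError("criterion mean must be non-zero")
    return (measure_mean - criterion_mean) / criterion_mean * 100


# Hypothetical example: a pedometer records 870 steps during a walk
# in which 1000 steps were counted by direct observation.
print(f"{percent_difference(870, 1000):.1f}%")  # -13.0%
```

Swapping the roles of measure and criterion changes both the sign and the denominator, which is one reason results reported with different reference measures are hard to compare.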

Association between direct measures
In most cases, studies comparing direct PA measures examined agreement rather than correlation. Of the studies of real world PA behavior, three reported correlations between direct methods of measuring PA (i.e., pedometry with pedometry, accelerometry with pedometry, individually calibrated with group-calibrated heart rate monitoring), ranging from 0.37 to 0.97 (average r = 0.84, 95% CI = 0.81 to 0.87) [38,43,55].
The remaining studies, which also involved mixed samples of men and women, compared direct measures of PA in a laboratory setting. Correlations varied considerably, from weak (r = −0.28) for steps counted by direct observation and pedometry [56] to strong (r = 0.98 and r = 0.99) for steps counted by direct observation and pedometry [56,63] and by direct observation and accelerometry [64]. The average correlation between direct PA measures in a laboratory setting, regardless of the direct measure employed, was high (r = 0.92, 95% CI = 0.90 to 0.94).

Discussion
Reliable and valid assessment of PA in older adults is an important area of research. The quality of existing studies examining the measurement of PA in older adults was generally moderate: none of the studies were of poor quality, but only 7 of the 36 were classified as high quality. These findings are informative, but they should be interpreted with caution given the quality limitations of the current studies. Without higher quality studies, significant gaps in our knowledge and understanding of PA measurement in older adults will remain. Higher quality research is needed to obtain a clearer picture of patterns of PA, to design interventions to promote PA, and to monitor changes in patterns of PA in older adults. To do so, researchers need to select valid measurement tools, use stronger and more consistent research methodology, and improve the reporting of results. Although systematic reviews of direct and indirect PA measurement tools have been conducted in adult and pediatric populations [20,22], to the best of the authors' knowledge, this represents the first comprehensive attempt to: 1) evaluate the ability of PA measures to assess the dimensions of PA, and 2) assess the association and agreement between PA measures (i.e., direct with indirect, indirect with indirect, and direct with direct), specifically in older adult populations.

Indirect measures
The indirect measures that were reviewed differed widely in their ability to address the key PA dimensions in older adults. While self-report measures, including the more detailed PA questionnaires and activity logs, can be an excellent source of information on the dimensions of PA (especially frequency, time, and type of activity) in older adults, the selected studies revealed key limitations with respect to their use in this population. For instance, the widespread practice of assigning metabolic equivalents to activities in the reviewed studies is problematic, because standard tables developed with younger populations tend to overestimate the intensity of PA in older adult populations [17]. Age-neutral measures were sometimes used in the selected studies; these questionnaires tend not to include the types of activities in which older adults typically participate [47,48]. Walking is the most common activity in which older adults participate [64], so measures that specifically address walking intensity are useful with this population. Older adults generally participate in lower intensity exercise more often than in moderate and vigorous PA [25], and their PA tends to be intermittent, sporadic or unstructured, which makes it more challenging to recall [15,16]. Whether activity occurs in short bouts or on a single occasion is an important detail about its frequency and duration, so measures that can assess this are valuable [17]. Few of the self-report questionnaires examined in the selected studies asked participants to rate their own perceived intensity of activities. Perceived intensity differs depending on a person's age and fitness level: older adults, especially inactive ones, may perceive activities typically classified as light intensity as more demanding than younger, fitter individuals do.

Direct measures
Direct measures of PA are generally considered more valid than indirect measures. Like the reviewed self-report measures, the direct measures in this review varied in their ability to capture the key dimensions of PA. In particular, accelerometry and pedometry, the most frequently used direct measures in this review, are limited in their ability to capture type of activity; the direct measures in the selected studies generally classified type of activity only by intensity. While this is very useful for addressing questions related to dose response to PA, it provides very limited information about the activity patterns of older adults. Accelerometry, indirect calorimetry, and heart rate monitoring allow for evaluation of bouts of continuous activity above a predetermined intensity threshold, as well as total time above predetermined thresholds. These tools can therefore provide a picture of the shorter, sporadic forms of activity in which older adults may participate. However, accelerometry, the least invasive and least time-intensive of the three, is known to be less accurate at detecting PA at lower intensities, so bouts of continuous low intensity activity may be missed. Doubly labeled water, accelerometry, indirect calorimetry, and heart rate monitoring all provide estimates of energy expenditure; however, there is debate in the literature about the appropriateness of the calculations used to estimate energy expenditure, especially in older adult populations [17,24,68]. A major limitation identified in the studies of direct PA measures was methodological inconsistency across studies. Decisions about cut-points for classifying intensity levels, as well as the selection of epoch lengths, varied considerably across the accelerometer studies in this review. Moreover, within a single PA measurement tool, such as accelerometry, PA can be quantified in very different ways.
For instance, the reviewed studies varied considerably in epoch lengths and cut-points for determining PA intensity classifications and provided limited rationale for their choices. Although the effect of epoch length on estimates of PA has been examined in children, little research has been conducted with adults and even less with older adults. Recent work with postmenopausal women suggests that shorter epoch lengths (e.g., 10 seconds) may yield more accurate estimates of PA in older adults than longer epoch lengths (e.g., 1 minute); however, the same study found that relations of PA to most health outcomes did not vary by epoch length [69]. Moreover, national PA surveys with adults (e.g., the National Health and Nutrition Examination Survey, the Canadian Health Measures Survey) have generally used 1-minute epochs [70,71]. Some cut-points for PA classification have been developed specifically for older adults [72]; however, cut-points that are not age-specific are often used in research with this population. Decisions regarding cut-points can have dramatic effects on data interpretation, on the resulting PA classification levels (i.e., over- or underestimation of minutes of moderate-to-vigorous PA [73]), and on relationships between PA and various outcomes (e.g., health, cognition) [74]. Future research will need to establish age-specific cut-points for older adults. Direct PA measurement in older adults is also complicated by the prevalence of slower walking speeds and gait disorders [15]. As found in this review, motion sensors are less accurate at slower speeds, which is likely to be an issue in older adult populations. Several tools identified in this review were designed for, or are appropriate for, individuals with varying gait patterns and walking speeds [56,59,63].
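The sensitivity of derived activity minutes to epoch length and cut-point, and the idea of detecting bouts above a threshold, can be illustrated on a synthetic accelerometer count series. Everything in this sketch is hypothetical: the counts are invented, the two cut-points are placeholders rather than the age-specific values in [72] or any device's published thresholds, scaling a per-minute cut-point to 10-second epochs by a factor of 6 is an assumption, and the bout rule requires uninterrupted epochs, whereas published bout definitions often tolerate brief drops below threshold.

```python
def to_minute_epochs(counts_10s: list[int]) -> list[int]:
    """Sum consecutive groups of six 10-second epochs into 60-second epochs,
    dropping any incomplete final minute."""
    full = len(counts_10s) - len(counts_10s) % 6
    return [sum(counts_10s[i:i + 6]) for i in range(0, full, 6)]


def mvpa_minutes(counts_per_min: list[int], cut_point: int) -> int:
    """Minutes whose 60-second count meets or exceeds the cut-point."""
    return sum(1 for c in counts_per_min if c >= cut_point)


def mvpa_minutes_10s(counts_10s: list[int], cut_point: int) -> float:
    """MVPA minutes from 10-second epochs; each 10-second count is scaled
    by 6 before comparison with the per-minute cut-point (an assumption)."""
    return sum(1 for c in counts_10s if c * 6 >= cut_point) / 6


def find_bouts(counts: list[int], threshold: int, min_len: int) -> list[tuple[int, int]]:
    """(start, length) of uninterrupted runs of epochs at or above the
    threshold that last at least min_len epochs."""
    bouts, start = [], None
    for i, c in enumerate(counts + [threshold - 1]):  # sentinel ends a trailing run
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            if i - start >= min_len:
                bouts.append((start, i - start))
            start = None
    return bouts


# Synthetic 10-second counts covering 4 minutes.
counts_10s = [300] * 6 + [150, 150, 150, 150, 400, 400] + [250] * 6 + [50] * 6
minutes = to_minute_epochs(counts_10s)  # [1800, 1400, 1500, 300]

for cut in (1000, 1600):  # two hypothetical per-minute cut-points
    print(f"cut-point {cut}: {mvpa_minutes(minutes, cut)} min (60 s epochs), "
          f"{mvpa_minutes_10s(counts_10s, cut):.2f} min (10 s epochs)")

# Bouts of at least 2 consecutive minutes at or above 1000 counts/min.
print(find_bouts(minutes, threshold=1000, min_len=2))  # [(0, 3)]
```

With these synthetic counts, the lower cut-point yields 3 minutes from 60-second epochs but about 2.3 minutes from 10-second epochs, while the higher cut-point yields 1 and about 1.3 minutes respectively, so both choices move the estimate, and in different directions.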
An additional problem with objective measures of PA in older adults is low compliance with measurement protocols (e.g., memory problems; a lack of the vision and manual dexterity needed to put on and activate the device properly; confusion with unfamiliar technology) [12,25,75]. Many of the selected studies either did not examine or did not report compliance with direct measurement protocols. Studies in this area should address compliance, because valid and reliable PA measures are of limited utility if older adults do not comply with the measurement protocol. Thus, although the reviewed tools provide useful information about the dimensions of PA in older adults, many issues specific to older adults make assessment of PA a unique challenge, and these issues require further examination.
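One way to quantify compliance with an accelerometer protocol is to estimate daily wear time and count the number of valid days. The sketch below assumes 1-minute epochs, treats any run of at least 60 consecutive zero-count minutes as non-wear, and counts a day as valid if it has at least 600 minutes (10 hours) of wear. These thresholds are commonly used conventions in adult accelerometry, not values validated for older adults, and the day records and function names are hypothetical.

```python
from typing import List


def wear_minutes(day_counts: List[int], nonwear_run: int = 60) -> int:
    """Wear minutes in one day of 1-min epochs; any run of at least
    `nonwear_run` consecutive zero-count minutes is treated as non-wear."""
    nonwear = 0
    run = 0  # length of the current run of zero-count minutes
    for c in day_counts:
        if c == 0:
            run += 1
        else:
            if run >= nonwear_run:
                nonwear += run
            run = 0
    if run >= nonwear_run:  # a zero run that reaches the end of the day
        nonwear += run
    return len(day_counts) - nonwear


def valid_days(days: List[List[int]], min_wear: int = 600, nonwear_run: int = 60) -> int:
    """Number of days meeting the minimum wear-time criterion."""
    return sum(1 for d in days if wear_minutes(d, nonwear_run) >= min_wear)


# Three hypothetical 1440-minute days of counts.
day1 = [0] * 420 + [50] * 700 + [0] * 320
day2 = [0] * 900 + [30] * 540
day3 = [0] * 480 + [20] * 300 + [0] * 30 + [20] * 300 + [0] * 330

print([wear_minutes(d) for d in (day1, day2, day3)])  # [700, 540, 630]
print(valid_days([day1, day2, day3]))                  # 2
```

On the third day, the 30-minute zero-count gap is shorter than the non-wear threshold and is therefore counted as wear; whether such gaps reflect sedentary time or device removal is the kind of protocol decision that studies could report alongside compliance levels.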

Agreement and association between measures
Additional objectives of this study were to assess the agreement and association between: 1) indirect and direct measures, 2) indirect and other indirect measures, and 3) direct and other direct measures. Unfortunately, a clear pattern regarding the agreement of measures in these three groups did not emerge. Inconsistency in the type of results reported, and the lack of comparable data in studies comparing 1) indirect and direct measures and 2) indirect measures with other indirect measures, precluded the evaluation of percent agreement (or absolute difference) in the majority of instances. Findings regarding agreement between direct measures and other direct measures were mixed, with some measures yielding high levels of agreement and others not. Most studies of agreement between direct measures in this review examined accelerometers, pedometers, and direct observation. The limited scope of current research examining agreement between PA measures makes it difficult to compare across studies and to generalize results.
Studies relied primarily on correlational analyses to compare and validate measures of PA in older adults. Consistent with the systematic reviews of Adamo and colleagues [20] and Prince and colleagues [22], weak to moderate correlations were generally found between indirect and direct measures of PA in older adult populations. Likewise, associations between indirect measures and other indirect measures were generally weak, whereas associations between direct measures and other direct measures were high regardless of setting (i.e., real world or laboratory). As others have noted, correlation indicates the strength of a relationship but does not reflect agreement [22,76]. We must therefore be cautious about relying solely on correlation as justification for choosing one measure over another: moderately or even highly correlated measures may be measuring entirely different PA constructs. The results of this review provide limited information about agreement across PA measures and minimal guidance for researchers choosing a PA measure. Researchers are advised to use the Quality Assessment of Physical Activity Questionnaire (QAPAQ) Checklist to guide their choice of PA self-report measures [77].
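The point that correlation does not reflect agreement can be shown with a short example. In the hypothetical data below, self-reported minutes of PA are exactly twice the device-measured minutes plus 10, so the Pearson correlation is perfect, yet the Bland-Altman mean difference and 95% limits of agreement (mean difference ± 1.96 SD of the differences) reveal a large systematic bias. The data and function names are invented for this sketch.

```python
import math
from typing import List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Mean difference (y - x) and 95% limits of agreement (mean +/- 1.96 SD)."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    md = sum(diffs) / n
    sd = math.sqrt(sum((d - md) ** 2 for d in diffs) / (n - 1))  # sample SD
    return md, md - 1.96 * sd, md + 1.96 * sd


# Hypothetical data: self-report = 2 * device + 10 (minutes per day).
device = [20, 35, 50, 65, 80]
self_report = [2 * m + 10 for m in device]

print(round(pearson_r(device, self_report), 3))                 # 1.0
print([round(v, 1) for v in bland_altman(device, self_report)])  # [60.0, 13.5, 106.5]
```

Here r = 1.0, while the self-report exceeds the device by 60 minutes on average, with limits of agreement of roughly 13.5 to 106.5 minutes.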

Take home message
Despite the lack of a clear trend regarding the agreement of PA measures in older adults, the findings provide useful information about research needs in this important field. Consistent with papers on pediatric and adult populations, agreement and correlations between measures were generally weak to moderate. The measurement of PA is complex across all populations, and, as this review shows, it involves additional unique challenges in older adults. Not only does PA involve a number of separate dimensions, but it is also not a static behavior. We cannot assume that physical activities are performed at the same intensity across persons or over time. There is considerable inter-individual variability (e.g., differences in perceived intensity of exercise, types of activities engaged in, and the costs of PA) and intra-individual variability (e.g., disease states, changes in activity levels due to changes in health or demands on time). Moreover, the limited accuracy of our instruments also contributes to the weak to moderate correlations and agreement between measures.
Choosing the appropriate tool to measure PA in older adults is also complex. PA levels and patterns reflect biological age more than chronological age; however, as Shephard [24] points out, there is no agreed-upon method of determining biological age. In healthy, active older adults, assessment methods designed for younger populations may be entirely suitable. Given the complexity of PA and its measurement, we cannot expect to capture all of its dimensions adequately with a single measure. The question is therefore not simply which measure is most appropriate for older adults, but which combination of tools is most appropriate. The choice of tools depends not only on the specific population of older adults, but also on the intended purpose of the evaluation.

Future directions
With regard to the appropriateness of PA measures in older adults, qualitative work is needed to better understand how older adults feel about these measurement methods and the level of burden being placed on them. Qualitative work can also help design new measures that address PA constructs specific to older adults, or improve existing measures. Further work is needed to develop standard metabolic cost tables and equations specific to older adults. Questionnaire items should be carefully developed to address the types of activities in which older adults participate and the generally more sporadic nature of their activities, and should also allow older adults to report their perceived intensity during activity so that this important aspect of their PA can be explored. With regard to direct PA measures, further work is needed to resolve methodological inconsistencies (e.g., cut-points, epoch lengths), especially in older adults, in whom limited research has been conducted. Achieving consensus within the research community on these issues is an important research goal. It is also important to assess factors that may influence PA measurement, such as health status; medical conditions and medications; changes in mood, depression, and anxiety; fatigue and pain; and concentration and distractibility [10,22,23]. Continuing to assess the agreement between PA measures in older adults is another important research target: if measures show high agreement on the PA construct of interest, then briefer, more feasible methods can be selected. To advance this field, PA researchers need to approach the assessment of PA in older adults in a standardized way.
We cannot sufficiently assess agreement between measures unless researchers report the required results, reduce methodological inconsistencies (e.g., lack of consensus regarding cut-points and epoch lengths), and choose appropriate tools against which to validate their PA measures (i.e., tools that evaluate the same construct).
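Why metabolic cost equations specific to older adults matter can be illustrated with the conventional MET definition, in which 1 MET is taken as a resting oxygen uptake of 3.5 mL/kg/min. If an older adult's actual resting uptake is lower, the same activity represents a higher multiple of their own resting rate. The sketch below compares the two. The activity uptake of 8.75 mL/kg/min and the individual resting value of 2.5 mL/kg/min are hypothetical, and the intensity bands (light below 3 METs, moderate 3 to below 6 METs, vigorous 6 METs or more) are commonly used conventions rather than older-adult-specific thresholds.

```python
STANDARD_RESTING_VO2 = 3.5  # mL O2/kg/min: the conventional value of 1 MET


def mets(activity_vo2: float, resting_vo2: float = STANDARD_RESTING_VO2) -> float:
    """Activity intensity expressed as a multiple of resting oxygen uptake."""
    return activity_vo2 / resting_vo2


def classify(m: float) -> str:
    """Commonly used MET intensity bands (not older-adult specific);
    sedentary behaviour (< 1.5 METs) is folded into 'light' for brevity."""
    if m < 3.0:
        return "light"
    if m < 6.0:
        return "moderate"
    return "vigorous"


activity_vo2 = 8.75       # hypothetical measured uptake during an activity
individual_resting = 2.5  # hypothetical measured resting uptake of one older adult

standard = mets(activity_vo2)                        # 2.5 METs
individual = mets(activity_vo2, individual_resting)  # 3.5 METs
print(standard, classify(standard))      # 2.5 light
print(individual, classify(individual))  # 3.5 moderate
```

With the standard resting value the activity is classified as light, whereas relative to the hypothetical individual resting value it is moderate, so the choice of equation alone can shift an older adult across an intensity boundary.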