Validity of the overall IPAQ-SF: overall physical activity level
These data are presented in Table 2. The IPAQ-SF showed negligible to small correlations in total activity level with objective measuring devices (range of ρ = 0.09 [19] to 0.39 [36], median = 0.29). Among the 18 correlations reported for objective measuring devices [17 - 20, 23, three reported in 25, 29, 30, two reported in 31, 32 - 35, 39], 16 were regarded as small and the others were negligible. In general, the correlation of the IPAQ-SF with accelerometer data (range of ρ = 0.09 [19] to 0.39 [36], median = 0.28) was similar to that with the pedometer (range of ρ = 0.25 [25] to 0.33 [20], median = 0.28) and the actometer (ρ = 0.33 [18]).
With fitness measures (VO2max, maximum treadmill time, and 6-minute walk test, reported in the lower section of Table 2), the correlations with the IPAQ-SF total activity level were small in four of the five studies (range of ρ = 0.16 [33] to 0.36 [37], median = 0.30). Only one study validated the IPAQ-SF against anthropometric measures; it reported a small correlation between the IPAQ-SF and body fat percentage (ρ = -0.19 [44], not shown in any table).
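As a reminder of what the coefficients above measure, the following is a minimal sketch of a rank correlation, assuming ρ denotes Spearman's coefficient (the conventional meaning of the symbol; this section does not restate it). The function names are illustrative and the sketch is not the analysis code of any reviewed study.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical self-reported and device-measured values, for illustration only.
self_report = [1.0, 2.0, 3.0, 4.0, 5.0]
device = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(spearman_rho(self_report, device), 2))
```

Because only ranks enter the calculation, the coefficient is unaffected by the unit in which either measure is expressed, provided the conversion between units is monotonic.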
In the only study using doubly labeled water as the criterion measure [28], the validity of the IPAQ-SF was assessed by categorizing participants as insufficiently active, sufficiently active, or highly active based on their IPAQ-SF scores (Table 3). The total energy expenditure (TEE) and physical activity level (PAL) (both measured using doubly labeled water) were then compared across the three categories. TEE and PAL in the highly active participants were significantly higher than those of the other two groups. The authors concluded that highly active participants could be correctly identified and distinguished from inactive participants using the IPAQ-SF, but that other discrimination was poor [28].
Validity of the IPAQ-SF: specific levels of intensity
These data are presented in Table 3. Three studies [20, 38, 43] reported moderate to large correlations (ρ ≥ 0.5) for at least one of the levels of intensity (vigorous activity, moderate activity, and walking) (superscript a in columns 4-6 of Table 3). Of the four correlations [20, 38, two reported in 43] in the moderate range or higher (ρ ≥ 0.5), three [20, two reported in 43] related to walking time and the remaining one [38] related to moderate activity. All four were correlations between the IPAQ-SF and accelerometer or pedometer values [20, 38, two reported in 43]. In addition, two studies [36, 43] reported values in the 0.40 to 0.49 range for time spent on walking and accelerometer count. Time spent on walking seemed to correlate best with accelerometer/pedometer counts.
Of the five remaining studies [25, 34, 36, 37, 43] (superscript b in columns 4-6 of Table 3) reporting correlations approaching the moderate level (ρ = 0.40 - 0.49), all measured activity at the vigorous level; two were correlations between vigorous activity time and fitness measures (VO2max [34] and maximum treadmill time [37]), and the other three were correlations between vigorous activity time and accelerometer data [25, 36, 43]. As a correlation of ρ = 0.40 is the recommended standard for validation against fitness measures, there was some support for the validity of the IPAQ-SF in measuring vigorous activity. However, it should be noted that these represent only a third of the correlations reported against the fitness measures.
Accuracy of the IPAQ-SF
Table 4 shows the accuracy of the IPAQ-SF. Six studies provided the amount of physical activity measured by both the IPAQ-SF and an objective device [19, 25, 31, 35, 36, 42], but surprisingly, none of them computed the percentage of over- or under-reporting of physical activity, or used the absolute difference as an indicator of validity. Furthermore, standard deviations were not provided by these studies, making it impossible to compute the effect size for the differences between the IPAQ-SF and the objective device. Under-reporting of physical activity (-28%) was present in only one study [31]; in the other five studies [19, 25, 35, 36, 42], the IPAQ-SF over-reported physical activity by 106% on average when compared with the accelerometer (range 36 - 173%).
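The percentages of over- and under-reporting can be illustrated with a short sketch. It assumes the relative difference is taken against the objective measure, i.e. (IPAQ-SF − objective) / objective × 100; this section does not spell out the formula, so that definition, the function name, and the input numbers are all assumptions made for illustration.

```python
def percent_over_report(ipaq_value: float, objective_value: float) -> float:
    """Relative difference of the self-report from the objective value, in percent.

    Positive values indicate over-reporting by the questionnaire,
    negative values indicate under-reporting.
    Assumed definition: (self-report - objective) / objective * 100.
    """
    if objective_value <= 0:
        raise ValueError("objective_value must be positive")
    return (ipaq_value - objective_value) / objective_value * 100.0


# Hypothetical activity totals (same unit for both measures), not study data.
examples = {
    "over-reporting example": (3000.0, 1500.0),
    "under-reporting example": (800.0, 1000.0),
}
for label, (ipaq, objective) in examples.items():
    print(label, round(percent_over_report(ipaq, objective), 1))
```

Under this definition, an over-report of 100% means the questionnaire total is twice the objective total, which helps put the reported range of 36 - 173% in context.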
Factors that might relate to variability of validity findings
Demographics
None of the demographic characteristics, including place of study, targeted population, sample size, male-female ratio, and age, seemed to be related to differences in validity between the IPAQ-SF and the criterion measure (Tables 1 and 2).
Objective standard used for validation
Fifteen studies used an objective device that monitored body motion [17–20, 25, 29–32, 35, 38–40, 42, 43], two examined scores against a physical fitness measure [37, 41], four used both an objective device and a physical fitness measure [23, 33, 34, 36], and one compared findings against anthropometric measures [44] (Tables 2 and 3). Of those reporting data from motion-sensing devices, one used an actometer, two used a pedometer, and fifteen used an accelerometer. Two of them used both a pedometer and an accelerometer. Notably, only one study [28] (Table 3) used doubly labeled water, the recommended criterion for validation [8, 22], to assess the validity of the IPAQ-SF.
Indices from objective standards used for validation
The third columns of Tables 2 and 3 indicate the unit used in the analyses. For the accelerometer devices (excluding pedometers) and for the fitness measures, several different units were used, and these were not consistent across studies. Of the seventeen studies using an accelerometer as the objective standard (8 in Table 2 [18–20, 29, 31–33, 39], 4 in Table 3 [38, 40, 42, 43], and 5 in both [23, 25, 34–36]), four types of units were commonly reported (with some studies reporting multiple different units). These included (i) raw accelerometry counts without transformation (Counts [17, 25, 29, 31, 33, 35, 36, 40, 43]), (ii) count data converted to energy expenditure (TEE/AEE/PAL [23, 34, 39]), (iii) MET scores (MET min/wk [19, 25, 31, 32, 36, 38, 40, 42]), and (iv) time spent (Total PA min/wk [25, 31, 36, 38–40, 42, 43]). In addition to the variability of units used for reporting accelerometer data, there was also great variability in the cutoffs used to transform the accelerometer data into MET min/wk. Three different cutoffs (Freedson [26], Swartz [27], and Trost [46]) were used among the aforementioned validation studies, yet overall, no pattern of difference in correlations was evident based on the use of the different cutoffs.
Nevertheless, this was not the case for the absolute discrepancy between the IPAQ-SF and the accelerometer scores (reported in Table 4). The only study using the Swartz cutoffs ([27], moderate PA: 574 ≤ count/min ≤ 4945, vigorous PA: count/min > 4945) yielded an over-report of 36%, which appears relatively small compared with the average of 95% for the four studies [19, 25, 31, 42] using the Freedson cutoffs (moderate PA: 1952 ≤ count/min ≤ 5724, vigorous PA: count/min > 5724) (Table 4). In theory, the Swartz cutoffs will yield a higher MET score than the Freedson cutoffs, because some of the time classified as moderate activity by the Swartz cutoffs (574 ≤ count/min < 1952) may be classified as inactive by the Freedson cutoffs, so that the total time spent computed using the Swartz cutoffs will be higher than that using the Freedson cutoffs. Note that it is impossible to conclude that the Swartz cutoffs are more appropriate simply because they reduce the over-report of the IPAQ-SF, as the true level of physical activity is not known. As the Trost cutoffs depend on the age of the participants, no direct comparison with the other two cutoffs can be made. It is of interest that no published study has yet compared the IPAQ-SF with the more recent weighted-accelerometer cutoffs suggested by Metzger et al. [47].
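The effect of the cutoff choice on accelerometer-derived active time can be sketched as follows. The count/min thresholds are those quoted above for the Freedson and Swartz cutoffs; the function names, the one-minute epoch assumption, and the count values are hypothetical and serve only to show why the same recording yields more active minutes under the Swartz cutoffs.

```python
from typing import Dict, Sequence, Tuple

# (lower bound of moderate PA, upper bound of moderate PA), in counts/min,
# as quoted in the text. Vigorous PA is anything above the upper bound.
CUTOFFS: Dict[str, Tuple[int, int]] = {
    "Freedson": (1952, 5724),
    "Swartz": (574, 4945),
}


def classify_minute(count: int, cutoffs: Tuple[int, int]) -> str:
    """Label a one-minute epoch as inactive, moderate, or vigorous."""
    moderate_min, moderate_max = cutoffs
    if count > moderate_max:
        return "vigorous"
    if count >= moderate_min:
        return "moderate"
    return "inactive"  # below the moderate threshold


def minutes_by_intensity(counts: Sequence[int], cutoffs: Tuple[int, int]) -> Dict[str, int]:
    """Count how many one-minute epochs fall into each intensity class."""
    totals = {"inactive": 0, "moderate": 0, "vigorous": 0}
    for count in counts:
        totals[classify_minute(count, cutoffs)] += 1
    return totals


# Hypothetical counts for seven one-minute epochs, not data from any study.
counts_per_min = [120, 600, 1500, 2000, 3000, 5000, 6000]
for name, cutoffs in CUTOFFS.items():
    totals = minutes_by_intensity(counts_per_min, cutoffs)
    active = totals["moderate"] + totals["vigorous"]
    print(name, totals, "active minutes:", active)
```

In this toy recording, the epochs at 600 and 1500 counts/min are moderate under the Swartz cutoffs but inactive under the Freedson cutoffs, so the Swartz classification reports six active minutes against four for Freedson.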
Indices from the IPAQ-SF
Values obtained from the IPAQ-SF have also been used in different ways in the various studies. Of the sixteen studies that computed the total physical activity from the IPAQ-SF (Table 2), six [25, 29, 30, 32, 33, 37] used total time spent (Total PA min/wk), nine [17–20, 31, 34–36, 39] transformed the total time spent to MET scores (MET min/wk), and one [23] used a novel trichotomized variable indicating the adequacy of physical activity (3 categories). Again, no pattern across the correlations was evident based on the use of these different indices.
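The first two indices can be illustrated with a brief sketch. It assumes the MET weights given in the published IPAQ scoring guidance (3.3 for walking, 4.0 for moderate, and 8.0 for vigorous activity); this review does not state the weights, and individual studies may have scored the questionnaire differently. The function names and the example responses are hypothetical.

```python
from typing import Dict, Tuple

# Assumed MET weights (IPAQ scoring guidance); not stated in this review.
MET_WEIGHTS: Dict[str, float] = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

# A response maps each activity to (days per week, minutes per day).
Response = Dict[str, Tuple[int, float]]


def weekly_minutes(response: Response) -> float:
    """Total PA min/wk: days multiplied by minutes, summed over activities."""
    return sum(days * minutes for days, minutes in response.values())


def weekly_met_minutes(response: Response) -> float:
    """MET min/wk: weekly minutes of each activity weighted by its MET value."""
    return sum(
        MET_WEIGHTS[activity] * days * minutes
        for activity, (days, minutes) in response.items()
    )


# Hypothetical questionnaire response, for illustration only.
response: Response = {
    "walking": (5, 30.0),
    "moderate": (3, 20.0),
    "vigorous": (2, 30.0),
}
print("Total PA min/wk:", weekly_minutes(response))
print("MET min/wk:", round(weekly_met_minutes(response), 1))
```

The MET transformation gives vigorous minutes more than twice the weight of walking minutes, so two participants with the same total time can receive quite different MET scores.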
Other potential moderators
Two studies aimed to identify potential factors influencing the validity of the IPAQ-SF. One group studied the role of participants' confidence in accurately recalling physical activity on the IPAQ-SF [40], whilst the second group examined whether keeping physical activity logbooks improved the validity of the IPAQ-SF report [42]. The resultant correlations ranged from 0.15 to 0.30, and neither the confidence ratings nor the act of completing daily logbooks influenced the relationship between the IPAQ-SF and the objective measures. Although logbooks did not improve IPAQ-SF validity, one IPAQ-SF validation paper written in Chinese [48] showed that using a logbook to impute missing accelerometer data could yield an acceptable IPAQ-SF validity (Pearson correlation = 0.63, not shown in tables).