With changing social and economic patterns all over the world, sedentary lifestyles have become a worldwide phenomenon [1, 2]. Sedentary lifestyles are associated with increased obesity, type 2 diabetes, and cardiovascular disease, and hence the promotion of active lifestyles is an important public health priority. To monitor trends and evaluate public health or individual interventions aimed at increasing levels of physical activity, reliable and valid measures of habitual physical activity are essential. Several routine instruments are available to measure physical activity, including self-report questionnaires, indirect calorimetry, direct observation, heart rate telemetry, and movement sensors. All of these methods have well-known limitations, and for physical activity there is currently no perfect gold-standard criterion [7, 8]. Movement sensors such as accelerometers have recently grown in popularity as a measure of physical activity, not only because their measurements are objective, but also because they are relatively small and unobtrusive. Nevertheless, due to their high cost, accelerometers are not usually practical in large-scale cohort studies, and questionnaires are frequently used instead to obtain physical activity data [10, 11].
Numerous questionnaires are available for measuring physical activity. Recent reviews have documented 85 self-administered physical activity questionnaires for adults, 61 for youth, and 13 for the elderly. Many of these questionnaires have study-specific items and time referents, severely limiting the potential for comparisons across different studies. For example, the Synchronized Nutrition and Activity Program measures activity relevant only to primary school children, and contains items that are not common across broad sectors of the population. The International Physical Activity Questionnaire (IPAQ) was developed by a group of experts in 1998 to address these concerns and to facilitate surveillance of physical activity based on a global standard. The IPAQ has since become the most widely used physical activity questionnaire, with two versions available: the 31-item long form (IPAQ-LF) and the 9-item short form (IPAQ-SF). The short form records activity at four intensity levels: 1) vigorous-intensity activity such as aerobics, 2) moderate-intensity activity such as leisure cycling, 3) walking, and 4) sitting. The original authors recommended the "last 7 day recall" version of the IPAQ-SF for physical activity surveillance studies, in part because the burden on participants to report their activity is small.
A common analysis method used to demonstrate questionnaire validity is to correlate self-reported activity data from the IPAQ-SF with data from one or more objective measurement devices, with both sets of data obtained over exactly the same time period (concurrent validity). Another common method is to compute the absolute differences between the objective and self-reported measures. Both methods are essential in determining the validity of the IPAQ-SF, and a systematic review of the analyses that have been used to validate the IPAQ-SF would therefore be useful in assessing the merits of using the IPAQ-SF in epidemiological studies.
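The two analyses described above can be illustrated with a minimal sketch. The data below are hypothetical and are not taken from any validation study, and the function names are our own. The sketch assumes that both measures are expressed as minutes of activity per week, and that the "absolute" comparison is summarized as the mean of the paired differences (self-report minus objective).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_difference(self_report, objective):
    """Mean paired difference; positive values indicate over-reporting."""
    return mean(s - o for s, o in zip(self_report, objective))


# Hypothetical weekly minutes of activity for six participants.
ipaq_minutes = [300, 150, 420, 60, 240, 500]   # self-reported
accel_minutes = [180, 120, 200, 90, 150, 210]  # objective device

rho = spearman_rho(ipaq_minutes, accel_minutes)
bias = mean_difference(ipaq_minutes, accel_minutes)
print(f"Spearman rho = {rho:.2f}, mean difference = {bias} min/week")
```

In this constructed example the two measures rank the participants identically, so the correlation is perfect, yet the questionnaire over-reports by 120 minutes per week on average. This is why a correlation alone does not establish agreement in absolute terms.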
The first comprehensive validation of the IPAQ-SF was conducted across 12 countries, and reported correlations with the uniaxial CSA model-7164 accelerometer (all correlations reported were Spearman's ρ for the last 7 days' report). A wide range of Spearman correlations, from ρ = 0.02 (Sweden) to ρ = 0.47 (Finland), raised concerns about variability in validity across different populations. Variability in reported validity may be caused by several factors, such as the demographic and cultural backgrounds of the participants, the way the information requested is processed and delivered, and variations in the "criterion gold-standard" used for objective comparison. Criterion measures used for IPAQ-SF validation have included the actometer, accelerometer and pedometer, yet only one study has used the expensive doubly labeled water technique as a criterion, even though it has been recommended and is considered the most accurate objective measurement of physical activity [8, 22]. In addition to traditional measures of physical activity, various fitness measures (e.g. maximum oxygen uptake, VO2max) have also been used as a reference standard for the IPAQ-SF, because physical activity is strongly associated with cardiorespiratory fitness. Several of the objective measures yield different indices of activity, and the findings regarding validity may vary according to which index and objective measure is used as the standard. For example, both time spent in physical activity and raw count data have been used as measures of physical activity from accelerometers. Variations also occur in how the objectively measured data are transformed, for example in the algorithm used to convert raw accelerometer data into time spent in moderate to vigorous physical activity [26, 27].
There have also been inconsistencies in the reporting of "total physical activity" from IPAQ-SF data, with studies using units involving the metabolic equivalent of task (MET), time spent in activity, or simply a trichotomized variable indicating the adequacy of physical activity. The IPAQ-SF instrument may also be better at capturing activity at some intensity levels than at others, e.g., vigorous rather than moderate activity. Because the variability shown in the IPAQ-SF validity from these international studies has not been collated and systematically examined, we reviewed the effect of these sources on IPAQ-SF validity.
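To illustrate how the same IPAQ-SF responses lead to different "total physical activity" values depending on the unit chosen, the following sketch computes total minutes per week and MET-minutes per week for one hypothetical respondent. The MET weights (3.3 for walking, 4.0 for moderate and 8.0 for vigorous activity) are assumed from the published IPAQ scoring guidelines and are not stated in this paper. The trichotomized variable is omitted because its classification rules are not described here.

```python
# MET weights assumed from the IPAQ scoring guidelines (not given in this paper).
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}


def total_activity(responses):
    """Return (total minutes/week, total MET-minutes/week).

    `responses` maps each intensity to (days per week, minutes per day).
    """
    weekly_minutes = {
        intensity: days * minutes
        for intensity, (days, minutes) in responses.items()
    }
    total_minutes = sum(weekly_minutes.values())
    met_minutes = sum(
        MET_WEIGHTS[intensity] * minutes
        for intensity, minutes in weekly_minutes.items()
    )
    return total_minutes, met_minutes


# Hypothetical respondent: (days per week, minutes per day) for each intensity.
responses = {"walking": (5, 30), "moderate": (3, 40), "vigorous": (2, 20)}

total_minutes, met_minutes = total_activity(responses)
print(f"{total_minutes} min/week, {met_minutes:.0f} MET-min/week")
```

Because vigorous activity is weighted more heavily in MET-minutes than in raw minutes, two respondents with equal total time can differ considerably in MET-minutes, which is one reason why results reported in different units are not directly comparable.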
The IPAQ was first published with its validation based on a 12-country sample, and the authors recommended using the short form, which measures physical activity by self-report over the previous 7 days. Since that time, more validation studies have been published for this short form than for any other physical activity questionnaire. Despite the popularity of the IPAQ-SF and its widely accepted high reliability [13, 17], there has been no systematic review of its validity. Van Poppel et al. have published a review of physical activity questionnaires used in adults, but included only four studies of the IPAQ-SF. Hence, a more comprehensive review of the IPAQ-SF is needed using data from the English language literature, with a focus on the variability of its relationship with the various validation measures as well as its absolute accuracy.
This paper has two objectives: (1) to review the analyses used in the IPAQ-SF validation studies, and (2) to consider possible explanations for differences between studies. For the first objective, we reviewed the studies validating the IPAQ-SF as a relative measure (i.e. studies that show a correlation with objective measures of physical activity) and/or an absolute measure (i.e. studies that compare levels of physical activity obtained by the IPAQ-SF against levels from an objective measure) of physical activity level. For the second objective, we examined whether the demographics of the different samples, the indices derived from the objective standards or the IPAQ-SF, or additional moderators had contributed to the different levels of validity reported. Since the IPAQ-SF has been consistently shown to have high reliability (ranging from 0.66 to 0.88) [17, 20, 25], we do not study this property here. We examined studies that sought to validate (a) the overall physical activity score from the IPAQ-SF, as well as (b) restricted information from the scale, e.g., different levels of intensity (vigorous activity, moderate activity and walking).