Measuring physical activity-related environmental factors: reliability and predictive validity of the European environmental questionnaire ALPHA

Background: A questionnaire to assess physical activity-related environmental factors in the European population (a 49-item and an 11-item version) was created within the framework of the EU-funded project "Instruments for Assessing Levels of PHysical Activity and fitness (ALPHA)". This paper reports on the development and assessment of the questionnaire's test-retest stability, predictive validity, and applicability to European adults.
Methods: The first pilot test was conducted in Belgium, France and the UK. In total 190 adults completed both forms of the ALPHA questionnaire twice with a one-week interval. Physical activity was concurrently measured (i) by administration of the long version of the International Physical Activity Questionnaire (IPAQ) by interview and (ii) by accelerometry (Actigraph™ device). After adaptations, the second field test took place in Belgium, the UK and Austria; 166 adults completed the adapted questionnaire at two time points, with a minimum one-week interval. In both field studies intraclass correlation coefficients (ICC) and proportion of agreement were computed to assess the stability of the two test scores. Predictive validity was examined in the first field test by correlating the results of the questionnaires with physical activity data from accelerometry and the long IPAQ last 7 days.
Results: The reliability scores of the ALPHA questionnaire were moderate to good in the first field testing (ICC range 0.66-0.86) and good in the second field testing (ICC range 0.71-0.87). The proportion of agreement for the ALPHA short increased significantly from the first (range 50-83%) to the second field testing (range 85-95%). Environmental scales from both versions of the ALPHA questionnaire were significantly associated with self-reported minutes of transport-related walking and with objectively measured low-intensity physical activity levels, particularly in women. Both versions were easily administered, with an average completion time of six minutes for the 49-item version and less than two minutes for the short version.
Conclusion: The ALPHA questionnaire is an instrument to measure environmental perceptions in relation to physical activity. It appears to have good reliability and predictive validity. The questionnaire is now available to other researchers to investigate its usefulness and applicability across Europe.


Background
Until recently, physical activity promotion research has focused on individual factors (demographics and psychosocial determinants). There is now growing agreement among researchers that the physical or built environment may play an important role as well [1,2]. Research into the link between the built environment and physical activity is still in its infancy, but is expanding rapidly as demonstrated by the Active Living Research Reference list that comprised 465 references, published in 2008 in various journals [3,4].
However, evidence of a predictive relationship between environmental determinants and physical activity has so far not been very consistent. In their review of 47 papers [5], Wendel-Vos and colleagues found only a few consistent correlates among adults, e.g. between availability of physical activity equipment and vigorous physical activity, and between trail connectivity and active commuting. In youth, too, only some specific consistent associations were found between environmental factors and physical activity [6].
One of the challenges of this new research domain is how to measure attributes of the built environment associated with physical activity in a valid, reliable and feasible way. Studies of the physical environment and physical activity have typically used two types of exposure measures: (i) measures of perceptions of the environment using questionnaires; (ii) objective measures of the environment derived from observations of the environment (audits, ground truthing) or through spatial Geographic Information Systems (GIS) data [7].
Early measures of perceptions of the environment were criticised for their lack of metric data (e.g. repeatability, face validity) [8]. Perceived environmental measures have mainly been developed outside Europe: in Australia, in the Social Environmental Individual Determinants (SEID) study conducted by Giles-Corti and colleagues [9], and in three US research centres in North Carolina [10], South Carolina [11], and California [12]. Characteristics of the built environment in Europe differ considerably from those in the US or Australia, especially in terms of housing density and land use mix. This raises questions about the applicability of these questionnaires in a European context. As a consequence, a small number of European studies have developed their own questionnaires or have adapted international questionnaires to the European context. However, a consensus about which environmental questionnaire should be used in Europe has yet to be reached.
One objective of the EU-funded Instruments for Assessing Levels of Physical Activity and Fitness (ALPHA) project is to propose standardised instruments for physical activity and fitness monitoring across Europe [13]. On the basis of a literature review on currently used environmental questionnaires in Europe and a consensus meeting with an international expert group, a European environmental questionnaire was conceived [14]. Two versions were developed: a form containing 49 items suitable for use in research studies and a shorter 11-item form more suitable for surveillance and monitoring purposes. The development of the questionnaire is described in more detail elsewhere [14]. The next step in the project was to test the reliability and validity of the questionnaire in different languages and in different European countries. This paper reports on the assessment of the test-retest stability, predictive validity and feasibility of the ALPHA environmental questionnaire in three European countries.

Methods
The reliability and validity testing was undertaken in four phases: translation, cognitive testing, and two iterations of field testing. First, the original version of the ALPHA questionnaire was translated into Dutch, French, and German, followed by cognitive testing. Next, a first field test was conducted in three countries. An expert meeting was organised to discuss the results before a second, smaller field test was conducted to assess the modified questionnaire.

Translation and cognitive testing
The English questionnaire (the source) was translated into Dutch, French and German using a standard protocol based on the guidelines of Eurostat [15]. To guide the translation process, conceptual cards were included after each question in the English version. These conceptual cards contained brief notes to explain the format of the questions and the underlying concept to be measured. Two translators, both of whom were native speakers and familiar with the topic, worked independently. They read and translated these conceptual cards into the target language before translation of the questions. After translation the two translators, together with a reviewer, discussed any particular translation problems until a final consensus was reached.
After the translation process, cognitive testing was conducted using cognitive interviewing [16] with at least five persons for each language. Respondents were asked to think aloud while processing each question and deciding how to answer it. If something was not clear, the interviewer would ask questions to start a discussion.
Through the cognitive testing process, questions that were not clear or comprehensible were identified, discussed with the research team and rephrased.

Field testing I

Participants and procedures
Participants were recruited in three countries (Belgium, the UK and France) between October 2008 and January 2009. To ensure some variance in the measured characteristics (e.g. population density), the participants within each country were drawn from distinct areas (and thus different built environments). In Belgium a random sample was drawn in three different neighbourhoods (town, outskirts of town and village/countryside). In each neighbourhood, letters with information about the study were distributed by post. One week after mailing the information letter, potential participants were visited at home and asked if they would participate. In the UK, participants randomly selected from 10 areas of an English city for a previous study [17] were contacted by telephone, and appointments were arranged to visit willing individuals. In France a convenience sample of adults living in the city centre and suburbs of Paris was recruited. Inclusion criteria were: aged 20-65 years, literate in the language of the questionnaire (Dutch, English or French respectively), resident at the current address for at least two months, and without a physical disability that would prevent or hamper walking or cycling. The final sample consisted of 190 participants: 60 from Belgium, 64 from the UK and 66 from France.
To assess test-retest stability, participants completed, in the presence of a researcher, both forms of the ALPHA questionnaire twice, with an interval of one to two weeks. This is a standard time frame in test-retest studies as it is long enough so that respondents are unlikely to remember their answers to the first testing, but short enough to minimise potential changes in physical activity behaviour. To avoid order effects, participants in each study centre were randomly assigned into two groups: Group 1 completed the short version of the questionnaire first (at first and second assessment), followed by the 49-item version, and Group 2 completed the 49-item version first (at first and second assessment), followed by the short version.
To assess predictive validity, physical activity behaviour was measured by accelerometry and by the long International Physical Activity Questionnaire (IPAQ) last 7 days. Participants were asked to wear accelerometers on the hip during all waking hours for 7 consecutive days following the first visit. Accelerometer recordings were collected at the second visit, at which time the researcher administered the long IPAQ last 7 days by interview. The interview version was preferred to the self-administered version of the IPAQ because of the previously reported tendency towards over-reporting of physical activity [18]. The length of time needed to complete each questionnaire at the first visit was recorded. No incentive was provided for participation.

Measures
The development of the initial ALPHA environmental questionnaire has been described elsewhere [14]. The instrument included questions on: types of residences in your neighbourhood (3 items), distance to local facilities (8 items), walking or cycle infrastructure in your neighbourhood (4 items), maintenance of infrastructure in your neighbourhood (3 items), neighbourhood safety (6 items), how pleasant is your neighbourhood (4 items), cycling and walking network (4 items), home environment (6 items), workplace or study environment (11 items). For the short form of the questionnaire the number of items was reduced to eleven, with a minimum of one item included from each theme. In both versions neighbourhood was defined as "...the area ALL around your home that you could walk to in 10-15 minutes - approx 1.5 km" (or "1 mile" for the UK context).
Self-reported physical activity level was assessed by the long IPAQ last 7 days (http://www.ipaq.ki.se/ipaq.htm). This instrument asks about physical activity behaviour over the last 7 days, according to categories of physical activity intensity, in different contexts such as physical activity as transport, physical activity at work or study, physical activity at home and physical activity in leisure time; it has been shown to be reliable and valid [19].
The MTI Actigraph accelerometer model 7164 was used in Belgium and France, and the Actigraph GT1M was used in the UK. In all cases an epoch time of one minute was used to provide an objective measure of habitual physical activity (over 7 days).
Finally, participants were asked to provide information on their age, height, weight, sex, ethnicity, living situation, educational attainment, occupational status and living environment.

Data reduction
Negatively worded (adverse) items of the environmental questionnaire were recoded, and sum scores for each scale were calculated.
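The recoding and summing step can be illustrated with a minimal sketch. It assumes that "recoding" means reversing the response scale of adverse items so that a higher score always indicates a more supportive environment. The item names, the 1-4 response range and the function names are hypothetical and are not taken from the ALPHA questionnaire itself.

```python
def reverse_code(score, scale_min=1, scale_max=4):
    """Mirror a response on its scale (assumed range: scale_min..scale_max)."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}..{scale_max}")
    return scale_max + scale_min - score


def scale_sum(responses, reverse_items=frozenset(), scale_min=1, scale_max=4):
    """Sum one respondent's item scores for a scale.

    responses: dict mapping item name -> raw score.
    reverse_items: names of adverse items that are reverse-coded first.
    """
    total = 0
    for item, score in responses.items():
        if item in reverse_items:
            score = reverse_code(score, scale_min, scale_max)
        total += score
    return total


# Hypothetical safety scale with two adverse items.
safety = {"fear_of_crime": 4, "well_lit_streets": 3, "dangerous_traffic": 1}
adverse = {"fear_of_crime", "dangerous_traffic"}
print(scale_sum(safety, reverse_items=adverse))  # 1 + 3 + 4 = 8
```

Keeping the scale bounds as parameters lets the same helper serve items with different numbers of response categories.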
For the long IPAQ last 7 day, each activity was expressed in minutes/week by multiplying frequency (day/week) and duration (minutes/day) of the activity. Indices of each domain were calculated by summing all physical activities undertaken for each specific context (work, domestic, transport and leisure). A 'total moderate-intensity and vigorous-intensity physical activity' index was computed by summing all reported physical activities undertaken at moderate and vigorous intensity across the four domains.
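As an illustration of the IPAQ data reduction described above, the following sketch computes minutes/week per activity, the four domain indices, and the combined moderate and vigorous index. The record layout (dictionaries with `domain`, `intensity`, `days` and `minutes` keys) and the intensity label "walking" are assumptions made for this example; they do not reproduce the official IPAQ scoring protocol.

```python
DOMAINS = ("work", "domestic", "transport", "leisure")


def minutes_per_week(days_per_week, minutes_per_day):
    """Frequency (days/week) multiplied by duration (minutes/day)."""
    return days_per_week * minutes_per_day


def domain_indices(activities):
    """Sum minutes/week of all reported activities within each domain."""
    totals = {domain: 0 for domain in DOMAINS}
    for act in activities:
        totals[act["domain"]] += minutes_per_week(act["days"], act["minutes"])
    return totals


def total_mvpa(activities):
    """Sum minutes/week of moderate and vigorous activities across domains."""
    return sum(
        minutes_per_week(act["days"], act["minutes"])
        for act in activities
        if act["intensity"] in ("moderate", "vigorous")
    )


# Hypothetical answers of one respondent.
reported = [
    {"domain": "transport", "intensity": "walking", "days": 5, "minutes": 20},
    {"domain": "leisure", "intensity": "moderate", "days": 2, "minutes": 30},
    {"domain": "leisure", "intensity": "vigorous", "days": 1, "minutes": 45},
    {"domain": "domestic", "intensity": "moderate", "days": 3, "minutes": 40},
]
print(domain_indices(reported))
print(total_mvpa(reported))  # 60 + 45 + 120 = 225
```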
Accelerometer data were downloaded by placing the accelerometer into a reader interface unit (RIU) and using specific software (RIU256.exe) [20]. The data were then analysed with a custom-written program (MAHUFFE.exe, available from http://www.mrc-epid.cam.ac.uk). Accelerometer data were included in the analysis if the device had been worn on at least 4 days (including at least one weekend day), with a minimum of 10 hours of recording time on weekdays and 8 hours on weekend days; the relevant hours were excluded if wearing was interrupted for more than 60 minutes during the day. To calculate physical activity at low (LPA), moderate (MPA) and vigorous (VPA) intensity, Freedson's cut-offs [21] were used (<1952 counts per minute for LPA, 1952-5724 counts per minute for MPA and >5724 counts per minute for VPA).
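The inclusion rule and the intensity cut-offs can be expressed compactly. The sketch below is not the MAHUFFE program; it is a simplified illustration with hypothetical function names, it takes wear time per day as already computed, and it omits the rule for interruptions longer than 60 minutes.

```python
MPA_LOWER = 1952  # counts per minute; values below this are LPA
MPA_UPPER = 5724  # counts per minute; values above this are VPA


def classify_minute(counts):
    """Assign one 1-minute epoch to LPA, MPA or VPA using the stated cut-offs."""
    if counts < MPA_LOWER:
        return "LPA"
    if counts <= MPA_UPPER:
        return "MPA"
    return "VPA"


def minutes_by_intensity(counts_per_minute):
    """Count the epochs (minutes) falling in each intensity category."""
    totals = {"LPA": 0, "MPA": 0, "VPA": 0}
    for counts in counts_per_minute:
        totals[classify_minute(counts)] += 1
    return totals


def is_valid_day(wear_hours, is_weekend):
    """At least 10 h of wear on weekdays, at least 8 h on weekend days."""
    return wear_hours >= (8 if is_weekend else 10)


def include_participant(days):
    """days: list of (wear_hours, is_weekend) tuples, one per recorded day.

    Requires at least 4 valid days, of which at least one is a weekend day.
    """
    valid = [(hours, weekend) for hours, weekend in days if is_valid_day(hours, weekend)]
    return len(valid) >= 4 and any(weekend for _, weekend in valid)


week = [(11, False), (10, False), (12.5, False), (9, True), (6, True)]
print(include_participant(week))                     # True
print(minutes_by_intensity([100, 1952, 5724, 6000])) # LPA 1, MPA 2, VPA 1
```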

Statistical analysis
Cronbach's alpha values were calculated to assess the internal consistency of each scale of the environmental questionnaire; values >0.70 were considered good [22]. Intraclass correlation coefficients (ICC; sum scores or items on 5-point scales) were used to quantify the stability of the scores across the two tests. ICC estimates >0.75 were considered good reliability scores, those between 0.50 and 0.75 moderate, and those <0.50 poor [23]. Proportion of agreement was also calculated to measure the proportion of occasions on which individuals gave the same score. A proportion of agreement above 0.70 was considered high [24].
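The two stability measures can be sketched as follows. The text does not state which ICC form was computed in SPSS, so this example assumes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1); other forms would give different values. Function names are hypothetical, and the classification simply encodes the thresholds given above.

```python
def icc_2_1(test, retest):
    """ICC(2,1) for two measurement occasions (assumed form, see lead-in)."""
    n, k = len(test), 2
    if n != len(retest) or n < 2:
        raise ValueError("need two equally long lists with at least 2 subjects")
    rows = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    denominator = msr + (k - 1) * mse + k * (msc - mse) / n
    if denominator == 0:
        raise ValueError("ICC undefined: no variance in the scores")
    return (msr - mse) / denominator


def proportion_agreement(test, retest):
    """Share of respondents giving exactly the same answer on both occasions."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two equally long, non-empty lists")
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def classify_icc(icc):
    """Apply the thresholds from the text: >0.75 good, 0.50-0.75 moderate, <0.50 poor."""
    if icc > 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]  # hypothetical retest, shifted by one point
print(round(icc_2_1(first, second), 3))     # 0.833
print(proportion_agreement(first, second))  # 0.0
```

The shifted retest shows why both measures are reported together: a systematic one-point shift still yields a fairly high absolute-agreement ICC, while the proportion of identical answers is zero.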
Pearson correlations between environmental variables (sum scores) and accelerometer data, and between environmental variables and IPAQ measurements, were calculated to assess predictive validity.
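For readers who wish to reproduce this type of analysis, a Pearson correlation between a scale sum score and a physical activity outcome can be computed as in the sketch below. The data are invented for illustration only, and the function name is hypothetical; the significance test used in the paper is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented example: scale sum scores and weekly minutes of transport walking.
scale_scores = [12, 15, 9, 18, 14, 11]
walking_minutes = [60, 120, 30, 150, 90, 100]
print(round(pearson_r(scale_scores, walking_minutes), 2))
```

Running the function separately on the male and female subsamples would mirror the sex-specific reporting used in the results.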
All analyses were performed using SPSS 15.0 software (SPSS Inc., Chicago, IL, USA).

International expert meeting
After the first field testing, an international expert meeting was organised in February 2009 to discuss the results (a list of all experts can be found in additional file 1). Items with lower scores on reliability or validity were discussed and rephrased until consensus was reached.

Field testing II

Participants and procedures
For the second, smaller field testing a new sample was recruited in three countries (Belgium, the UK and Austria) between April and May 2009 using the same inclusion criteria as in the first field testing. In Belgium a random sample in three different neighbourhoods (town, outskirts of town, and village/countryside, all different from those in the first field testing) was recruited using the same approach as in the first field testing. In the UK and Austria, convenience samples comprising university colleagues, students and other associates participated. The final sample consisted of 166 participants: 60 from Belgium, 57 from the UK and 49 from Austria.
In this second round of testing only test-retest stability was assessed for both versions, in a similar way to the first field testing.

Measures
An adapted version of the ALPHA environmental questionnaire was used. This instrument can be found in additional file 2 (49-item version) and additional file 3 (short version) and on the International Physical activity and Environmental Network (IPEN) website (http://www.ipenproject.org). The same themes as in the original version [14] were used, but some items were changed. For example, the answer categories of the short version changed from a four-point scale (strongly disagree to strongly agree) to a two-point scale (yes-no). The neighbourhood definition was also rephrased, reducing the area around the home to "approximately one kilometer or half a mile" instead of 1.5 kilometres or 1 mile. All changes are detailed in additional file 4. No other measures were included in the second field testing.

Data reduction and statistical analysis
Adverse items of the environmental questionnaire were recoded and sum scores for each scale were calculated. Cronbach alphas were calculated to assess the internal consistency of each scale of the environmental questionnaire. Intraclass correlation coefficients (sum scores or items on 5-point scales) and proportion of agreement (separate items) were used to compute the coefficient of stability of the scores on the two tests.
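The internal consistency step can be illustrated with the standard formula for Cronbach's alpha, alpha = k/(k-1) x (1 - sum of item variances / variance of the sum score), where k is the number of items in the scale. The sketch below is an assumption about how such a value could be computed outside SPSS; the data layout (one list of scores per item) and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list with one entry per item; each entry is the list of scores
    given by the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("a scale needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("every item needs the same respondents (at least 2)")
    sum_scores = [sum(answers) for answers in zip(*items)]
    total_variance = variance(sum_scores)
    if total_variance == 0:
        raise ValueError("alpha undefined: sum scores do not vary")
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented answers of five respondents to a three-item scale.
scale = [
    [1, 2, 3, 4, 4],
    [2, 2, 3, 3, 4],
    [1, 3, 3, 4, 3],
]
print(round(cronbach_alpha(scale), 2))
```

Values obtained this way can be compared against the >0.70 criterion stated in the statistical analysis of the first field test.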

Results

Field testing I
Most participants in the first field testing were female (63%); most participants lived in an urban area (86.3%) and were employed (78.9%). Average age was 40 years and average BMI 25 kg/m2 (see Table 1).

Internal consistency
Cronbach alphas ranged from 0.57-0.76 (data not shown) except for the walking and cycling infrastructure scale (alpha = 0.37).

Feasibility
Mean (±SD) time for questionnaire completion during the first assessment was 6 minutes 47 seconds (±2 minutes) for the 49-item version and 1 minute 46 seconds (±39 seconds) for the short version.

Test-retest reliability
Table 2 shows answer frequencies and the mean score of each item on the first assessment of the ALPHA environmental questionnaire, together with its test-retest reliability scores. The ICCs of the sum scores of each of the nine subscales ranged from 0.66 to 0.86. Six of the nine sum scores were above 0.75, which indicates good reliability; three of them (residential density, infrastructure and maintenance) were between 0.60 and 0.75, which shows moderate reliability. ICCs of the individual items ranged from 0.44-0.82, with the lowest scores for particular safety items and items of the cycling and walking network scale. Proportion of agreement for all individual items ranged from 52-99%.
Table 3 summarises the answer frequencies and mean scores for each item on the first assessment of the ALPHA short, together with test-retest reliability scores (ICC and proportion of agreement). The ICC of the total sum score was 0.75, which indicates good test-retest stability. The ICCs for individual items ranged from 0.50-0.80 and thus showed only moderate reliability. Proportion of agreement was also low, ranging from 50-83%, with only two items equal to or above 70%.

Predictive validity
Tables 4 and 5 show the significant correlations of the subscales of the ALPHA questionnaires (both forms) with the physical activity measurements (both IPAQ and accelerometers). All significant correlations were in the hypothesised directions (higher environmental support of physical activity was correlated with higher levels of physical activity) except for the negative correlations found between the scales 'availability of sidewalks' and 'safety from traffic' and some IPAQ variables. The size of all correlations ranged from 0.19-0.38, which is an indication of moderate validity. Environmental scales of ALPHA were mostly significantly correlated with minutes of transport-related walking as measured with the IPAQ, both in men and women. Very few significant correlations were found with accelerometers in men; however, several significant correlations were found in women, especially with physical activity at low intensity.
The sum score calculated from the ALPHA short was significantly correlated with both IPAQ and accelerometers in men and women. All significant correlations were in the expected directions and ranged from 0.21-0.34.

International expert meeting
Based on the results of the first field testing, wording and answer categories of specific items with lower reliability scores were modified following discussions at the expert meeting (see Additional file 4).

Field testing II
In the second field testing almost half (47%) of the participants were female. Most of the participants lived in an urban area (77.1%), and were employed (59%), with an average age of 33 years and an average BMI of 24 kg/m2 (see Table 1).

Internal consistency
Cronbach alphas ranged from 0.65-0.82 except for the pleasant environment scale (alpha = 0.34) (data not shown).

Test-retest reliability
Answer frequencies and mean scores for each item on the first assessment of the ALPHA questionnaire and their test-retest reliability scores are shown in Table 6. ICCs of the sum scores of each subscale ranged from 0.71 to 0.87, with six of the nine ICCs above 0.75, showing good test-retest reliability. ICCs of the individual items ranged from 0.54-0.87, showing moderate to good stability. Proportions of agreement for all individual items ranged from 59-99%. Table 7 gives the answer frequencies of each item on the first assessment of the ALPHA short, together with their test-retest reliability scores (ICC for the sum scores and proportions of agreement for the individual items). The ICC of the total sum score of the ALPHA short was 0.73, which indicates good test-retest stability. For the individual items the proportions of agreement were good, ranging from 85 to 95%.

Discussion
The purpose of this study was to assess test-retest reliability, predictive validity and feasibility of the ALPHA environmental questionnaire in samples of men and women from several European countries.

Reliability
All but two of the subscales (distance to local facilities scale and the safety scale) in the ALPHA questionnaire showed low levels of internal consistency. This is appropriate for environmental variables as the aim of an environmental questionnaire is to sample possible indicators of one environmental construct which are often not intercorrelated, so Cronbach alphas are often low. In the literature similar internal consistency values for environmental scales are found e.g. the Cronbach alphas of the Cycling for Transport questionnaire range from 0.46 to 0.70 [25].
In the first field testing, moderate to good test-retest reliability was evident for the ALPHA questionnaire (ICCs of the subscales ranged from 0.66-0.86), while in the second field testing all subscales showed good reliability (ICCs ranged from 0.71-0.87). The ICCs were not significantly different between the two test phases (t = -1.207, p = 0.247), but there was a significant increase (t = -2.779, p = 0.008) in percentages of agreement from the first (range 52-99%) to the second field testing (range 59-99%). Similar test-retest values have been reported for other environmental questionnaires, e.g. ICCs for the test-retest reliability of the NEWS subscales ranged from 0.58 to 0.80 in one study [12] and from 0.41 to 0.93 in another study [26]; for the IPAQ environmental module ICCs ranged from 0.36 to 0.98 [27]; and Evenson et al. reported ICC values from 0.64 to 0.91 for environmental items in their physical activity questionnaire [28].
Overall, our findings suggest that the final version of the ALPHA questionnaire has good reliability, comparable to that found in equivalent instruments.
The reliability results of the ALPHA short were more difficult to compare between both field tests, given the changes to answer categories (i.e. ICCs for the individual items could not be analysed). However, for the total score, reliability was good and of similar magnitude in both testing phases (0.75 and 0.73 respectively). For the individual items the proportion of agreement found in the first field testing (50% to 83%) increased significantly (t = -9.175, p < 0.001) in the second field testing (85% to 95%); showing greater item stability in the later version. Reliability values of the ALPHA short compared favourably with other instruments, e.g. proportion of agreement for environmental items of South Carolina (47% to 94%), for the NEWS (33% to 98%) and for the St Louis Instrument (40% to 96%) [26].

Predictive validity
In general, moderate predictive validity was found for both versions of the ALPHA environmental questionnaire. As expected, most associations were found between the environmental scales and "walking for transport" measured with the IPAQ. This is consistent with the transport literature in which urban planners show that certain environmental factors, like those included in the ALPHA questionnaire, are associated with increased levels of walking [29,30]. Also Saelens et al. [12] and De Bourdeaudhuij et al. [31] have found good associations between the attributes of built environment and walking for transport. It should be mentioned, however, that because of the cross-sectional nature of the current and previous studies [8,32], no causal conclusions could be drawn. Therefore, the explanation that the built environment has a positive influence on physical activity levels could also be reversed i.e. people with higher physical activity levels perceive more physical activity opportunities in their built environment than people who are less physically active.
With the accelerometers, context-related physical activity could not be assessed, but almost all associations between the perceived environment and low-intensity physical activity were in line with the results of the IPAQ. Contrasting results between IPAQ and accelerometers were found for "safety from traffic" and "aesthetics". Somewhat unexpectedly, these environmental factors were related to lower levels of walking for transport measured with the IPAQ in women. However, we did find associations in the hypothesised direction with the Actigraph data: "safety from traffic" was related to more minutes of vigorous physical activity, and "aesthetics" to more minutes of physical activity at low intensity. In the literature, aesthetics and physical activity behaviour are consistently positively related, but the association between safety and physical activity behaviour is less consistent [8].

Feasibility
Both versions of the questionnaire appeared feasible in terms of completion time. The 49-item version was completed in a relatively short period of time compared with other environmental questionnaires, requiring only an average of 6 minutes. Given the low participant burden, we recommend using this version, as it gives a better overall picture of the built environment than the ALPHA short.

Strengths and Limitations
The ALPHA questionnaire has undergone extensive conceptual and field testing and refinement. It is now ready for further assessment within different populations and environments across Europe. One of the limitations in this study was the high education level of our participants in the second series of field testing compared with our first field test participants, which might explain some of the improvements seen in the test-retest scores.
A second limitation of this study is the use of different sampling methods (probability and non-probability) in the three countries and the possible clustering within each country, which may have resulted in more positive results.
Another possible limitation is that objective environmental measurements were not included in the testing and thus the perceptions could not be compared with objective data. However, it has to be taken into account that objective and subjective measures of the built environment are two different concepts. Previous studies [33-35] found only low to moderate agreement between objective and subjective measures. In some studies perceptions of the environment had a greater impact on PA behaviour (or vice versa) than the objectively measured environment [33,36], while another study found a greater influence of objective measures [37]. More research is needed to explore further the relationships and differences between perceived and objectively measured attributes of the environment.
Our questionnaire was based on extensive synthesis and adaptation of previous similar instruments [14]; however, this may repeat any systematic errors contained within these instruments [14,38]. We feel that evaluating the congruence between the definitions of neighbourhood used in environmental questionnaires and adults' own definitions remains a challenge in built environment and physical activity research.

Conclusion
The ALPHA questionnaire is a good instrument for measuring environmental perceptions related to physical activity behaviour, with moderate to good reliability, predictive validity and feasibility. The instrument was developed in collaboration with an international expert group and was subject to different test phases. However, we acknowledge the considerable challenges of this field and, in light of the limitations outlined, believe that further testing is required to improve generalisability to other European countries. Future testing will look to correlate the perceived environmental outcomes with other physical activity-related measures such as fitness and heart rate, and with objective environmental measures derived from geographic information systems (GIS). By means of this paper we would also like to make the ALPHA questionnaire available to other researchers, who could further investigate whether our questionnaire represents an appropriate instrument for assessing perceptions of the environment related to physical activity across Europe. The questionnaire (in different languages) and a manual can be found on the IPEN website (http://www.ipenproject.org).