A systematic review of the effect of infrastructural interventions to promote cycling: strengthening causal inference from observational data

Background Previous reviews have suggested that infrastructural interventions can be effective in promoting cycling. Given inherent methodological complexities in the evaluation of such changes, it is important to understand whether study results obtained depend on the study design and methods used, and to describe the implications of the methods used for causality. The aims of this systematic review were to summarize the effects obtained in studies that used a wide range of study designs to assess the effects of infrastructural interventions on cycling and physical activity, and whether the effects varied by study design, data collection methods, or statistical approaches. Methods Six databases were searched for studies that evaluated infrastructural interventions to promote cycling in adult populations, such as the opening of cycling lanes, or the expansion of a city-wide cycling network. Controlled and uncontrolled studies that presented data before and after the intervention were included. No language or date restrictions were applied. Data was extracted for any outcome presented (e.g. bikes counted on the new infrastructure, making a bike trip, cycling frequency, cycling duration), and for any purpose of cycling (e.g. total cycling, recreational cycling, cycling for commuting). Data for physical activity outcomes and equity effects was extracted, and quality assessment was conducted following previous methodologies and the UK Medical Research Council guidance on natural experiments. The PROGRESS-Plus framework was used to describe the impact on subgroups of the population. Studies were categorized by outcome, i.e. changes in cycling behavior, or usage of the cycling infrastructure. The relative change was calculated to derive a common outcome across various metrics and cycling purposes. The median relative change was presented to evaluate whether effects differed by methodological aspects. 
Results The review included 31 studies and all were conducted within urban areas in high-income countries. Most of the evaluations found changes in favor of the intervention, showing that the number of cyclists using the facilities increased (median relative change compared to baseline: 62%; range: 4 to 438%), and to a lesser extent that cycling behavior increased (median relative change compared to baseline: 22%; range: −21 to 262%). Studies that tested for statistical significance and studies that used subjective measurement methods (such as surveys and direct observations of cyclists) found larger changes than those that did not perform statistical tests, and those that used objective measurement methods (such as GPS and accelerometers, and automatic counting stations). Seven studies provided information on changes in physical activity behaviors, and findings were mixed. Three studies tested for equity effects following the opening of cycling infrastructure. Conclusions Study findings of natural experiments evaluating infrastructural interventions to promote cycling depended on the methods used and the approach to analysis. Studies measuring cycling behavior were more likely to assess actual behavioral change that is most relevant for population health, as compared to studies that measured the use of cycling infrastructure. Triangulation of methods is warranted to overcome potential issues that one may encounter when evaluating environmental changes within the built environment. Trial registration The protocol of this study was registered at PROSPERO (CRD42018091079).


Background
Promoting physical activity is one of the key strategies to combat the burden of many chronic diseases [1]. Cycling can contribute to meeting the recommended daily physical activity levels [2,3]. A meta-analysis including 187,000 individuals and 2.1 million person-years showed that 2.5 h per week of cycling at moderate intensity was associated with a 10% lower mortality risk, independent of overall levels of physical activity [4]. In addition, a Danish study found that those who cycled, and those who started cycling after the age of 50 years, had a lower risk of coronary heart disease and of developing diabetes than those who did not cycle [5,6]. Modelling studies have also shown that the population health benefits of cycling outweigh the risks, such as exposure to air pollution and traffic accidents [7,8]. This indicates that promoting cycling can result in population-level health benefits.
Providing an infrastructure that supports the needs of cyclists is considered an important strategy to encourage more cycling in cities [9-11]. However, designing studies to evaluate such infrastructural interventions is challenging. Although randomized controlled trials (RCTs) are regarded as the gold standard for estimating causal effects of health interventions, to our knowledge no studies exist that used the RCT design to assess the impact of infrastructural interventions on cycling. This is not surprising, as changes in the built environment are often beyond the control of the researcher and therefore difficult to randomize. Other analytical techniques are required to evaluate these so-called "natural experiments", in which variation in accessibility to new cycling infrastructure is used to assign intervention and control groups [12-14].
Two recent systematic reviews have been completed which examine the impact of infrastructure on levels of cycling [15,16]. Both reported that cycling increased following the introduction of new infrastructure, or upgrading of existing infrastructure. However, both reviews also noted that the methods in the included studies may have affected the study findings. Stappers and colleagues [15] noted variable quality in study designs across studies examining impacts on physical activity, active transport and sedentary behavior. They suggest that more refined designs may decrease the possibility of detecting intervention effects. Panter and colleagues [16] focused only on studies assessing walking and cycling, and examined the evidence for the effectiveness and mechanism of interventions. They found that higher quality studies were more likely to report intervention effects for cycling. Taken together, differences in methods may have impacted the overall conclusion (no changes vs positive changes), or the magnitude of the finding (small changes vs large changes). Ignoring methodological differences may wrongly lead to the conclusion that some interventions were more effective than others.
The current review builds on the main finding of previous reviews that interventions in the built environment may affect cycling [15,16]. We focused on the methodological approaches undertaken to evaluate the effects of infrastructural interventions. Neither review quantitatively summarized the findings, leaving unanswered the question of whether the magnitude of the findings changed when different methodologies were used. One review was unable to capture relevant literature published outside of health-related journals [15]. The research questions are likely to differ between health researchers and transportation researchers, potentially leading to differences in study designs and findings.
Focusing on whether different methodological approaches produce different results, and assessing the strengths and limitations of different methods for causality, will provide greater understanding about the implications of findings from research and their utility for policy makers and practitioners. Therefore, the aims of this systematic review were to summarize the effects of infrastructural interventions on cycling and physical activity in the population, and to evaluate whether the effects varied by study design, data collection methods, or statistical approaches.

Methods
The protocol of this study was registered in March 2018 at PROSPERO (CRD42018091079). Our systematic literature search followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [17].

Search strategy
Six electronic databases (Embase.com, Medline Ovid, Web of Science, PsycINFO Ovid, CINAHL EBSCOhost, Google Scholar) were searched for literature published until February 2018 for any studies assessing infrastructural projects to promote cycling. We updated the initial search until June 2019 to additionally include the most recent publications. Search terms for the different databases can be found in Additional file 1. Search terms were constructed of 3 parts, including synonyms for cycling infrastructure to identify exposures, synonyms for cycling behavior, active transport, physical activity and lifestyle changes to identify outcomes, and a term that excluded conference abstracts, letters to the editor, notes and editorials. No restrictions were made on language. Database searches were supplemented with searches of reference lists of included studies and key review papers.

Study selection and inclusion criteria
All titles and abstracts identified during the initial search were screened for inclusion by two independent researchers (FJMM, NB). Additional articles identified through the updated search were screened by a single author (FJMM). After screening titles and abstracts, full-text articles were screened according to predefined criteria. Articles obtained in full-text were reassessed for inclusion by the first two authors (FJMM, JP), and discrepancies were resolved after discussion with a third researcher (FJvL). Eligibility criteria included: 1) a study evaluating an infrastructural intervention to promote cycling, 2) any measure of cycling as outcome, 3) cycling measured before and after the intervention, and 4) reporting on a general adult population aged 16 years and above. Examples of interventions include the opening of cycling lanes, the installation of a city-wide cycling network, and the improvement of existing cycling infrastructure. We included papers that evaluated the same intervention, but reported on different outcomes or used different datasets or methods to collect outcome data. Controlled and uncontrolled studies were included to allow for a large variety of study designs. Studies were classified as controlled studies if data was collected in a different population that was selected based on comparable individual or neighborhood characteristics, and if similar data collection methods were used. We also classified studies as controlled studies if a comparison was made within the study population between people who lived closer to an intervention and those who lived further away. Studies that presented city- or area-wide cycling trends as a comparison were considered uncontrolled, as the data collection methods used in routine monitoring surveys often differed from those used in the intervention group, and population characteristics often differed between areas.
Studies that evaluated the introduction of cycling infrastructure together with other environmental components were included (e.g. bike parking, showers, rental bikes), as long as the main goal of the intervention was to promote cycling. Environmental interventions that did not change the cycling infrastructure were excluded. We specifically aimed to study population-based approaches to change health behaviors, and therefore excluded infrastructural interventions that were part of a combined intervention with behavioral components targeting the behavior of individuals (e.g. cycling courses, safety lessons, or other approaches that target individual behaviors). Studies that included media campaigns alongside the intervention were included, as long as they aimed to target the population as a whole.
We excluded opinion articles, qualitative evaluations without quantitative assessment, studies retrospectively collecting data on cycling, and studies not directly linked to an infrastructural intervention. We also excluded studies in which the presented outcome measure was not specified for cycling, like active travel which combined walking and cycling together, or modal shifts where the shift in mode was not specified.

Data extraction
From the included studies, one researcher extracted data (FJMM) using a standardized data extraction form, and a second reviewer (JP) verified a 20% sample of the extracted data. The extracted data included publication details, description of the intervention, study design, data collection methods, analytical methodology, and study results.
Ideally, we would have extracted a single outcome related to cycling per study. However, most studies did not specify a primary outcome of cycling. Therefore, we extracted all cycling outcomes presented from the maximally adjusted model with the longest exposure time. We extracted all outcomes for various purposes of cycling (e.g. total cycling, recreational cycling, cycling for commuting), and all outcomes for various metrics of cycling (e.g. bike count data, cycling frequency, cycling duration). If the outcome was assessed in multiple populations or at multiple locations, we extracted the average change in cycling that was presented by the authors. If no summary measure was presented, we calculated an unweighted average effect. Some studies stratified the population by exposure status, and evaluated a possible exposure-outcome relationship by distance from home to the intervention or usage of the intervention. All available information was extracted for these studies and included in the descriptive part of the review. However, including all strata-specific outcomes in the quantitative analyses would mean that studies with multiple strata would have a much greater contribution to the findings than studies without stratification. Therefore, we only used the results from the group most likely to use the intervention in the quantitative summary (e.g. smallest distance or largest potential usage). We noted that various metrics were used for expressing data relevant to cycling. We distinguished outcomes that evaluated cycling behavior (e.g. making a bike trip, cycling frequency, cycling duration) from those that evaluated usage of cycling infrastructure (e.g. bikes counted in the city, bikes counted on the new infrastructure). 
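The two selection rules described above (an unweighted average when no summary measure is reported, and use of only the stratum most likely to use the intervention) can be illustrated with a minimal Python sketch. This is not the procedure as implemented by the review authors; the function names, field names and numbers are hypothetical and serve only to make the rules concrete.

```python
from statistics import mean


def summarize_sites(site_changes, author_summary=None):
    """Return the author-reported summary if one exists; otherwise
    fall back to an unweighted mean across sites or populations."""
    if author_summary is not None:
        return author_summary
    return mean(site_changes)


def select_stratum(strata):
    """Pick the stratum most likely to use the intervention.
    Assumption: 'most likely' is approximated by the smallest distance."""
    return min(strata, key=lambda s: s["distance_km"])


# Hypothetical values, for illustration only
site_changes = [10.0, 20.0, 30.0]
strata = [
    {"distance_km": 1.0, "relative_change_pct": 25.0},
    {"distance_km": 3.0, "relative_change_pct": 12.0},
    {"distance_km": 5.0, "relative_change_pct": 4.0},
]

print(summarize_sites(site_changes))
print(select_stratum(strata)["relative_change_pct"])
```

Using a single stratum per study keeps each study's contribution to the quantitative summary comparable, which is the rationale given in the text.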
We extracted data on both absolute change (no fixed unit, can refer to various metrics) and relative change (expressed as percentage change over time) in cycling between before and after measurements, and attempted to calculate outcomes for both where possible. We used a framework similar to that presented by Goodman [18] to compute measures of absolute and relative change. Outcomes expressed as ratios were interpreted as relative changes. For uncontrolled studies, the relative change was computed by dividing the absolute change by the baseline level of cycling in the study sample. For controlled studies, we first computed the relative change in the intervention and control group separately. Subsequently, the calculated relative change in the intervention group was divided by the calculated relative change in the control group. Likewise, to obtain an absolute change when only relative changes were presented, we multiplied the relative change by the baseline estimate in the study sample as a whole for uncontrolled studies, and by the baseline estimate in the control group for controlled studies. Examples of the data extracted and how outcomes were calculated are presented in Additional file 2. Authors were contacted if only the direction of the association was presented. For each study we extracted data on the statistical tests performed, and on whether significant results were found (P < 0.05). However, we focused on the direction of the association rather than significance, since a substantial proportion of the studies did not test for significant changes in the cycling outcomes that were of interest for this review.
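The calculations described above can be written out as a short Python sketch. It is an illustration under stated assumptions rather than the authors' actual implementation: for controlled studies we assume that the group-specific relative changes are expressed as ratios (follow-up divided by baseline), so that dividing one by the other gives a ratio of ratios that can be converted back to a percentage. All function names and example numbers are hypothetical; the worked examples used in the review are in Additional file 2.

```python
def relative_change_uncontrolled(baseline, follow_up):
    """Percentage change: absolute change divided by the baseline level."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (follow_up - baseline) / baseline * 100.0


def relative_change_controlled(int_base, int_follow, ctl_base, ctl_follow):
    """Relative change in the intervention group divided by that in the
    control group. Assumption: both are expressed as follow-up/baseline
    ratios, and the ratio of ratios is reported as a percentage change."""
    if int_base == 0 or ctl_base == 0 or ctl_follow == 0:
        raise ValueError("baselines and control follow-up must be non-zero")
    int_ratio = int_follow / int_base
    ctl_ratio = ctl_follow / ctl_base
    return (int_ratio / ctl_ratio - 1.0) * 100.0


def absolute_from_relative(relative_change_pct, baseline):
    """Back-calculate an absolute change from a percentage change.
    Per the text, 'baseline' is the whole-sample baseline for uncontrolled
    studies and the control-group baseline for controlled studies."""
    return relative_change_pct / 100.0 * baseline


# Hypothetical numbers, for illustration only
print(round(relative_change_uncontrolled(100, 162), 1))
print(round(relative_change_controlled(100, 150, 100, 120), 1))
print(round(absolute_from_relative(25.0, 40.0), 1))
```

Expressing every outcome as a percentage change is what allows counts, frequencies and durations to be placed on a common scale, at the cost of losing the original units.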
We extracted data on the methodological quality, and on all design elements and additional analyses that may have supported causal inference following previous methodologies. The quality items described by Ogilvie et al. [19] were extracted, which used the criteria from the Community Guide of the US Task Force on Community Preventive Services to assess study design [20], and criteria developed for the Effective Public Health Practice Project in Hamilton, Ontario to score five items related to the quality of the research performed [21]. The five items included representativeness, comparability, credibility of data collection instruments, retention, and attributability of the effect to the intervention. The original instrument also assessed randomization, but this was not assessed as the allocation to the intervention and comparison group was not under control of the researcher. In addition, we extracted the results from additional analyses that may support causal inference identified by the UK Medical Research Council guidance on natural experiments [12], including multiple comparison groups, the inclusion of a neutral outcome that is not expected to change as a consequence of the new cycling infrastructure, and the use of complementing research methodologies.
The PROGRESS-Plus framework was used to describe the impact of the infrastructural interventions on subgroups of the population [22]. The PROGRESS-Plus framework considers nine factors for which differences in effect may occur: 1) place of residence, 2) race, ethnicity, culture, language, 3) occupation, 4) gender, sex, 5) religion, 6) education, 7) socioeconomic status, 8) social capital, and 9) the 'Plus'-factor that could be other characteristics associated with social disadvantage. In our study we considered age, health status or BMI, bike ownership, and car ownership as Plus-factors, since these factors may have been relevant determinants of disadvantage given the context of the intervention.

Data synthesis
We provided a descriptive narrative synthesis of studies. There was no possibility to quantitatively summarize the results, because of the large variety of outcome metrics and purposes of cycling presented, the lack of a primary outcome, and the lack of a common outcome across studies. Therefore, we presented the median relative change for the umbrella terms cycling behavior and infrastructure usage for all studies, and by study design (controlled vs uncontrolled; exposure time ≥ 1 year vs < 1 year), data collection methods (objective vs subjective), and analytical approaches (tested vs not tested). We did not present units for the median relative change because it can refer to various metrics. For example, an increase in cycling behavior of 30% could refer to an increase in the proportion of cyclists, cycling frequency, or cycling duration. An overview was presented of studies that reported baseline characteristics for, or performed analyses adjusted for, any of the PROGRESS-Plus factors. We provided a descriptive narrative synthesis for the studies that formally tested for differential effects on PROGRESS-Plus factors.
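As a minimal sketch of this synthesis step, the snippet below computes the median relative change overall and within levels of a methodological characteristic. The records, field names and values are hypothetical and are not data from the included studies; they only show how one median per subgroup could be obtained.

```python
from collections import defaultdict
from statistics import median


def median_by_group(outcomes, key):
    """Median relative change (%) within each level of a study characteristic."""
    groups = defaultdict(list)
    for outcome in outcomes:
        groups[outcome[key]].append(outcome["relative_change_pct"])
    return {level: median(values) for level, values in groups.items()}


# Hypothetical outcome records, for illustration only
outcomes = [
    {"design": "controlled", "measurement": "objective", "relative_change_pct": 10.0},
    {"design": "controlled", "measurement": "subjective", "relative_change_pct": 30.0},
    {"design": "uncontrolled", "measurement": "subjective", "relative_change_pct": 50.0},
    {"design": "uncontrolled", "measurement": "objective", "relative_change_pct": -5.0},
    {"design": "uncontrolled", "measurement": "subjective", "relative_change_pct": 20.0},
]

overall = median(o["relative_change_pct"] for o in outcomes)
print(overall)
print(median_by_group(outcomes, "design"))
print(median_by_group(outcomes, "measurement"))
```

The median is less sensitive than the mean to a few very large relative changes, which matters when baseline cycling levels are low and percentage changes can become extreme.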

Study characteristics
From the 3542 potential records, 125 full-text articles were screened, resulting in 31 studies (29 interventions) from 11 countries that met the eligibility criteria (Fig. 1). The major reason for exclusion of full-text articles is presented in Additional file 3. Table 1 presents the characteristics of included studies categorized by the outcome of interest. Twenty studies presented data on cycling behavior [23-42], and 16 studies assessed usage of the cycling infrastructure [23,29,31,38,42-53]. All infrastructural interventions were conducted in urban areas in high-income countries. The interventions were very diverse in terms of design and scale, ranging from the introduction of a cycling bridge, through single or multiple cycle paths or lanes, to a city-wide cycling network. Six studies (5 interventions) described issues related to data collection due to delays in the construction work, resulting in shorter follow-up periods than planned [23,31-34,39]. In addition, three studies (2 interventions) mentioned that the intervention was not fully completed within the study time frame [31,33,34]. Most studies used a similar analytical approach by comparing a single estimate before the intervention with a single estimate after the intervention, with or without comparing it to changes in a control group. One study used a fixed-effects approach to evaluate the within-person change over time [27], and three studies tested if there was a significant interaction between the intervention and time [29,31,35]. One study conducted an interrupted time series analysis, whereby the date of the opening of the cycling track was used to set the time of interruption [47].
In general, studies reporting behavioral outcomes found smaller changes than studies presenting usage of the infrastructure. Larger changes were also found for studies that tested for statistical significance and studies that used subjective measurement methods (such as surveys and direct observations of cyclists), compared to studies that did not perform statistical tests or that used objective measurement methods (such as GPS and accelerometers, and automatic counting stations). Additional file 4: Table S1 provides further details of the number of studies which assessed cycling behavior or usage of the infrastructure for cycling, and whether these were in favor of the intervention or not. Twenty studies presented data on 52 cycling behavior outcomes. All but two [23,32] found an increase in cycling for at least 1 outcome, and 73% (38/52) of all outcomes presented were in favor of the intervention. A total of 36 cycling behavior outcomes were used to quantitatively summarize the results. Together, studies found a relative increase in cycling behavior (median relative change: 23%; range: −21 to 262%). Changes in cycling did not differ materially between controlled and uncontrolled studies. Studies with an exposure time shorter than 1 year found smaller changes than those with a longer exposure time. Studies that used objective measures to assess cycling behavior found smaller changes than those that used self-reported measures, and studies that did not test for statistical significance found smaller changes than those that did.

Study results
Seven studies evaluated changes in physical activity patterns following cycling infrastructure interventions. Brown et al. showed that among cyclists, cycling time on intervention streets increased by 7 min/week and on other streets increased by 6 min/week. Daily energy expenditure increased in the study population by 0.19 kcal/min, which translates into 275 kcal/day [26]. Goodman et al. found that living 1 km closer to the intervention increased cycling for recreation by 3 min/week, and total physical activity by 13 min/week [33]. There was no evidence that compensation of physical activity behaviors took place, since physical activity excluding walking and cycling was not associated with the intervention. Burbidge et al. did not find changes in total physical activity time, but the number of physical activity episodes seemed to have declined by 0.2 trips/day following the introduction of cycling infrastructure [27]. The other four studies did not find evidence that the introduction of cycling infrastructure affected physical activity [29,31,32,39].
Usage of the infrastructure was presented in 16 studies with 21 outcomes, and all were in favor of the intervention (median relative change: 62%; range: 4 to 438%) (Table 2). Changes for infrastructure usage were smaller for studies that were uncontrolled, studies with longer exposure time, studies using automatic counters or GPS tracking information, and studies that did not test for statistical significance (Additional file 4: Table S1).

Table footnotes:
a None of the studies was a randomized experiment, therefore randomization was not applicable for any of the studies and was not shown.
b None of the studies presented data for neutral outcomes that were hypothesized to be unaffected by the new infrastructure designed to promote cycling, therefore this parameter was not shown.
c A = controlled before-after study; B = uncontrolled study with at least two before and two after data points; C = uncontrolled study with only 1 before and after data point.

Quality assessment
Table 2 presents information on the quality of the studies. Nine out of twenty studies evaluating the impact of cycling infrastructure on cycling behavior presented data on participation, and nine on representativeness. Participation ranged between 2 and 49% for those that presented information. Thirteen studies collected data twice on the same individual, and retention ranged between 41 and 79%. Most studies used surveys to collect data, but the exact methodology and validity of the question items was often not reported.
When considering the quality of the studies for causal inference, studies reported that other changes in the physical and social environment might have affected or biased their results. Issues reported were the economic crisis, the rising cost of car transport, social marketing campaigns, and other infrastructural improvements during the same period. Authors were often unable to account for these, which could indicate that the changes observed were partly attributable to other factors. Another problem mentioned was a spill-over effect, indicating that people from control areas might have used the facilities, which may have resulted in an underestimation of the effect. Some studies used multiple groups to test the robustness of the findings by using different comparison groups or applying different cut-off values to define exposure or outcome. Some studies presented data for city- or nation-wide cycling trends [36,37], or historical time trends [35]. None of the studies included a neutral outcome which was hypothesized to be unaffected by the new infrastructure designed to promote cycling, thereby functioning as a control measure that captures time trends in transportation or physical activity behaviors. Complementing methodologies performed were surveys among residents [24-29, 31-34, 38, 39, 42] or employees [23], intercept surveys among infrastructure users [27], surveys among new residents who moved into the study area [27], and bike counts in the study area [23,29,31,38,42].
Sixteen studies presented data on usage of the infrastructure. Five studies used automatic counting stations or mobile app data to objectively measure cyclist movements for periods between 5 months and 3 years. Others monitored the number of cyclists on selected hours and days using observation techniques. Issues that authors reported that may have partly contributed to the increase in infrastructure usage were tunneling of existing riders to the new infrastructure, other infrastructural changes, traffic conditions, rising cost of car transport, weather conditions and seasonality, demographic changes, social marketing, and changing methodology to collect data. One study indicated that improvements made to the cycling infrastructure could have been a consequence of high cycling levels in specific areas [51].
Some studies presented data for city- or nation-wide cycling trends [29,45,50], or historical time trends [29]. Additional methodologies included surveys among residents [31,38,42,45,50,51] or employees [23], surveys among infrastructure users [45-47,51], and data collected on cycling behavior [23,29,31,38,42]. Figure 3 shows that studies assessing cycling behavior collected information on population characteristics more often than those assessing usage, thereby potentially providing insights into the population under study and the characteristics of those engaging in cycling, and allowing a comparison of intervention and control groups according to baseline characteristics. The items that were most often used by behavioral studies to describe the population at baseline were age (75%), gender (70%) and a measure of socio-economic status (SES) (50%). Only three studies tested for differential effects on cycling by population subgroups. Aldred et al. did not find any differential effects by demographic and socio-economic characteristics [24]. Goodman et al. showed that the change in cycling behavior was larger if there was no car in the household [33]. Parker et al. showed that the increase in cyclists was larger among females than males [53].

Discussion
We identified 31 studies that assessed the effect of infrastructural interventions on cycling in adult populations. All were conducted in urban areas in high-income countries. Most of the evaluations found effects in favor of the intervention, showing that the number of cyclists using the facilities increased, and to a lesser extent that cycling behavior increased. Studies that collected behavioral data more often provided insights into the characteristics of people engaging in cycling than studies that reported bike counts. Seven studies reported on physical activity levels, and findings were mixed. Only three studies tested for equity effects, therefore we cannot draw any conclusions as to whether some population subgroups benefitted more than others. We provided data on relative changes that indicate the magnitude of the findings. We acknowledge that in contexts where only a few people use a bike, large relative changes may result in only small population-health benefits. However, due to the large variety in outcomes used we could not further summarize the results.
Our findings suggest that the approach and the specific methods did provide different results. Previous reviews have indicated that this might be the case, but our synthesis of studies exclusively focusing on cycling according to the method used, provides more evidence of this [15,16]. This review built on earlier findings by including studies with various study designs and published in health-related and transportation-related journals. Furthermore, we quantitatively summarized the findings to assess whether the magnitude of the change in cycling differed across study design. In the following three sections we describe the implications of the study design, data collection methods and statistical approaches for the study findings.

Study design and implications for causal inference
An important aspect of study design is the choice of outcome. In this review we categorized outcomes broadly into those that assessed cycling behavior and those that assessed infrastructure usage. Studies of behavioral outcomes found smaller relative changes than studies of infrastructure usage. If researchers are interested in outcomes relevant for population health, we recommend framing outcomes around the duration and frequency of cycling, as these measures can be directly linked to health impacts. Assessing the proportion of cyclists in a population or the numbers using a route may be a good alternative. If researchers are interested in understanding usage, count data may be used to measure the number of cyclists on the new infrastructure. Other reviews also found that studies measuring outcomes more closely related to the intervention (for example, cycling) were more likely to find intervention effects than studies measuring more general outcomes (for example, physical activity or BMI) [15,54]. Bike count data may support the findings from other evaluations of cycling behavior, but it cannot directly be translated into health gains in the population.
Another important design element is whether to include a control population when evaluating built environment changes. The changes in cycling differed between controlled and uncontrolled studies that assessed usage of the infrastructure, but not between those that assessed cycling behavior. Uncontrolled studies have a stronger basis for causal inference if they can provide evidence that the observed effects do not solely reflect underlying time trends in cycling in the wider area [29,36,37,45,50]. For example, Crane [29] counted the number of bikes passing 2 locations along the new infrastructure and also presented city-wide cycling trends during the same time period. The number of cyclists increased by 3.7% along the intervention road, whereas it decreased by 2.0% in the city as a whole. This finding suggests that the number of cyclists increased in the area with the new infrastructure, and that this increase does not solely reflect underlying time trends in cycling. To strengthen causal inference, we recommend that studies use controlled designs where possible, and present different measures of cycling and physical activity. If controlled designs are not possible, evaluating similar interventions across different sites could give further insight into how the change varies between sites. For example, Lanzendorf [37] evaluated improvements made to the cycling infrastructure in 4 German cities. Cycling frequency increased by 27% on average, ranging from 3 to 38% between cities. They also reported an average increase in cycling frequency of 31% across all big German cities. This approach illustrates that the observed changes in cycling in the intervention sites were comparable to the countrywide increase in cycling. The large range in changes in cycling across the 4 intervention sites also gives insight into the potential range of effects that could be expected in other cities.
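The comparison of a site-level change with a background trend, as in the Crane [29] example, comes down to simple arithmetic. The following Python sketch is illustrative only. The function names and the raw counts are hypothetical, chosen so that the relative changes match the reported 3.7% increase and 2.0% decrease. It is not the method used in that study.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from 'before' to 'after', in percent."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    return (after - before) / before * 100.0


def excess_over_trend(site_before: float, site_after: float,
                      ref_before: float, ref_after: float) -> float:
    """Site relative change minus reference-area relative change,
    in percentage points."""
    return (relative_change(site_before, site_after)
            - relative_change(ref_before, ref_after))


# Hypothetical counts (assumption): only the resulting percentages,
# +3.7% on the intervention road and -2.0% city-wide, come from the text.
site_change = relative_change(1000, 1037)
city_change = relative_change(50000, 49000)
excess = excess_over_trend(1000, 1037, 50000, 49000)

print(f"site: {site_change:.1f}%, city: {city_change:.1f}%, "
      f"excess over trend: {excess:.1f} percentage points")
```

A positive excess indicates that the change at the intervention site exceeded the background trend, but it does not by itself rule out other explanations such as displacement of existing riders.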
The duration of time that populations are exposed to the new infrastructure is another important design element, which can be difficult to control in large infrastructural projects. In studies that assessed changes in cycling behavior we found that the changes were larger when exposure time was longer than 1 year. In studies that assessed the usage of cycling infrastructure, those with shorter exposure time reported larger changes than those with longer exposure time. We noted that some count studies did not count on rainy days [44,53], or only collected data during peak hours [23,29,45,47,51], which may have resulted in larger changes than could be expected if data had been collected continuously by means of automatic counters [46]. Most studies that found changes that were not in favor of the intervention had an exposure time of less than 6 months [23,25,31,32], suggesting that longer follow-up periods may be needed to allow behavioral changes to be detected. Including questions on infrastructure usage within ongoing surveys, or nested within cohorts, may ensure that if the construction work is delayed, there is data available with sufficient exposure time to measure the impact.

Data collection methods and implications for causal inference
Studies were categorized according to whether the focus was on usage or cycling behavior, and large differences in results were found between these two types of outcome. Studies presenting count data of infrastructure found larger changes than studies that assessed behavioral change in the population. Studies counting the number of bikes that passed tracking locations are at risk of assessing the displacement of existing riders to the new infrastructure, and seven studies specifically mentioned this phenomenon [43,46,47,50-53]. Some studies offset some of the so-called funneling biases by selecting strategic counting locations where most cyclists pass, or by using multiple counting locations to capture cycling behavior in a wider area. Some studies complemented bike count data with intercept surveys among users of the infrastructure, and asked about their previous travel behaviors. These studies showed that the proportion of users who would not have cycled, had the infrastructural improvement not taken place, was much smaller than the increase in counts of cyclists [46,47,51]. Bike count data is useful when aiming to describe at what times of the day, and under which weather conditions, cyclists are using the facility [46].
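The distinction between a raw increase in counts and genuinely new cycling can be illustrated with a small calculation. The Python sketch below assumes a simplified setting in which an intercept survey yields the share of current users who say they would not have cycled without the new infrastructure. The function name, the counts, and the survey share are all hypothetical, and none of the reviewed studies is claimed to have used this exact decomposition.

```python
def decompose_count_increase(count_before: float, count_after: float,
                             share_new_users: float) -> dict:
    """Split a raw increase in counts into an estimate of new cyclists
    and a remainder that may reflect displacement from other routes.

    share_new_users: proportion (0 to 1) of surveyed users who report
    they would not have cycled without the intervention (assumption:
    the survey sample represents all counted users).
    """
    if not 0.0 <= share_new_users <= 1.0:
        raise ValueError("share_new_users must be between 0 and 1")
    raw_increase = count_after - count_before
    new_cyclists = count_after * share_new_users
    # A negative remainder would mean the survey implies more new
    # cyclists than the counts increased by.
    possibly_displaced = raw_increase - new_cyclists
    return {
        "raw_increase": raw_increase,
        "new_cyclists": new_cyclists,
        "possibly_displaced": possibly_displaced,
    }


# Hypothetical example: counts rise from 400 to 700 cyclists per day,
# and 10% of intercepted users say they would not otherwise have cycled.
result = decompose_count_increase(400, 700, 0.10)
print(result)
```

In this hypothetical case most of the raw increase would be attributable to riders who were already cycling elsewhere, which is the pattern that funneling bias would produce.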
Another important consideration is choosing between objective or self-reported measures to collect data on cycling behavior. We found that studies using GPS and other objective measures of cycling reported smaller changes than those using self-reported measures. GPS and objective assessments of activity could potentially be used to distinguish cycling on and off the new infrastructure [26], and to yield estimates of total physical activity levels [26,31]. However, such measures are often applied to a small sample, are limited to a short period of time, and participants who wear such devices might be quite different from the general population. Therefore, the findings might be subject to some selection biases. Furthermore, it is possible that the novelty of wearing such devices might lead to changes in physical activity behaviors [31]. Subjective measures of cycling behaviors, such as travel diaries and surveys, provide alternatives when interested in larger groups of people, but many of these have not been validated for cycling specifically.
It is attractive to use already available data when studying so-called "natural experiments" in which researchers lack control over the intervention. Collecting new data to match the timescale of intervention delivery is challenging. A third of the studies evaluating cycling behavior used data that had already been collected for regular monitoring or as part of other studies evaluating other built environment interventions [28, 30, 35-37, 40, 41]. For example, four US studies used census data to estimate changes in cycling after the introduction of new cycling facilities [35,36,40,41]. Other evaluations of natural experiments were planned, allowing the collection of specific data to evaluate the intervention of interest in detail. This resulted in powerful analyses in which the method of data collection was tailored to the research questions, but sometimes resulted in limited time being exposed to the intervention. For example, Dill [31] assessed cycling at baseline and after 2 years of follow-up. The construction work was significantly delayed, resulting in a short time period between the opening of the facilities and the second assessment of cycling. Moreover, two of the nine projects were not completed within this period. This may have influenced study outcomes. Using existing data may be useful if researchers were not aware of the new intervention, did not obtain funding in time to design a study around the natural experiment, or if large delays in the construction are expected.

Analytical approaches and implications for causal inference
Like other reviews [55], we found that many studies did not perform statistical tests (for cycling behavior: 15% (8/52) and for usage: 67% (14/21)). Smaller changes were found for studies that did not test for statistical significance than for those that performed statistical tests. We recommend that studies test for statistical significance, which provides more robust evidence that the results are not due to chance, as recommended by guidance for the clear reporting of observational studies [56]. This review included some studies that used more complex analytical methods, such as fixed-effects models [27] or interrupted time series [47], or that estimated the difference in cycling over time by using a regression analysis that included group, period, and an interaction term between group and period [29,31,35]. Fixed-effects models make it possible to account for observed time-varying and unobserved time-invariant characteristics. Perhaps most prominently, individual attitudes towards physical activity may determine both whether people live in a place with opportunities to be physically active and their physical activity behavior. Fixed-effects models make it possible to control for such unobserved time-invariant confounding, allowing for better causal inference. One study conducted a time series analysis by using GPS tracking information from a mobile phone application, thereby correcting for time trends prior to the intervention [47]. Studies that specified an interaction term between group and period are able to control for observed differences between groups, thereby reducing the risk of bias. The use of multiple analytical strategies, and of methods that are able to correct for time trends and for measured or unmeasured confounders at the individual or neighborhood level, may strengthen the basis of causal inference.
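The role of the interaction term between group and period can be illustrated with a minimal difference-in-differences calculation. In a regression that contains only group, period, and their interaction, the ordinary least squares coefficient on the interaction equals the change in the intervention group minus the change in the control group. The Python sketch below computes that quantity directly from group and period means. The function name and all data values are hypothetical, and the sketch omits the covariate adjustment and standard errors that a full analysis would need.

```python
from statistics import mean


def did_estimate(records):
    """Difference-in-differences estimate from (group, period, outcome)
    records, where group is 1 for intervention and 0 for control, and
    period is 1 for after and 0 for before.

    Equals the group-by-period interaction coefficient in an unadjusted
    linear regression with group, period, and interaction terms.
    """
    cells = {(g, p): [] for g in (0, 1) for p in (0, 1)}
    for group, period, outcome in records:
        cells[(group, period)].append(outcome)
    if any(len(values) == 0 for values in cells.values()):
        raise ValueError("each group-period cell needs observations")
    means = {key: mean(values) for key, values in cells.items()}
    change_intervention = means[(1, 1)] - means[(1, 0)]
    change_control = means[(0, 1)] - means[(0, 0)]
    return change_intervention - change_control


# Hypothetical weekly cycling minutes (assumption, not study data).
records = (
    [(0, 0, y) for y in (30.0, 40.0, 50.0)]    # control, before
    + [(0, 1, y) for y in (35.0, 45.0, 55.0)]  # control, after
    + [(1, 0, y) for y in (20.0, 40.0, 60.0)]  # intervention, before
    + [(1, 1, y) for y in (35.0, 55.0, 75.0)]  # intervention, after
)
effect = did_estimate(records)
print(f"difference-in-differences estimate: {effect:.1f} minutes per week")
```

Here the intervention group changes by 15 minutes and the control group by 5 minutes, so the estimate attributes 10 minutes per week to the intervention, under the assumption that both groups would otherwise have followed the same trend.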

Strengths and limitations
In this review, we focused on the methodological aspects in the evaluation of infrastructural interventions to promote cycling and extracted information on the magnitude of the change in cycling. This allowed us to examine differences in change in cycling according to the methods used. This study was comprehensive by searching multiple electronic databases without date or language restrictions, and we included studies published in public health journals and transportation journals. Controlled and uncontrolled studies were considered for inclusion, and the final selection of studies had a large variety in study designs and methods. We added valuable information by calculating the relative and absolute changes in cycling behavior or usage of the infrastructure, which brought together different outcomes in a simple but interpretable way.
Some limitations also have to be noted. We included only studies that reported on measures of cycling and were unable to examine unreported data on cycling that were included in composite measures of active transportation, walking and cycling, or physical activity. The detail of the information provided in the papers differed between studies, which made it difficult to synthesise and interpret study findings. A pragmatic approach was used to calculate relative changes where possible, but for some studies other approaches may have been better. The evidence presented in the review came from studies that were all conducted in high-income countries. Moreover, only a few studies evaluated the impact on physical activity behaviors and studied equity effects. We focused on structural interventions here, but future research should explore the importance of and interactions with other interventions, such as financial incentives, cycle training, or behavioral interventions, together with the introduction and maintenance of high-quality cycling infrastructure.

Recommendations
Each study design, data collection method and analytical strategy has its advantages and disadvantages. To further strengthen causal inference from observational data, studies are needed that triangulate different methodologies to evaluate the effect of built environment interventions. Studies published in public health journals often report on changes in cycling behavior, while studies published in transportation journals report on usage of cycling infrastructure. Bringing experts from both fields together could result in study designs that better capture the range of impacts of new cycling infrastructure. We are not recommending a specific method or approach, as the research questions of interest should drive the method of data collection. When existing data are used, careful consideration needs to be given to the appropriateness of that data. The reporting of evaluations should adhere to guidelines such as STROBE, which seeks to strengthen the quality of work reported [56]. We suggest, where possible, combining count data, which provides information on how many people are using new infrastructure, with behavioral outcomes of duration and frequency of cycling, so that the population health impact can be estimated. Such estimates could be used in combination with modelling or scenario building tools to estimate the current or future health impacts on outcomes that cannot be observed in studies with limited follow-up. Future studies should focus on the question of who benefits from the intervention, and identify contexts, barriers and choice constraints to better understand why cycling changed. This review focused on interventions that changed the cycling infrastructure, but findings and recommendations are likely applicable to other built environment interventions to promote health behaviors.

Conclusion
Introducing cycling facilities in cities is likely to increase the number of cyclists using the facilities, and may result in increases in cycling. Evidence on total physical activity following the introduction of cycling facilities was mixed. Equity effects were rarely studied. The research questions of interest should drive the method of data collection, and reporting of evaluations should adhere to published guidelines. Triangulation of methods is warranted to overcome potential issues that evaluators may encounter when evaluating infrastructural interventions within the built environment, and to strengthen the basis of causal inference.