Setting
In this population-based prospective birth cohort study, initial sampling was in Northern Finland, which is characterized by long distances to amenities and low population density. High-density urban environments are found only in the downtown areas of Finland’s largest cities; overall, residential density is 18 inhabitants per km². Helsinki, the capital and Finland’s biggest city, currently has a population of 643,272 and a population density of 3002 inhabitants per km². Oulu, the biggest city in Northern Finland and the country’s fifth largest city, has a population of 201,810 and a population density of 68 inhabitants per km² [16]. Among cohort members, most migration has focused on the Helsinki metropolitan area in Southern Finland. At both time points, about a fifth of the sample lived in Oulu. The proportion of participants living in Helsinki was 9% at 31 years, and 5% at 46 years.
Participants
The study population, Northern Finland Birth Cohort 1966, comprised all individuals born in 1966 (N = 12,058) from the two northernmost provinces of Finland. The cohort has been prospectively monitored by means of interviews, postal questionnaires and clinical measurements in follow-ups at the age of 1, 14, 31 and 46 years. The study was approved by the Ethical Committee of the Northern Ostrobothnia Hospital District. For the present study, we included data from 5974 subjects who participated in the follow-ups at 31 years and 46 years, which were conducted in 1997 and 2012, respectively.
Exposure variables
The main explanatory variable was objectively assessed neighborhood DMA. For each participant in the study population, residential coordinates were obtained from the Finnish Population Register Centre [17], encompassing their lifetime residential relocation history in Finland.
A Geographic Information System (ArcGIS 10.3) was used to assess neighborhood DMA, which was derived from validated walkability and bikeability measures that describe how conducive built environment characteristics are to walking and cycling [18,19,20,21,22]. Neighborhood DMA was calculated within a 1 km circular buffer around every residential location of each participant for every year from 31 to 46 years of age (16 time points) by combining population density, number of diverse destinations and intersection density. For this follow-up period, accurate time-varying information on community structure was available from the Finnish Community Structure database, which is based on 250 m × 250 m grid cells [23]. Hence, we were able to assess changes in the built environment also for participants who did not change residential location during the follow-up. When linking residential coordinates to geographical data, we used the closest year for which data were available, with a maximum difference of two years.
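The year-matching rule used in the linkage can be illustrated with a minimal Python sketch. It is not the study's GIS workflow: the function name `closest_data_year`, the list `GRID_YEARS` and the tie-breaking rule (prefer the earlier year) are assumptions made for illustration only.

```python
from typing import Optional, Sequence

# Hypothetical years for which grid data exist; the real data years are not
# listed in the text.
GRID_YEARS = [1995, 2000, 2005, 2010, 2012]


def closest_data_year(target_year: int,
                      available_years: Sequence[int],
                      max_gap: int = 2) -> Optional[int]:
    """Return the available year nearest to target_year, or None when the
    nearest year is more than max_gap years away."""
    if not available_years:
        return None
    # Sort key: smallest absolute difference first; on a tie, the earlier
    # year wins (an assumption, the text does not specify a tie rule).
    best = min(available_years, key=lambda y: (abs(y - target_year), y))
    return best if abs(best - target_year) <= max_gap else None


if __name__ == "__main__":
    for year in (1997, 2003, 2011):
        print(year, "->", closest_data_year(year, GRID_YEARS))
```

A residence year with no data year within two years returns `None`, so such person-years can be handled explicitly rather than silently matched to distant data.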
Population density was based on the sum of people living within the buffer. Similarly, number of destinations was based on the sum of destinations for retail (shops, market halls, department stores, commercial centers), recreation (restaurants, theaters, cinemas, sports facilities) and office and community institutions (libraries, museums, churches, health care, schools) [23]. Street network data were based on Digiroad (Finnish National Road and Street Database) from the year 2012 [24]. We excluded roads where walking and cycling were prohibited and included only intersections with three or more legs. We then standardized these variables as z-scores: the variable mean was subtracted from each value, and the centered value was divided by the variable standard deviation. A z-score indicates how many standard deviations a value lies from the mean. For the final DMA score, we combined the three standardized variables into a single score.
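The standardization step can be sketched in Python as follows. The text does not state how the standardized components were combined, so the unweighted sum used here is an assumption, as is the use of the sample standard deviation; the function names and the input values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - m) / s for v in values]


def dma_scores(pop_density, destinations, intersections):
    """Combine the three standardized components into one score per buffer.

    Assumption: the components are combined as an unweighted sum.
    """
    standardized = [z_scores(col)
                    for col in (pop_density, destinations, intersections)]
    return [sum(parts) for parts in zip(*standardized)]


if __name__ == "__main__":
    # Hypothetical values for four 1 km buffers.
    population = [120, 950, 4300, 15000]
    n_destinations = [0, 4, 25, 140]
    n_intersections = [8, 30, 75, 160]
    for score in dma_scores(population, n_destinations, n_intersections):
        print(round(score, 2))
```

Because each component is centered before combining, a score near zero marks a neighborhood that is average on the three components taken together, while positive scores mark denser, better-connected neighborhoods.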
Outcome variables
Self-reported regular walking and cycling were the main outcome variables, and objectively measured physical activity at the age of 46 was used as a secondary outcome. Walking and cycling were assessed by identical questionnaires at 31 years and 46 years, based on the following question: “How often are you engaged in the following kinds of physical activities? Choose the alternative that best represents the average situation during the previous year.” Response alternatives for walking and cycling were assigned to a six-point Likert scale: 1) not at all, 2) once a month or less, 3) two to three times a month, 4) once a week, 5) two to three times a week, and 6) four times a week or more. For statistical analysis, we coded walking and cycling as binary variables, defining regularity as four times a week or more. This cut-off was based on current recommendations for physical activity for adults (at least 150 min of moderate intensity aerobic physical activity throughout the week) [25, 26].
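The binary coding of the questionnaire responses can be written as a short Python sketch. The function names are hypothetical; the cut-off (response 6, "four times a week or more") follows the definition above.

```python
REGULAR_THRESHOLD = 6  # response 6 = "four times a week or more"


def is_regular(response: int) -> int:
    """Code a six-point response (1-6) as 1 = regular, 0 = not regular."""
    if response not in range(1, 7):
        raise ValueError(f"response must be 1-6, got {response!r}")
    return int(response >= REGULAR_THRESHOLD)


def started_regular(response_31: int, response_46: int) -> bool:
    """True if a participant was not regular at 31 years but was at 46 years."""
    return is_regular(response_31) == 0 and is_regular(response_46) == 1


if __name__ == "__main__":
    print(is_regular(5))          # two to three times a week: not regular
    print(is_regular(6))          # four times a week or more: regular
    print(started_regular(4, 6))  # took up regular activity during follow-up
```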
At 46 years, participants' physical activity was objectively assessed using a waterproof wrist-worn activity monitor (Polar Active, Polar Electro, Finland). Polar Active provides a daily step count and a measure of physical activity based on estimated metabolic equivalent (MET) values every 30 s, using baseline information about the user’s height, weight, age, and sex. Physical activity was stratified into five levels based on manufacturer thresholds [27]: very light (1–2 MET); light (2–3.5 MET); moderate (3.5–5 MET); vigorous (5–8 MET); and very vigorous (≥8 MET). Average minutes per day were calculated for each activity level. For the purposes of analysis, we combined moderate, vigorous and very vigorous physical activity. Validation studies confirm that the monitor correlates well (R² = 0.74) with a doubly labeled water technique assessing energy expenditure during exercise training [28]. The participants (N = 3786) were asked to wear the activity monitor on their non-dominant hand 24 h a day for 14 days, and only participants with at least four valid measurement days (600 min/day of monitoring time during waking hours) were included in the analysis.
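As an illustration of how the 30 s MET values translate into daily minutes, the following Python sketch classifies epochs and applies the valid-day criteria. It is not the manufacturer's algorithm: the function names are hypothetical, and treating each lower threshold as inclusive is an assumption, since the text gives only the ranges.

```python
EPOCH_MINUTES = 0.5  # one MET estimate every 30 s


def activity_level(met: float):
    """Map a MET value to one of the five levels (lower bounds inclusive)."""
    if met >= 8:
        return "very vigorous"
    if met >= 5:
        return "vigorous"
    if met >= 3.5:
        return "moderate"
    if met >= 2:
        return "light"
    if met >= 1:
        return "very light"
    return None  # values below 1 MET are not assigned a level in the text


def mvpa_minutes(day_mets):
    """Minutes of moderate, vigorous and very vigorous activity in one day."""
    return sum(EPOCH_MINUTES for met in day_mets if met >= 3.5)


def mean_daily_mvpa(days, wear_minutes, min_wear=600, min_days=4):
    """Average daily MVPA over valid days, or None if too few valid days."""
    valid = [day for day, wear in zip(days, wear_minutes) if wear >= min_wear]
    if len(valid) < min_days:
        return None
    return sum(mvpa_minutes(day) for day in valid) / len(valid)


if __name__ == "__main__":
    # Hypothetical day: 10 very light, 4 moderate and 2 very vigorous epochs.
    day = [1.5] * 10 + [4.0] * 4 + [9.0] * 2
    print(mean_daily_mvpa([day] * 5, [700, 650, 500, 900, 610]))
```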
Confounding variables
Sociodemographic variables, including sex (male, female), education (higher education, vocational/secondary/basic education), children under 18 years living at home (yes, no), and marital status (married/de facto relationship, single/divorced/widowed), were assessed using identical questionnaires at both time points and were treated as confounding variables.
Statistical methods
R version 3.5.0 [29] was used for statistical analyses. We performed sequence analysis using TraMineR [30] to visualize residential relocation trajectories based on neighborhood DMA during the follow-up, and to cluster participants according to those trajectories. The analysis involved defining sequences, measuring dissimilarities between them and categorizing sequential patterns into groups.
To begin, we categorized the DMA measure into quintiles and assigned these to each follow-up year from 1997 to 2012 for each subject. For any particular year, we selected the residential location where the subject had lived for the longest time during that year. We used the Hamming distance [30, 31] to evaluate the distance between sequences and to construct sequence dissimilarity matrices, which were then grouped using fastcluster [32] with the Ward agglomerative hierarchical clustering method. Because of the large sample size, and in order to identify the most relevant trajectories, the study population was stratified into ten clusters according to similarity of residential relocation history. Fisher’s exact test with odds ratio was used to test whether the number of study participants who started regular walking or cycling during the follow-up differed across clusters.
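The dissimilarity measure can be illustrated with a plain Python sketch of the Hamming distance between yearly quintile sequences. This shows the general idea only and is not the TraMineR implementation used in the study; the function names and the three example trajectories are hypothetical.

```python
def hamming(seq_a, seq_b):
    """Number of positions (years) at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def dissimilarity_matrix(sequences):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = hamming(sequences[i], sequences[j])
    return matrix


if __name__ == "__main__":
    # One DMA quintile (1 = lowest, 5 = highest) per year, 1997-2012.
    stayed_low = [1] * 16
    stayed_high = [5] * 16
    moved_up = [1] * 6 + [5] * 10  # relocated to a high-DMA area in year 7
    for row in dissimilarity_matrix([stayed_low, stayed_high, moved_up]):
        print(row)
```

In this example the participant who moved is closer to the high-DMA stayer (6 differing years) than to the low-DMA stayer (10 differing years). A matrix of this kind is what a hierarchical clustering routine takes as input when grouping trajectories.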
Generalized linear mixed models were fitted with lme4 [33] to analyze the longitudinal association between neighborhood DMA and regular walking and cycling. In separate models, we assessed associations between neighborhood DMA and its components, and regular walking and cycling, which were coded as binary variables. DMA scores from 31 years and 46 years were used as a continuous variable. We used subject as the random intercept and a binomial distribution with a logit link function for modeling. Over- or underdispersion was not an issue because the outcomes were binary. Sociodemographic variables were selected as potential confounding factors because these have previously been associated with physical activity and residential location, and may account for residential self-selection bias [34,35,36,37]. Model fitting was based on maximum likelihood, and we used the Laplace approximation to estimate fixed-effect model parameters [38]. For statistical inference, we used the Wald χ² test to test the significance of fixed effects. The effect sizes of predictor variables are presented as odds ratios with 95% confidence intervals.
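For readers less familiar with logit models, the following Python sketch shows how a fixed-effect coefficient and its standard error are conventionally converted into an odds ratio, a Wald-type 95% confidence interval and a single-coefficient Wald χ² statistic. The function names and the example coefficient are hypothetical and are not results from this study; that the reported intervals were Wald-type is also an assumption.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower, upper) for a log-odds coefficient.

    The interval is computed on the log-odds scale and then exponentiated.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def wald_chi2(beta: float, se: float) -> float:
    """Wald chi-square statistic (1 degree of freedom) for one coefficient."""
    return (beta / se) ** 2


if __name__ == "__main__":
    # Hypothetical coefficient for a one-unit increase in the DMA score.
    beta, se = 0.10, 0.04
    odds_ratio, lower, upper = odds_ratio_ci(beta, se)
    print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
    print(f"Wald chi-square: {wald_chi2(beta, se):.2f}")
```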
Because the number of all destinations is more a measure of density than of diversity, we performed sensitivity analyses by fitting separate generalized linear mixed models with the number of utilitarian destinations and the number of recreational destinations as predictors of regular walking and cycling. An independent samples t-test was used to compare objectively measured physical activity between those who walked or cycled regularly at 46 years of age and those who did not.