The IPEN Adolescent study was an observational, multi-country, cross-sectional study with purposive sampling, including 18 cities/regions (hereafter, sites) in 15 geographically, economically and culturally diverse countries across six continents: Australia (AUS; Melbourne), Bangladesh (BGD; Dhaka), Belgium (BEL; Ghent), Brazil (BRA; Curitiba), Czech Republic (CZE; Olomouc, Hradec Králové), Denmark (DNK; Odense), Hong Kong SAR (CHN; Hong Kong), India (IND; Chennai), Israel (ISR; Haifa), Malaysia (MYS; Kuala Lumpur), New Zealand (NZL; Auckland, Wellington), Nigeria (NGA; Gombe), Portugal (PRT; Porto region), Spain (ESP; Valencia) and USA (Baltimore and Seattle regions). Data were collected between 2009 and 2016. Detailed information regarding sites, protocol, design, and measures is presented in a protocol paper. In short, participants were adolescents aged 11 to 19 years, each recruited together with one parent or legal guardian (except in New Zealand, where only adolescents were recruited), who were living in neighbourhoods (i.e., compilations of administrative units) chosen to maximize variance in neighbourhood income and walkability. Walkability was categorised using a Geographic Information Systems (GIS)-based index that was a composite of residential density, intersection density and land use mix [14, 15]. Malaysia, Nigeria, and India relied on local knowledge to identify diverse neighbourhoods. In most countries, the socio-economic status (SES) of neighbourhoods was classified as low or high based on city/region-specific demographic data. Neighbourhoods were stratified into four quadrants (high walkability-high SES, high walkability-low SES, low walkability-high SES and low walkability-low SES) to balance the representation of neighbourhood types during participant recruitment.
Two recruitment strategies were used: (1) systematic selection of potential participants living at an address within the preselected neighbourhoods (Brazil, Israel, USA), and (2) recruitment of participants from preselected schools located within the four quadrants (10 countries). Belgium and India combined both strategies. Recruitment within schools was conducted using random sampling, classroom or year-level recruitment. After recruitment, each adolescent’s residential address was assigned the quadrant code of the neighbourhood where they lived. All countries conducted recruitment in person, except the USA, where telephone and mail methods were used. In total, 6950 adolescents participated in the study. Mean participation rate, based on the countries that provided this information, was 48.4% (SD 23.6%). The lowest participation rates were reported in India (11%) and New Zealand (12.8%); participation was highest in Czech Republic (89.7%). Additional information, such as recruitment dates, site-specific participation rates, school schedules, contact mode and incentives in each country, is reported elsewhere. The study in each country was approved by the relevant institutional ethics committee, and participants and their legal guardians provided informed assent/consent.
Weight and height were self-reported in eight countries, and measured by research assistants in seven countries. Sex was self-reported, and decimal age in years was calculated from birth date to date of measurement. To have wider international representation of sex- and age-adjusted BMI standards, the LMS Growth software program was used, applying the 2007 WHO Child Growth Reference and the International Obesity Task Force (IOTF) cut points [18, 19]. The program converts physical assessments to age- and sex-adjusted BMI standard deviation (SD) scores (based on the 2007 WHO Child Growth Reference) and IOTF grades. The IOTF cut-offs classify BMI in children aged 2–18 years as thin (3 grades), normal weight, overweight, or obese. The six possible IOTF grades reflect the adjusted BMI values projected to adult BMI cut-offs at age 18: thinness grade -3 (BMI < 16), thinness grade -2 (BMI 16 to < 17), thinness grade -1 (BMI 17 to < 18.5), normal weight grade 0 (BMI 18.5 to < 25), overweight grade +1 (BMI 25 to < 30) or obese grade +2 (BMI ≥ 30). For this study, IOTF grades were reclassified into thin/normal versus overweight/obese. Finally, the Centers for Disease Control (CDC) BMI-SD scores were also considered in sensitivity analyses to examine whether using WHO BMI-SD scores versus CDC BMI-SD scores produced different results (see Appendices 1, 2, 3 and 4).
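The grade boundaries listed above can be illustrated with a minimal Python sketch. It assumes the input is a BMI value that has already been projected to its adult equivalent at age 18 (in the study this step was performed by the LMS Growth program and is not reproduced here). The function names are hypothetical.

```python
def iotf_grade(adult_equivalent_bmi: float) -> int:
    """Map a BMI value projected to age 18 onto an IOTF grade (-3 to +2)."""
    # Each pair is (exclusive upper bound, grade); values of 30 or more fall through to +2.
    bounds = [(16.0, -3), (17.0, -2), (18.5, -1), (25.0, 0), (30.0, 1)]
    for upper, grade in bounds:
        if adult_equivalent_bmi < upper:
            return grade
    return 2


def is_overweight_or_obese(grade: int) -> bool:
    """Dichotomise IOTF grades: thin/normal (<= 0) versus overweight/obese (>= +1)."""
    return grade >= 1
```

For example, a projected BMI of 26.3 falls in grade +1 and is therefore assigned to the overweight/obese category.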
Accelerometer-assessed MVPA and ST
Adolescents (all or a subsample, depending on study site; n = 5215) were asked to wear an ActiGraph accelerometer on the right hip for at least seven days during waking hours when not swimming or bathing. Due to varying availability across study sites, four ActiGraph models were used (7164, GT1M, GT3X and GT3X+). To standardize screening and scoring procedures, accelerometer data from all countries were sent to the study’s Coordinating Center. Trained researchers at the Coordinating Center screened all data using MeterPlus v.5.0 to ensure comparable data processing and scoring methods across all sites. Screening procedures checked for devices that had malfunctioned, flagged non-wear time for exclusion, and marked valid wearing days for scoring. More details about IPEN Adolescent accelerometer scoring protocols can be found on the IPEN website at http://ipenproject.org/methods_accelerometers.html.
All accelerometer vertical axis data were collected with (or converted to) a 30-s epoch, which was the shortest length that could be standardized across all study sites. While a 60-s epoch has often been used in both adult and youth studies, shorter epochs appear to record more accurately the intermittent, short bursts of physical activity common in young people. Non-wear time was defined as 60 or more consecutive minutes of zero counts, an interval that differentiates sedentary behavior from non-wear time in adolescents with high accuracy. A valid wearing day consisted of at least 8 h of wear time during waking hours from 6 AM to midnight. Only participants with at least 4 valid wearing days were included in the analyses (n = 4852). The criteria of at least 8 h of wear per day to define a valid day and at least 4 valid wearing days for inclusion in analyses are commonly used in adolescent accelerometer studies.
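The wear-time rules described here can be sketched as follows. This is an illustrative Python sketch, not the MeterPlus implementation: it assumes each day is supplied as a list of 30-s epoch counts already restricted to the 6 AM to midnight window, and all names are hypothetical.

```python
from itertools import groupby

EPOCH_SECONDS = 30
NONWEAR_EPOCHS = 60 * 60 // EPOCH_SECONDS      # 60 min of zeros = 120 epochs
MIN_WEAR_EPOCHS = 8 * 3600 // EPOCH_SECONDS    # 8 h of wear = 960 epochs
MIN_VALID_DAYS = 4


def wear_epochs(counts):
    """Count epochs treated as wear: all epochs except zero-runs of 60+ minutes."""
    worn = 0
    for is_zero, run in groupby(counts, key=lambda c: c == 0):
        length = sum(1 for _ in run)
        # A run of zeros is non-wear only if it lasts at least 60 consecutive minutes.
        if not (is_zero and length >= NONWEAR_EPOCHS):
            worn += length
    return worn


def is_valid_day(counts):
    """A valid day has at least 8 h of wear time."""
    return wear_epochs(counts) >= MIN_WEAR_EPOCHS


def include_participant(days):
    """A participant is included with at least 4 valid wearing days."""
    return sum(is_valid_day(day) for day in days) >= MIN_VALID_DAYS
```

Under these rules a run of 119 zero epochs (59.5 min) still counts as wear, whereas a run of 120 zero epochs is excluded as non-wear.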
Evenson cut-points for MVPA and ST (≤ 100 counts per minute) were applied to compute average duration per day (minutes/day) across all valid wearing days. In addition, MVPA and ST durations ‘during school’ on school days and during all ‘non-school’ periods (i.e., before and after school on school days plus all valid wearing hours on non-school days) were extracted. Self-reported school start and end times were used in most countries to determine school days and in-school times. These data were not available in the USA, where 08:15 AM to 02:15 PM was used as an estimate of the school day on weekdays.
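As an illustration of how epoch-level counts translate into school and non-school minutes, the following Python sketch classifies 30-s epochs and sums their duration by period. Only the ST threshold (≤ 100 counts per minute) is given in the text; the MVPA threshold of 2296 counts per minute is the published Evenson value and is an assumption here, as is the rescaling of 30-s counts to counts per minute. Function and variable names are hypothetical.

```python
from datetime import time

EPOCH_MINUTES = 0.5     # 30-s epochs, as described in the text
ST_MAX_CPM = 100        # stated in the text
MVPA_MIN_CPM = 2296     # assumption: published Evenson MVPA threshold, not stated in the text


def classify_epoch(counts):
    """Label one 30-s epoch as 'ST', 'MVPA' or None (light activity)."""
    cpm = counts / EPOCH_MINUTES  # assumption: rescale epoch counts to counts per minute
    if cpm <= ST_MAX_CPM:
        return "ST"
    if cpm >= MVPA_MIN_CPM:
        return "MVPA"
    return None


def minutes_by_period(epochs, school_start, school_end, is_school_day):
    """Sum ST and MVPA minutes in school and non-school periods for one day.

    epochs: iterable of (clock_time, counts) pairs for wear-time epochs only.
    """
    totals = {(p, b): 0.0 for p in ("school", "non-school") for b in ("ST", "MVPA")}
    for clock, counts in epochs:
        in_school = is_school_day and school_start <= clock < school_end
        behaviour = classify_epoch(counts)
        if behaviour is not None:
            period = "school" if in_school else "non-school"
            totals[(period, behaviour)] += EPOCH_MINUTES
    return totals
```

For the USA, the estimated school window would be passed as `time(8, 15)` and `time(14, 15)`; on non-school days every epoch is assigned to the non-school period.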
Fourteen countries used an ActiGraph GT model (GT1M, GT3X, or GT3X+), and one country (USA) primarily used the older generation 7164 model. For sites using GT models, protocols specified that the Low Frequency Extension (LFE) be enabled because it improves comparability between data collected with the older 7164 model and the newer generation GT models. Twelve countries using the GT models always had the LFE enabled when collecting accelerometer data (total of 4482 cases). However, two countries had some wearings that used a GT model with the normal filter enabled, which made the device less sensitive to lower-intensity activity (90 cases in the USA and 154 cases in India). One country (Denmark) used the normal filter for all accelerometer wearings (126 cases). To account for potential effects of using less sensitive GT models with the normal filter enabled, a variable denoting comparability of accelerometer models was created (0 = non-comparable; 1 = comparable) and used in sensitivity analyses. The 7164 and GT models with LFE were considered comparable (n = 4482 cases); GT models with the normal filter (n = 370) were considered non-comparable to the 7164 and GT models with LFE enabled.
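The comparability coding can be expressed as a small rule, sketched here in Python with hypothetical names. The case counts are taken directly from the paragraph above and are used only to check that they sum to the analytical sample.

```python
def accelerometer_comparable(model: str, lfe_enabled: bool) -> int:
    """Return 1 (comparable) for the 7164 or a GT model with LFE, else 0."""
    if model == "7164":
        return 1
    # GT1M, GT3X and GT3X+ are comparable only when the LFE was enabled.
    return 1 if lfe_enabled else 0


# Case counts reported in the text for GT wearings with the normal filter.
NON_COMPARABLE_CASES = {"USA": 90, "India": 154, "Denmark": 126}
COMPARABLE_CASES = 4482

non_comparable_total = sum(NON_COMPARABLE_CASES.values())     # 370
analytical_sample = COMPARABLE_CASES + non_comparable_total   # 4852
```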
Socio-demographic covariates and study design measures
Sex, age and highest educational attainment in the household were included as covariates in all statistical models. Study design variables adjusted for included site (city/region) and the dichotomous (low versus high) indicators of within-site administrative-unit walkability and SES. To adjust for accelerometer-related differences across participants, number of valid days of accelerometer wear time, average accelerometer wear time/day, and accelerometer comparability (yes vs. no) were included in analyses. Recruitment-related clustering within residential census units and school attended was adjusted for by including administrative codes for neighbourhoods and schools as random effects in analyses.
Descriptive statistics were computed for all relevant variables, by site and for the whole sample. Missing data for at least one variable occurred in 9% of participants, with a minimum of 0% in Hong Kong (CHN) and maximum of 56.7% in Melbourne (AUS). The presence of missing data on specific variables was related to other study variables, i.e., data were at least missing at random (MAR) rather than missing completely at random. Specifically, in the analytical sample (n = 4852; adolescents with valid accelerometer data), six variables were associated with having missing values on one or more variables examined in this study. Older adolescents (OR = 1.167; 95%CI: 1.012, 1.236; p < 0.001), females (OR = 1.258; 95%CI: 1.037, 1.526; p = 0.020) and those with more MVPA minutes per valid day (OR = 1.011; 95%CI: 1.007, 1.015; p < 0.001) were more likely to have missing values. Adolescents with more valid days of accelerometer wear (OR = 0.827; 95%CI: 0.768, 0.892; p < 0.001), more total wear time per valid day (OR = 0.997; 95%CI: 0.996, 0.998; p < 0.001) and more ST per day (OR = 0.998; 95%CI: 0.996, 0.999; p = 0.001) were less likely to have missing values. Because complete-case analyses can yield biased results when data are MAR, whereas analyses based on properly conducted multiple imputation do not, ten imputed datasets were created for the main regression analyses. We also conducted the same analyses on cases with complete data (n = 4384) for sensitivity analysis purposes. Multiple imputations were performed using chained equations (MICE), accounting for within-city administrative-unit- and school-level clustering effects arising from the two-stage stratified sampling strategy employed in each study site. The ten imputed datasets were created in R (R Development Core Team, 2020) using the package ‘mice’ and following procedures outlined by van Buuren.
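Estimates obtained from each imputed dataset need to be combined into a single estimate and standard error. The text does not state the pooling method; the Python sketch below assumes Rubin's rules, the usual approach for multiply imputed data, and uses hypothetical names and inputs.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])   # average within-imputation variance
    between = variance(estimates)                   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

With ten imputed datasets, `estimates` and `std_errors` would each hold ten values for the same model term. The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations, which reflects the added uncertainty due to missing data.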
For the first aim, we used generalized additive mixed models (GAMMs) with random intercepts at the within-city administrative-unit and school level [11, 27]. As the BMI variables were continuous and approximately normally distributed, they were modelled using GAMMs with Gaussian variance and identity link functions. As BMI status (dichotomized IOTF grades) was a binary variable, it was modelled using GAMMs with binomial variance and logit link functions. The reported antilogarithms of the regression coefficient estimates of the binomial GAMMs represent odds ratios of inclusion in the overweight/obese IOTF category.
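The conversion from a logit-scale coefficient to an odds ratio can be sketched in Python as follows. The Wald-type 95% interval (z = 1.96) is an assumption, since the text does not state how confidence intervals were derived, and the function name is hypothetical.

```python
from math import exp


def odds_ratio(beta, std_error, z=1.96):
    """Exponentiate a logit-scale coefficient and its Wald confidence limits."""
    return exp(beta), exp(beta - z * std_error), exp(beta + z * std_error)
```

A coefficient of 0 corresponds to an odds ratio of 1 (no association), and the confidence limits are symmetric around the estimate on the log scale rather than on the odds-ratio scale.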
Main-effect sets of GAMMs estimated relationships of total MVPA and ST (Model 1), as well as school-based MVPA and ST and non-school-based MVPA and ST (Model 2), with the outcome variables, adjusting for adolescent sex, age, site, highest education level, within-city/region administrative-unit-level walkability and SES, accelerometer comparability, number of valid days of accelerometer wear time and average accelerometer wear time per day. There was no evidence of problematic collinearity between the explanatory variables included in the GAMMs (maximum absolute correlation = 0.30). Curvilinear relationships of MVPA and ST variables with BMI outcomes were estimated using non-parametric smooth terms in GAMMs, which were modelled using thin-plate splines. Evidence of a curvilinear relationship was defined as an Akaike’s Information Criterion (AIC) value at least 10 units smaller than that of the corresponding linear model; smooth terms failing to meet this criterion were replaced by simpler linear terms.
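The AIC-based decision rule for retaining a smooth term can be written as a one-line comparison. The Python sketch below uses hypothetical names and assumes the two AIC values come from models that differ only in whether the term is smooth or linear.

```python
AIC_MARGIN = 10.0  # minimum AIC improvement required, as stated in the text


def keep_smooth_term(aic_smooth, aic_linear, margin=AIC_MARGIN):
    """Keep the smooth term only if its model's AIC is at least `margin` units smaller."""
    return aic_linear - aic_smooth >= margin
```

For instance, a smooth model with AIC 1000 is preferred over a linear model with AIC 1010, but not over one with AIC 1009.9, in which case the simpler linear term is retained.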
Separate GAMMs were run to estimate interaction effects of the MVPA and ST variables with sex, site (second study aim) and accelerometer comparability (sensitivity analyses) on BMI outcomes. The significance of interaction effects of site (each consisting of 17 PA or ST variable-by-site interaction terms) was evaluated by comparing AIC values of models with and without interaction effects, whereby the model with a ≥ 10-unit smaller AIC was preferred. The significance of the interaction effects of sex and accelerometer comparability (each defined by a single interaction term) was determined using the Wald test. Significant interaction effects were probed by computing the associations of MVPA and ST variables with BMI outcomes at different values of the moderator. Sensitivity analyses for the WHO BMI-SD score as outcome were undertaken by running the same GAMMs with the CDC BMI-SD scores as outcome (Appendices 1, 2, 3 and 4). All analyses were conducted on the imputed data sets (primary analyses reported in the manuscript) and on cases with complete data (sensitivity analyses, Appendices 1, 2, 3 and 4). All analyses were conducted in R (R Development Core Team, 2020) using the packages ‘car’, ‘mgcv’, ‘gmodels’, and ‘mice’.