Identifying risk profiles for childhood obesity using recursive partitioning based on individual, familial, and neighborhood environment factors

Background: Few studies consider how risk factors within multiple levels of influence operate synergistically to determine childhood obesity. We used recursive partitioning analysis to identify unique combinations of individual, familial, and neighborhood factors that best predict obesity in children, and tested whether these predict 2-year changes in body mass index (BMI).
Methods: Data were collected in 2005–2008 and in 2008–2011 for 512 Quebec youth (8–10 years at baseline) with a history of parental obesity (QUALITY study). CDC age- and sex-specific BMI percentiles were computed, and children were considered obese if their BMI was ≥95th percentile. Individual (physical activity and sugar-sweetened beverage intake), familial (household socioeconomic status and measures of parental obesity, including both BMI and waist circumference), and neighborhood (disadvantage, prestige, and presence of parks, convenience stores, and fast food restaurants) factors were examined. Recursive partitioning, a method that generates a classification tree predicting obesity based on combined exposure to a series of variables, was used. Associations between risk-group membership and BMI percentile at baseline and at 2-year follow-up were examined using linear regression.
Results: Recursive partitioning yielded 7 subgroups with obesity prevalences of 8%, 11%, 26%, 28%, 41%, 60%, and 63%. The 2 highest-risk subgroups comprised i) children not meeting physical activity guidelines, with at least one BMI-defined obese parent and 2 abdominally obese parents, living in disadvantaged neighborhoods without parks, and ii) children with the same characteristics except that they had access to ≥1 park and ≥1 convenience store. Group membership was strongly associated with BMI at baseline but did not systematically predict change in BMI.
Conclusion: Findings support the notion that obesity is predicted by multiple factors in different settings and provide some indications of potentially obesogenic environments. Alternative group definitions and longer follow-up should be investigated to predict change in obesity.
Electronic supplementary material: The online version of this article (doi:10.1186/s12966-015-0175-7) contains supplementary material, which is available to authorized users.


Background
Childhood obesity has reached epidemic proportions worldwide [1] and its health consequences are considerable [2]. Obesity is a complex condition in which a myriad of risk factors interact within and between several levels of influence [3]. Social ecological frameworks posit that childhood obesity is influenced by energy intake and expenditure patterns, which are embedded within the familial and wider community contexts [4][5][6]. An understanding of the multiple influences on obesity, at the individual, familial, and neighborhood levels, can inform population-level efforts to address childhood obesity. For example, at the individual level, regular intake of sugar-sweetened beverages [7] and physical inactivity [8] have been associated with childhood obesity. Similarly, through shared genetics and lifestyles, parental obesity has been identified as a risk factor for childhood obesity [4,9,10]. Within wider community contexts, neighborhood parks, sports and recreational facilities, and the presence of nearby convenience stores and fast food restaurants have been associated with childhood obesity, albeit inconsistently [6,11–14]. Neighborhood disadvantage has been more consistently associated with childhood obesity [15,16]. However, it remains unclear how factors within these different levels of influence interact to determine obesity.
Individual, familial, and neighborhood factors may have synergistic effects on childhood obesity [17,18]. To test hypotheses regarding synergistic effects (i.e., effect modification), interaction terms in regression models are typically used [18]. However, this approach becomes unwieldy for modeling more complex, nonlinear associations involving several factors. An alternative non-parametric method is recursive partitioning analysis, which has gained popularity as a means of multivariate data exploration in various fields [19]. Recursive partitioning produces a classification tree through a series of binary splits that divide children into higher- and lower-risk subgroups for a given outcome based on a number of predictor variables [20]. In addition to its intuitive appeal, recursive partitioning is particularly useful for examining higher-order interactions, for example between multiple individual and neighborhood characteristics [21]. Therefore, the primary objective of this study is to use recursive partitioning analysis to determine the combinations of individual, familial, and neighborhood environment characteristics that best predict obesity among children. A secondary objective is to examine whether the resulting classification is associated with 2-year changes in body mass index (BMI) percentile.

Subjects
Participants were drawn from QUALITY (Quebec Adipose and Lifestyle Investigation in Youth), an ongoing longitudinal investigation of the natural history of obesity and cardiovascular risk in Quebec youth. At baseline, 630 participants aged 8 to 10 years were recruited using school-based sampling; baseline data were collected in 2005–2008. Eligibility criteria, verified over the phone, required participating children to have at least 1 obese biological parent based on parent-reported measurements of weight, height, and waist circumference (i.e., BMI ≥30 kg/m² and/or waist circumference >102 cm in men and >88 cm in women). At the baseline clinic visit, parental anthropometrics were measured. Thirty-five children had no obese parent based on measured BMI or waist circumference, likely because of self-report measurement error or weight loss between the initial contact and the baseline visit; these families were nevertheless retained since inclusion criteria were based a priori on self-report and since the children still had at least 1 borderline obese parent. A 2-year follow-up assessment was completed in 2008–2011. Characteristics of neighborhood environments were assessed at baseline for participants residing in the Montreal Metropolitan Area (n = 512), and this study is restricted to these participants. The ethics review boards of CHU Sainte-Justine and Laval University approved the study protocol. A detailed description of the study design and methods is available elsewhere [22].

Measurement of individual characteristics
Child anthropometrics were measured at baseline and follow-up using standardized protocols [22]. Centers for Disease Control and Prevention (CDC) age- and sex-specific BMI percentiles were computed. Children were categorized as obese if their BMI was ≥95th percentile. Pubertal development stage was assessed by a nurse using the 5-stage Tanner scales [23,24] and was dichotomized as prepubertal (Tanner 1) vs. puberty initiated (Tanner >1) at both baseline and follow-up.
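CDC BMI-for-age percentiles are conventionally obtained with the LMS method, in which each age-and-sex cell of the reference has a Box-Cox power (L), a median (M), and a coefficient of variation (S). The sketch below illustrates that conversion and the ≥95th percentile rule. The L, M, and S values in the example are hypothetical placeholders, not CDC reference values, and the function names are illustrative.

```python
import math


def lms_z(bmi: float, L: float, M: float, S: float) -> float:
    """LMS z-score: ((BMI/M)**L - 1) / (L*S), or ln(BMI/M)/S when L == 0."""
    if L == 0:
        return math.log(bmi / M) / S
    return ((bmi / M) ** L - 1) / (L * S)


def z_to_percentile(z: float) -> float:
    """Standard normal CDF of z, expressed on a 0-100 percentile scale."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def is_obese(bmi: float, L: float, M: float, S: float) -> bool:
    """Obesity as defined in the text: BMI at or above the 95th percentile."""
    return z_to_percentile(lms_z(bmi, L, M, S)) >= 95.0


# Hypothetical LMS values for a single age/sex cell (NOT real CDC parameters).
L_, M_, S_ = -2.0, 16.5, 0.12
for bmi in (16.5, 25.0):
    pct = z_to_percentile(lms_z(bmi, L_, M_, S_))
    print(bmi, round(pct, 1), is_obese(bmi, L_, M_, S_))
```

A BMI equal to the median M maps to z = 0, i.e. the 50th percentile, whatever the values of L and S.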
Intake of sugar-sweetened beverages was measured using the mean of three 24-hour dietary recalls conducted by trained dietitians on non-consecutive days, including 1 weekend day [25]. Except in unusual circumstances, the recalls were collected within a 4-week period following the baseline clinic visit. Recall interviews were conducted by telephone with the child and then confirmed with the parent who prepared the meals. Reported foods were entered into CANDAT (London, Canada) and converted to nutrients using the 2007b Canadian Nutrient File [26]. Intake of sugar-sweetened beverages was computed as the mean daily volume (mL) of soft drinks and other sugar-sweetened drinks, excluding juices made from real fruit. Given the substantial positive skew of its distribution, the variable was dichotomized as >50 mL/day (approximately 1 soft drink can per week) vs. less.
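A minimal sketch of the dichotomization, assuming the per-recall volumes (mL) have already been extracted with fruit juices excluded; the recall values are hypothetical. It also makes the unit conversion explicit: 50 mL/day over 7 days is 350 mL, roughly one 355 mL soft-drink can.

```python
from statistics import mean

SSB_THRESHOLD_ML_PER_DAY = 50.0  # cut-off reported in the text


def mean_ssb_intake(recalls_ml):
    """Mean daily sugar-sweetened beverage volume (mL) across the 24-hour recalls."""
    return mean(recalls_ml)


def high_ssb(recalls_ml) -> bool:
    """True when mean intake is strictly above 50 mL/day, as in the text."""
    return mean_ssb_intake(recalls_ml) > SSB_THRESHOLD_ML_PER_DAY


# 50 mL/day over a week is 350 mL, close to one 355 mL can.
weekly_ml = SSB_THRESHOLD_ML_PER_DAY * 7

# Hypothetical recalls: one can on one of the three recall days.
print(high_ssb([0, 0, 355]))
```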
Participants' physical activity (PA) was measured using a uniaxial activity monitor (Actigraph LS 7164 activity monitor, Actigraph) for 7 days during the week following the baseline clinic visit. A minimum of 4 days with ≥10 h of wear time was required for data to be retained [27]. The Actigraph cut-offs proposed by Evenson et al. were used to define moderate to vigorous PA (MVPA) [28]. Based on Canadian PA guidelines, children achieving a mean of at least 60 minutes of MVPA per valid day were classified as active.
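The wear-time and guideline rules above can be sketched as follows. The sketch assumes that daily minutes of MVPA have already been derived from the accelerometer counts with the Evenson et al. cut-points, a step not shown here. The function name and the example week are hypothetical.

```python
from statistics import mean

MIN_WEAR_HOURS = 10      # a valid day needs >=10 h of wear time
MIN_VALID_DAYS = 4       # at least 4 valid days are required
MVPA_GUIDELINE_MIN = 60  # mean MVPA per valid day for "active"


def classify_active(days):
    """days: list of (wear_hours, mvpa_minutes), one tuple per monitored day.

    Returns True (active), False (not active), or None when fewer than
    4 valid days are available and the data are not retained.
    """
    valid = [mvpa for wear, mvpa in days if wear >= MIN_WEAR_HOURS]
    if len(valid) < MIN_VALID_DAYS:
        return None
    return mean(valid) >= MVPA_GUIDELINE_MIN


# Hypothetical week: the 9 h and 8 h days are excluded as invalid.
week = [(12, 70), (11, 55), (9, 120), (13, 65), (10, 50), (14, 80), (8, 30)]
print(classify_active(week))
```

With this week, the five valid days average 64 minutes of MVPA, so the child would be classified as active.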

Measurement of familial characteristics
At baseline, parents' weight, height, and waist circumference were measured using standardized protocols [22]. Two parental obesity variables were examined: BMI-defined obesity (BMI ≥30 kg/m²) and abdominal obesity (waist circumference >88 cm for mothers and >102 cm for fathers) [29]. For both parental obesity variables, children were categorized as having none, 1, or 2 obese parents. Highest parental educational attainment and total annual household income, adjusted for the number of people living in the household, were obtained from parent-completed questionnaires during clinic visits.
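The parental obesity counts can be illustrated with a short sketch. The record layout (role, weight, height, waist) and the example family are hypothetical. Note the asymmetry in the cut-offs given in the text: BMI-defined obesity uses ≥30 kg/m², whereas abdominal obesity uses strict inequalities (>88 cm and >102 cm).

```python
BMI_CUTOFF = 30.0                                   # kg/m^2, inclusive
WAIST_CUTOFF_CM = {"mother": 88.0, "father": 102.0}  # strict ">" cut-offs


def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2


def count_obese_parents(parents):
    """parents: list of dicts with keys 'role', 'weight_kg', 'height_m', 'waist_cm'.

    Returns (n_bmi_obese, n_abdominally_obese), each 0, 1, or 2.
    """
    n_bmi = sum(bmi(p["weight_kg"], p["height_m"]) >= BMI_CUTOFF for p in parents)
    n_waist = sum(p["waist_cm"] > WAIST_CUTOFF_CM[p["role"]] for p in parents)
    return n_bmi, n_waist


# Hypothetical family: mother BMI ~32.4, father BMI ~27.2; both waists above cut-off.
family = [
    {"role": "mother", "weight_kg": 85.0, "height_m": 1.62, "waist_cm": 95.0},
    {"role": "father", "weight_kg": 88.0, "height_m": 1.80, "waist_cm": 104.0},
]
print(count_obese_parents(family))
```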

Measurement of neighborhood environment characteristics
Neighborhood environments were characterized using a geographic information system (GIS) for the study area. Canadian Census data from 2006 were used to obtain the following measures: % residents with a university degree, average value of owner-occupied residences, % households living below Statistics Canada's low income cut-offs [30], % single-parent families, % unemployment, % residents who had moved in the past year, and % owner-occupied residences. For each measure, population-weighted proportions or averages of the Census dissemination areas overlapping 500 m network buffers centered on the child's residential address were computed. These variables were then reduced to 2 components using principal components analysis, namely neighborhood prestige (university degree and housing value) and neighborhood disadvantage (the remaining Census variables described above), and each component was categorized into tertiles (see Additional file 1: Table S1) [31].
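Two of the steps above are easy to sketch: averaging a Census measure over the dissemination areas (DAs) that overlap a buffer, and cutting a score into sample tertiles. This illustration rests on two assumptions. Each overlapping DA is weighted by its population, since the text does not say whether weights are further adjusted for partial overlap. The principal components step that produces the prestige and disadvantage scores is omitted.

```python
from statistics import quantiles


def pop_weighted_mean(values, populations):
    """Population-weighted average of a Census measure over the DAs
    overlapping a child's 500 m network buffer (weighting assumed)."""
    total = sum(populations)
    return sum(v * p for v, p in zip(values, populations)) / total


def tertile_labels(scores):
    """Assign each score to sample tertile 1 (low), 2, or 3 (high)."""
    cut1, cut2 = quantiles(scores, n=3)
    return [1 if s <= cut1 else 2 if s <= cut2 else 3 for s in scores]


# Hypothetical buffer overlapping two DAs: 10% vs 30% low-income households.
print(pop_weighted_mean([10.0, 30.0], [100, 300]))
print(tertile_labels([1, 2, 3, 4, 5, 6]))
```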
The GIS also provided information on food establishments located within 500 m network buffers around the residence, based on an exhaustive list, acquired from Tamec Inc., of businesses and services located in the region in May 2005. A validation study of food establishments from this list, verified by onsite field visits, showed good agreement (0.77), sensitivity (0.84), and positive predictive value (0.90) [32]. All businesses were geocoded using DMTI GeoPinPoint, version 2007.3. In this study we focused on access to convenience stores and fast food restaurants based on evidence of associations with unhealthful diets [33]. Children were categorised as having ≥1 convenience store (vs. none) and ≥1 fast food restaurant (vs. none) within 500 m network buffers centered on their residence, given our hypothesis that proximal access to any such amenity, relative to none, is sufficient to exert an influence.
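For readers less familiar with these validation metrics, the sketch below gives their usual definitions, treating the field visits as the reference. A true positive (TP) is a listed establishment confirmed on site, a false positive (FP) is a listed establishment not found, and a false negative (FN) is one found on site but missing from the list. The text does not define "agreement". Assuming it is the concordance TP/(TP + FP + FN), as used in some food-environment validation studies, it can be back-calculated from the reported sensitivity and positive predictive value. The result is consistent with the reported 0.77.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of establishments observed on site that appear in the list."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of listed establishments that were confirmed on site."""
    return tp / (tp + fp)


def concordance(tp: int, fp: int, fn: int) -> float:
    """Share of establishments found by either source that both sources found."""
    return tp / (tp + fp + fn)


def concordance_from_rates(sens: float, pos_pred: float) -> float:
    # TP+FN = TP/sens and TP+FP = TP/ppv, so TP+FP+FN = TP*(1/sens + 1/ppv - 1).
    return 1.0 / (1.0 / sens + 1.0 / pos_pred - 1.0)


print(round(concordance_from_rates(0.84, 0.90), 2))  # 0.77
```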
Lastly, the presence of parks was computed using land use information from CanMap (DMTI Spatial Inc.). GIS-identified parks were subsequently validated by in-person neighborhood assessments, during which independent pairs of trained observers walked every street within 500 m network buffers centered on participants' residences. Parks were defined as public open spaces in which children could engage in active play. Participants were classified as having ≥1 park or no park within 500 m network buffers centered on their residence. All neighborhood environment measures were operationalized for 500 m network buffers because children and youth typically have smaller activity spaces than adults, and for consistency in buffer size, since observer-validated park counts were available only for 500 m network buffers.

Statistical analysis
Recursive partitioning, implemented with the RPART routine in the R statistical environment [34], was used to identify subgroups of participants that differed in obesity status. This non-parametric regression method produces a classification tree following a series of non-sequential top-down binary splits. The tree-building process starts by considering a set of predictor variables and selects the variable that produces 2 subsets of participants with the greatest purity (i.e., where participants within each subset are most alike in terms of the outcome). Two factors are considered when splitting a node into its daughter nodes: the goodness of the split and the amount of impurity in the daughter nodes [35]. The splitting process is repeated until further partitioning is no longer possible and terminal nodes have been reached. Because the resulting tree is typically large, difficult to interpret, and may over-fit the data, pruning techniques are used to reduce the size of the original tree by eliminating selected branches from later splits. This is done using cost-complexity measures and cross-validation to assess the predictive performance of several reduced subtrees. The final classification tree is the subtree of the original tree selected on the basis of its cross-validated prediction error [19].
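The core of one splitting step can be sketched in a few lines. This is a simplified illustration of greedy Gini-based splitting on numeric or 0/1 predictors, not the rpart implementation. Rpart additionally handles categorical predictors, surrogate splits, minimum node sizes, and the complexity parameter used for pruning. The variable names and toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2: 0 for a pure node, maximal for even classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(rows, labels, predictors):
    """Return (predictor, threshold, impurity decrease) for the split x <= t
    vs. x > t that most reduces the size-weighted Gini impurity of the
    two daughter nodes. Ties keep the first split found."""
    parent = gini(labels)
    n = len(labels)
    best = None
    for var in predictors:
        values = sorted({r[var] for r in rows})
        for t in values[:-1]:  # thresholds between observed values
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[2]:
                best = (var, t, gain)
    return best


# Hypothetical toy data: 1 = obese child.
rows = [
    {"obese_parents": 0, "active": 1}, {"obese_parents": 0, "active": 0},
    {"obese_parents": 1, "active": 1}, {"obese_parents": 1, "active": 0},
    {"obese_parents": 2, "active": 0}, {"obese_parents": 2, "active": 1},
]
labels = [0, 0, 0, 1, 1, 1]
print(best_binary_split(rows, labels, ["obese_parents", "active"]))
```

Applying the same step recursively to each daughter node grows the full tree, which is then pruned as described above.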
Observations that have missing values on a predictor variable are not discarded from the analysis. Instead, these observations are ignored for the computation of the impurity index when that variable is being considered as a splitting variable, but they are included in subsequent computations. To do so, a surrogate variable that best predicts the missing splitting values is used to determine the classification of observations with missing values to either daughter node (see Strobl et al. for details [19]).
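A minimal sketch of the surrogate idea. It assumes that missing values are coded as None and that candidate surrogate splits are supplied as (variable, threshold) pairs. Rpart's actual procedure ranks several surrogates and compares them against a majority-direction rule; here that is simplified to a single fallback flag. Variable names and data are hypothetical.

```python
def agreement(primary_goes_left, surrogate_goes_left):
    """Proportion of observations sent to the same daughter node by both splits."""
    pairs = list(zip(primary_goes_left, surrogate_goes_left))
    return sum(p == s for p, s in pairs) / len(pairs)


def choose_surrogate(rows, primary_var, primary_t, candidates):
    """Pick the candidate (variable, threshold) split that best reproduces the
    primary split among rows where both variables are observed."""
    best, best_agree = None, -1.0
    for var, t in candidates:
        obs = [r for r in rows
               if r.get(primary_var) is not None and r.get(var) is not None]
        if not obs:
            continue
        agree = agreement([r[primary_var] <= primary_t for r in obs],
                          [r[var] <= t for r in obs])
        if agree > best_agree:
            best, best_agree = (var, t), agree
    return best, best_agree


def goes_left(row, primary_var, primary_t, surrogate, majority_left=True):
    """Route a row with the primary split, falling back to the surrogate,
    then to the majority direction if the surrogate is also missing."""
    if row.get(primary_var) is not None:
        return row[primary_var] <= primary_t
    var, t = surrogate
    if row.get(var) is not None:
        return row[var] <= t
    return majority_left


# Hypothetical data: income (primary split at 40) is missing for the last child.
rows = [
    {"income": 20, "education": 1}, {"income": 30, "education": 1},
    {"income": 70, "education": 3}, {"income": 80, "education": 3},
    {"income": 60, "education": 1}, {"income": None, "education": 3},
]
surrogate, agree = choose_surrogate(rows, "income", 40,
                                    [("education", 1), ("education", 2)])
print(surrogate, agree, goes_left(rows[5], "income", 40, surrogate))
```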
In this study, 11 variables were submitted to the recursive partitioning process based on evidence of associations with childhood obesity: 2 individual variables (sugar-sweetened beverage intake, meeting PA guidelines), 4 familial variables (number of BMI-defined obese parents, number of parents with abdominal obesity, parental education, household income), and 5 neighborhood environment characteristics (disadvantage, prestige, and presence of ≥1 park, fast food restaurant, and convenience store). The Gini index was used as the measure of node impurity; it reaches its minimum for perfectly pure nodes (the desired result) and its maximum when cases are distributed evenly between classes at a given node [19]. A 10-fold cross-validation was used to prune the tree; the best tree was selected using the "1-SE" rule, i.e., the smallest tree whose cross-validated error is no more than 1 standard error (SE) larger than that of the tree with the lowest cross-validated error [19,36]. This resulted in a classification tree with 7 terminal nodes (Figure 1).
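The "1-SE" rule can be expressed as a short function over a cross-validation table. The sketch assumes one row per candidate subtree, giving its number of terminal nodes, its cross-validated error, and the standard error of that error. These loosely correspond to the nsplit, xerror, and xstd columns of rpart's cptable, where the number of terminal nodes is nsplit + 1. The table values are hypothetical.

```python
def one_se_rule(cv_table):
    """cv_table: list of (n_terminal_nodes, xerror, xstd) tuples.

    Returns the row of the smallest tree whose cross-validated error is
    no more than 1 SE above the minimum cross-validated error.
    """
    best = min(cv_table, key=lambda row: row[1])
    threshold = best[1] + best[2]
    eligible = [row for row in cv_table if row[1] <= threshold]
    return min(eligible, key=lambda row: row[0])


# Hypothetical cross-validation results for six nested subtrees.
table = [
    (1, 1.00, 0.04), (3, 0.92, 0.05), (5, 0.88, 0.05),
    (7, 0.83, 0.05), (10, 0.80, 0.05), (15, 0.84, 0.06),
]
print(one_se_rule(table))
```

Here the 10-node tree has the lowest error (0.80), but the 7-node tree is within 1 SE (0.83 ≤ 0.85), so the simpler tree is retained.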
Multivariable linear regression models were subsequently used to examine associations between the categorical variable representing the recursive partitioning subgroups (terminal nodes) and BMI percentile, controlling for age, sex, puberty, and parental education. The lowest-risk subgroup was the reference category; the remaining subgroups were represented by 6 indicator variables. Finally, associations between subgroup membership and BMI percentile at follow-up were examined while adjusting for BMI percentile at baseline. These analyses were conducted with SAS version 9.3 (Cary, North Carolina). Although school-based sampling was used in QUALITY, clustering of participants within schools did not significantly influence estimates of associations (see Additional file 1: Tables S2 and S3).
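The coding of subgroup membership can be sketched as follows. Group 1 is omitted, so the coefficient on each of the 6 indicators estimates the adjusted difference in BMI percentile between that group and Group 1. In the follow-up model, the same indicators enter alongside baseline BMI percentile. The function name is hypothetical, and the regression fitting itself, done in SAS in the study, is not shown.

```python
GROUPS = range(1, 8)  # terminal nodes 1..7
REFERENCE = 1         # lowest-risk subgroup, omitted from the design matrix


def group_indicators(group: int) -> dict:
    """Return the 6 indicator variables (group2..group7) for one child."""
    if group not in GROUPS:
        raise ValueError(f"unknown group {group}")
    return {f"group{g}": int(group == g) for g in GROUPS if g != REFERENCE}


print(group_indicators(1))  # all zeros: the reference group
print(group_indicators(4))  # only group4 == 1
```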

Results
Characteristics of study participants are provided in Table 1. Both at baseline and at follow-up, 23% of participants were obese (117/512 and 106/462, respectively). Thirty-four percent of obese participants had initiated puberty at baseline, compared with 21% of non-obese participants. At follow-up, 77% of obese and 66% of non-obese participants had initiated puberty. Overall, more than half consumed >50 mL of sugar-sweetened beverages per day, and obese participants were less likely to engage in ≥60 minutes of MVPA daily. Familial characteristics varied by obesity status in the expected direction, with a greater proportion of obese children in lower income/education households and in households with 2 obese parents (defined using BMI or waist circumference). Obese children more often lived in neighborhoods characterised by high disadvantage and by proximity to ≥1 convenience store.
The classification tree showed sequentially increasing prevalence of obesity across its 7 terminal nodes (Figure 1). The lowest-risk subgroup, Group 1 (i.e., the reference), consisted of 132 participants with no BMI-defined obese parent (8% obese). Group 2 consisted of 97 participants with ≥1 BMI-defined obese parent who met PA guidelines (11% obese). Group 3 consisted of 163 participants with ≥1 BMI-defined obese parent, not meeting PA guidelines, and with ≤1 abdominally obese parent (26% obese). Group 4 consisted of 39 participants with ≥1 BMI-defined obese parent, not meeting PA guidelines, with 2 abdominally obese parents, and living in a low disadvantage neighborhood (28% obese). Group 5 consisted of 37 participants with ≥1 BMI-defined obese parent, not meeting PA guidelines, with 2 abdominally obese parents, and living in an average to high disadvantage neighborhood with ≥1 park and no convenience store (41% obese). Group 6 consisted of 25 participants with ≥1 BMI-defined obese parent, not meeting PA guidelines, with 2 abdominally obese parents, and living in an average to high disadvantage neighborhood with ≥1 park and ≥1 convenience store (60% obese). Lastly, Group 7 consisted of 19 participants with ≥1 BMI-defined obese parent, not meeting PA guidelines, with 2 abdominally obese parents, and living in an average to high disadvantage neighborhood with no access to parks or to convenience stores (63% obese).
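The 7 terminal nodes described above can be written as nested rules. This is an illustrative restatement of the reported tree, not code from the study, and the argument names are hypothetical. One detail is an interpretation: the text describes Group 7 as lacking both parks and convenience stores, but no further split is reported below the "no park" node. The sketch therefore assigns all children without a park to Group 7. The reported subgroup sizes sum to the 512 participants.

```python
GROUP_SIZES = {1: 132, 2: 97, 3: 163, 4: 39, 5: 37, 6: 25, 7: 19}
assert sum(GROUP_SIZES.values()) == 512


def risk_group(bmi_obese_parents: int, meets_pa: bool,
               abdominal_obese_parents: int, disadvantage: str,
               park: bool, convenience_store: bool) -> int:
    """Assign a terminal node (1-7) following the splits reported in the text.

    disadvantage: neighborhood disadvantage tertile, 'low', 'average' or 'high'.
    """
    if bmi_obese_parents == 0:
        return 1
    if meets_pa:
        return 2
    if abdominal_obese_parents <= 1:
        return 3
    if disadvantage == "low":
        return 4
    if park:
        return 6 if convenience_store else 5
    return 7  # no park within 500 m (see the interpretation note above)


print(risk_group(1, False, 2, "high", True, True))
```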
Recursive partitioning generated subgroups that differed in obesity status. After adjustment for the child's age, sex, pubertal development stage, and parental education at baseline, children in Groups 2 to 7 had sequentially higher BMI percentiles than those in Group 1, varying from 12 … (Table 2).
Follow-up data were available for 462 participants. Of the 50 participants lost to follow-up, almost half (46%, n = 23) were from Group 3, of whom 39% (n = 9) were obese at baseline. Changes in BMI percentile between baseline and follow-up are shown in Tables 3 and 4. Only Group 3 (≥1 BMI-defined obese parent, not meeting PA guidelines, and ≤1 abdominally obese parent) showed an increase in BMI percentile after 2 years of follow-up compared with Group 1 [B = 3.6 (95% CI: 0.5; 6.6)] (Table 3).

Discussion
Recursive partitioning, a novel method in the study of neighborhoods and health, was used to examine how specific risk factors jointly influence obesity among children. Risk factors from different levels of influence, based on a social ecological framework, were considered. In this sample, characterized by an overall high prevalence of familial obesity, successively higher BMI percentiles were found in children who accumulated individual, familial, and neighborhood environment risk factors. However, limited evidence was found for associations with 2-year changes in BMI percentile. Classification trees are often unstable in the face of minor changes in the sample; applying recursive partitioning to a different study sample is likely to yield a different classification tree. The relatively small data set used in this study further adds to the instability of the classification tree and yielded imprecise measures of association, notably in the higher-risk subgroups (e.g., n = 19 for Group 7). Although findings may be difficult to reproduce and should be interpreted with caution, recursive partitioning allowed us to identify potentially highly obesogenic environments in the QUALITY study. Measures of association reported in this study may be generalizable to Caucasian children with a parental history of obesity.
Recursive partitioning is a valuable data exploration method in the study of neighborhoods and health. It allows for the detection of higher-order interactions within the data that would be challenging to examine using generalized linear models. Other strengths of this study include the use of objective measures of obesity in children and both biological parents, of PA, and of neighborhood environment indicators, as well as the use of neighborhood definitions centered on each participant's residential address. It is well recognised in the literature that obesity is influenced by multiple risk factors stemming from multiple levels of influence, yet previous studies have examined only a limited range of risk factors simultaneously [4]. Recursive partitioning provides a unique method of analysis to generate hypotheses on how these multiple risk factors may jointly influence childhood obesity. In this analysis, individual and familial risk factors were selected first, whereas neighborhood environment variables emerged only in later branches of the classification tree. This may reflect strong associations between individual-level variables and obesity measured at the individual level, but it does not negate the importance of contextual-level variables for obesity measured at both the individual and population levels [37,38]. Since obesity is likely the result of shared genetic, lifestyle, and environmental risk factors, and since these relationships are difficult to disentangle in observational studies, contextual influences may be underestimated.
With respect to neighborhood characteristics, findings are consistent with the numerous studies that report more obesity among residents of socioeconomically disadvantaged neighborhoods [15]. At equal individual and familial risk and without consideration of subsequent splits, in this sample the prevalence of obesity was almost twice as high among children living in socioeconomically disadvantaged neighborhoods (52%) compared to those living in low disadvantage neighborhoods (28%). Among children living in socioeconomically disadvantaged neighborhoods, elements of the built and food environment, namely access to parks and convenience stores, further determined obesity. Findings suggest that neighborhood environment characteristics previously associated with childhood obesity (i.e., access to parks and convenience stores [6,11,13,39]) may be particularly influential for children who are already most vulnerable due to individual (i.e., physical inactivity) and familial risk factors (i.e., parental obesity).
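The 52% figure can be reproduced, approximately, as the size-weighted average of the obesity prevalences of Groups 5 to 7, which together make up the average-to-high disadvantage branch. Because the prevalences reported in the text are rounded, this back-calculation is only approximate.

```python
# (n, obesity prevalence) for the subgroups in average/high disadvantage areas,
# taken from the Results (prevalences are rounded in the text).
subgroups = {5: (37, 0.41), 6: (25, 0.60), 7: (19, 0.63)}

n_total = sum(n for n, _ in subgroups.values())
n_obese = sum(n * p for n, p in subgroups.values())
pooled = n_obese / n_total

print(n_total, round(pooled, 2))  # 81 0.52
print(round(pooled / 0.28, 1))    # 1.9, vs. 28% in low disadvantage areas
```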
Convincing evidence for associations between the classification tree subgroups and 2-year changes in BMI percentile was not found. Only children with ≥1 BMI-defined obese parent, not meeting PA guidelines, and with ≤1 abdominally obese parent (Group 3) showed an increase in BMI percentile at follow-up. This was the subgroup with the largest number of participants. Although another subgroup had a coefficient of change of similar magnitude (Group 5), detection of associations may have been limited by the relatively small sample size. Selection bias may have resulted from loss to follow-up related to specific risk-factor profiles and to obesity status. The duration of follow-up may also have been insufficient to detect an effect on changes in BMI, which typically occur slowly over time. Alternatively, determinants of obesity in cross-sectional associations may differ from those of obesity development, which could explain why some cross-sectional findings are not reproduced in longitudinal analyses [40].

Conclusion
Recursive partitioning allowed us to classify participants into qualitatively distinct subgroups based on a series of modifiable individual, familial and neighborhood environment risk factors. This provides some indications of potentially obesogenic environments and points to the "when, where, and for whom certain environmental attributes are most influential" on childhood obesity (p.101) [17]. Future studies in larger samples and with longer durations of follow-up are needed to better understand how different combinations of risk factors jointly predict obesity. Findings contribute to the growing body of evidence that supports the need for multi-level and multi-setting population approaches to obesity prevention [41]. In particular, interventions aimed at modifying neighborhood environments may be most beneficial for children who are already the most vulnerable due to individual and familial risk factors.