Categorisation of built environment characteristics: the trouble with tertiles

Lamb, Karen E; White, Simon R

doi:10.1186/s12966-015-0181-9

International Journal of Behavioral Nutrition and Physical Activity

Table 1 Comparison of modelling approaches for predicting BMI from the count of supermarkets within 5 km

Model	Predictor	Coefficient	(S.E.)^(a)	p-value	AIC^(f)
Linear Model	Intercept	26.43	(0.35)	<0.001	9139.73
	Count of supermarkets	−0.10	(0.03)	<0.001
Fractional Polynomial^(b)	Intercept	26.53	(0.37)	<0.001	9139.73
	(Count of supermarkets + 1)/10	−1.00	(0.27)	<0.001
Spline (2 knots)^(c)	Intercept	26.53	(0.56)	<0.001	9141.66
	1st segment, 0–11 supermarkets	−1.25	(0.69)	0.068
	2nd segment, 11–15 supermarkets	−4.80	(1.56)	0.002
Spline (3 knots)^(d)	Intercept	25.39	(0.68)	<0.001	9132.98
	1st segment, 0–9 supermarkets	0.90	(0.86)	0.29
	2nd segment, 9–15 supermarkets	−1.11	(0.71)	0.12
	3rd segment, 15–50 supermarkets	0.66	(0.66)	0.78
Tertiles^(e)	0–9 supermarkets (baseline)	26.05	(0.25)	<0.001	9137.67
	10–14 supermarkets	−1.00	(0.34)	<0.001
	15+ supermarkets	−1.49	(0.36)	<0.001

^(a)S.E. = standard error.
^(b)The fractional polynomial with an intercept and the covariate was the best-fitting model among the pre-defined set of fractional polynomials (selection is based on the AIC and is carried out automatically by the statistical algorithm). Because the logarithm is one of the candidate transformations, the covariate cannot take zero values; hence 1 is added to the number of supermarkets in this model.
^(c)The fixed 2-knot spline is not shown in Figure 2.
^(d)Default knots for the spline function are placed at the equivalent quantiles; hence the knot locations coincide with the tertile boundaries. With splines, it is possible either to estimate the knot locations as part of the inference or to use pre-specified knot locations. The spline was anchored to the range 0 to 50 for this example.
^(e)Note that the third category is unbounded. This highlights two issues: how outliers are included in the analysis, and how to interpret a ‘high’ density of supermarkets, since we could define high as 15, 20, 25, 30, etc. (the actual range of the data is 0–29). For closed intervals, the representative value can be thought of as the interval mid-point. However, taking the mid-point assumes that values are uniformly distributed within the interval. For the lower band this is not true (low counts have a mean of 6.2 and a median of 6, compared to the mid-point of 4.5). High counts have a mean of 18.3 and a median of 17, with an undefined mid-point because percentile categorisation leaves the upper bound unspecified.
^(f)We performed a Cox test for non-nested models to compare the model fits and found no significant difference in AIC between the linear and tertile model fits. The 3-knot spline has a smaller, and therefore better, AIC, but none of its coefficients are significant, which perhaps indicates over-fitting. The 2-knot spline, which has the same number of parameters as the tertile model, is not statistically different from the linear model.
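
The arithmetic behind notes (b), (e) and (f) can be checked directly from the reported values. The following is a minimal sketch in Python, not the authors' analysis code: the function and variable names are hypothetical, and the only inputs are the coefficients, category boundaries and AIC values printed in Table 1. To the reported precision, it suggests that the selected fractional polynomial describes the same straight line as the linear model, that the lower tertile's mid-point differs from its reported mean, and how far each model's AIC lies from the smallest one.

```python
# Sketch only: all numbers are copied from Table 1; all names are illustrative.

# AIC values as reported in the final column of Table 1.
AIC = {
    "linear": 9139.73,
    "fractional_polynomial": 9139.73,
    "spline_2_knots": 9141.66,
    "spline_3_knots": 9132.98,
    "tertiles": 9137.67,
}


def linear_bmi(count):
    """Linear model: BMI = 26.43 - 0.10 * count."""
    return 26.43 - 0.10 * count


def fractional_polynomial_bmi(count):
    """Selected fractional polynomial: BMI = 26.53 - 1.00 * (count + 1) / 10."""
    return 26.53 - 1.00 * (count + 1) / 10


def tertile_bmi(count):
    """Tertile model: baseline 26.05 for 0-9, with offsets for 10-14 and 15+."""
    if count <= 9:
        return 26.05
    if count <= 14:
        return 26.05 - 1.00
    return 26.05 - 1.49


def interval_midpoint(lower, upper):
    """Mid-point of a closed interval; undefined when the upper bound is open."""
    return (lower + upper) / 2


def aic_differences(aic):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic.values())
    return {name: round(value - best, 2) for name, value in aic.items()}


if __name__ == "__main__":
    # Compare fitted BMI from the three simplest models at a few counts.
    for count in (0, 6, 12, 17, 29):
        print(
            count,
            round(linear_bmi(count), 2),
            round(fractional_polynomial_bmi(count), 2),
            round(tertile_bmi(count), 2),
        )
    # Lower tertile (0-9): mid-point versus the reported mean of 6.2.
    print(interval_midpoint(0, 9))
    print(aic_differences(AIC))
```

Because 26.53 − 1.00 × (count + 1)/10 simplifies to 26.43 − 0.10 × count, the two fitted lines coincide, which is consistent with the identical AIC reported for the linear and fractional polynomial models. The tertile model, by contrast, returns the same fitted value for every count within a category.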

ISSN: 1479-5868