# Table 1 Comparison of modelling approaches for predicting BMI from the count of supermarkets within 5 km

| Model | Predictor | Coefficient (S.E.) (a) | p-value | AIC (f) |
| --- | --- | --- | --- | --- |
| Linear Model | Intercept | 26.43 (0.35) | <0.001 | 9139.73 |
| | Count of supermarkets | −0.10 (0.03) | <0.001 | |
| Fractional Polynomial (b) | Intercept | 26.53 (0.37) | <0.001 | 9139.73 |
| | (Count of supermarkets + 1)/10 | −1.00 (0.27) | <0.001 | |
| Spline (2 knots) (c) | Intercept | 26.53 (0.56) | <0.001 | 9141.66 |
| | 1st segment, 0–11 supermarkets | −1.25 (0.69) | 0.068 | |
| | 2nd segment, 11–15 supermarkets | −4.80 (1.56) | 0.002 | |
| Spline (3 knots) (d) | Intercept | 25.39 (0.68) | <0.001 | 9132.98 |
| | 1st segment, 0–9 supermarkets | 0.90 (0.86) | 0.29 | |
| | 2nd segment, 9–15 supermarkets | −1.11 (0.71) | 0.12 | |
| | 3rd segment, 15–50 supermarkets | 0.66 (0.66) | 0.78 | |
| Tertiles (e) | 0–9 supermarkets (baseline) | 26.05 (0.25) | <0.001 | 9137.67 |
| | 10–14 supermarkets | −1.00 (0.34) | <0.001 | |
| | ≥15 supermarkets | −1.49 (0.36) | <0.001 | |

1. (a) S.E. = standard error.
2. (b) The fractional polynomial with an intercept and a single covariate term was the best fitting of the pre-defined set of fractional polynomials (selection is based on the AIC and is carried out automatically by the statistical algorithm). Because the logarithm is one of the candidate transformations, the covariate cannot take zero values; hence 1 is added to the number of supermarkets in this model.
3. (c) The fixed 2-knot spline is not shown in Figure 2.
4. (d) By default, knots for the spline function are placed at the equivalent quantiles, so the knot locations coincide with the tertile boundaries. With splines, it is possible either to estimate the knot locations as part of the inference or to use pre-specified knot locations. For this example, the spline was anchored to the range 0 to 50.
5. (e) Note that the third category is unbounded. This highlights two issues: how outliers are included in the analysis, and how to interpret a ‘high’ density of supermarkets, since ‘high’ could be defined as 15, 20, 25, 30, etc. (the actual range of the data is 0–29). For closed intervals, the representative value can be thought of as the interval mid-point. However, taking the mid-point assumes that values are uniformly distributed within the interval. For the lower band this is not true: low counts have a mean of 6.2 and a median of 6, compared with the mid-point of 4.5. High counts have a mean of 18.3 and a median of 17, and their mid-point is undefined because the percentile categorisation leaves the upper bound unspecified.
6. (f) We performed a Cox test for non-nested models to compare the model fits and found no significant difference in AIC between the linear and tertile model fits. The 3-knot spline has a smaller, and therefore better, AIC, but none of its segment coefficients is significant, which perhaps indicates over-fitting. The 2-knot spline, which has the same number of parameters as the tertile model, is not statistically different from the linear model.
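
Footnote (b) explains why 1 is added to the supermarket count before the fractional-polynomial transformations are applied. The sketch below illustrates that reasoning only; it is not the authors' fitting procedure. The power set shown is the conventional one for fractional polynomials and is an assumption here, since the source refers only to a "pre-defined set", and the function names are hypothetical.

```python
import math

# Conventional fractional-polynomial power set (an assumption; the source
# only mentions a "pre-defined set"). A power of 0 denotes the natural log.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Apply one fractional-polynomial transformation to a positive value."""
    if x <= 0:
        # log(x) and the negative powers are undefined at zero.
        raise ValueError("fractional polynomials require x > 0")
    return math.log(x) if power == 0 else x ** power


def shift_and_scale(count):
    """Covariate as it appears in Table 1: (count of supermarkets + 1) / 10."""
    return (count + 1) / 10


def all_transforms(count):
    """Return {power: transformed value} for the shifted, scaled count."""
    x = shift_and_scale(count)
    return {p: fp_transform(x, p) for p in FP_POWERS}
```

Calling `fp_transform(0, 0)` raises a `ValueError`, whereas `all_transforms(0)` succeeds because a raw count of zero becomes 0.1 after the shift and scaling.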
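
The AIC comparison discussed in footnote (f) can be laid out as differences from the smallest AIC in Table 1. The following is a minimal sketch using only the AIC values reported in the table; the helper name `delta_aic` is hypothetical, and the differences do not by themselves indicate statistical significance, which the authors assessed with a Cox test.

```python
# AIC values as reported in Table 1.
AIC = {
    "Linear model": 9139.73,
    "Fractional polynomial": 9139.73,
    "Spline (2 knots)": 9141.66,
    "Spline (3 knots)": 9132.98,
    "Tertiles": 9137.67,
}


def delta_aic(aic_by_model):
    """Return (model, AIC minus smallest AIC) pairs, smallest difference first."""
    best = min(aic_by_model.values())
    # Round to 2 decimals, the precision reported in the table.
    deltas = {m: round(a - best, 2) for m, a in aic_by_model.items()}
    return sorted(deltas.items(), key=lambda item: item[1])


for model, delta in delta_aic(AIC):
    print(f"{model:<22} {delta:5.2f}")
```

The 3-knot spline appears first with a difference of zero, consistent with the footnote's remark that it has the smallest AIC.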