Design programmes to maximise participant engagement: a predictive study of programme and participant characteristics associated with engagement in paediatric weight management

Background Approximately 50 % of paediatric weight management (WM) programme attendees do not complete their respective programmes. High attrition rates compromise both programme effectiveness and cost-efficiency. Past research has examined pre-intervention participant characteristics associated with programme (non-)completion; however, study samples are often small and not representative of multiple demographics. Moreover, the association between programme characteristics and participant engagement is not well known. This study examined participant and programme characteristics associated with engagement in a large, government-funded, paediatric WM programme. Engagement was defined as the family’s level of participation in the WM programme. Methods Secondary data analysis of 2948 participants (Age: 10.44 ± 2.80 years, BMI: 25.99 ± 5.79 kg/m², Standardised BMI [BMI SDS]: 2.48 ± 0.87 units, White Ethnicity: 70.52 %) was undertaken. Participants attended a MoreLife programme (nationwide WM provider) between 2009 and 2014. Participants were classified into one of five engagement groups: Initiators, Late Dropouts, Low- or High-Sporadic Attenders, or Completers. Five binary multivariable logistic regression models were performed to identify participant (n = 11) and programmatic (n = 6) characteristics associated with an engagement group. Programme completion was classified as ≥70 % attendance. Results Programme characteristics were stronger predictors of programme engagement than participant characteristics, particularly small group size, winter/autumn delivery periods and earlier programme years (proxy for scalability). Conversely, participant characteristics were weak predictors of programme engagement. Predictors varied between engagement groups (e.g. Completers, Initiators, Sporadic Attenders). 
47.1 % of participants completed the MoreLife programme (mean attendance: 59.4 ± 26.7 %, mean BMI SDS change: -0.15 ± 0.22 units), and 21 % of those who signed onto the programme did not attend a session. Conclusions As WM services scale up, the efficacy and fidelity of programmes may be reduced due to increased demand and lower financial resources. Further, limiting WM programme groups to no more than 20 participants could result in greater engagement. Baseline participant characteristics are poor and inconsistent predictors of programme engagement. Thus, future research should evaluate participant motives, expectations and barriers to attending a WM programme to enhance our understanding of participant WM engagement. Finally, we suggest that session-by-session attendance be recorded as a minimum requirement to improve reporting transparency and enhance the external validity of study findings. Electronic supplementary material: The online version of this article (doi:10.1186/s12966-016-0399-1) contains supplementary material, which is available to authorized users.

Only variables with a degree of missingness are presented.

Criteria Item #2: Reasoning for Missing Values
There are 20 variables of interest in this study, 13 of which have some missingness. A number of variables (e.g. gender, medical conditions…) had complete data, due in part to MoreLife's data collection protocol: certain variables had to be completed by participants before enrolling on the programme. Moreover, programme-specific variables were complete because they are derived from the characteristics of the programmes themselves and are not affected by participants' completion of the pre-entry documentation.
Where variables have missing data (Table 1), it is not possible to state the reason for each individual participant (n = 3729); instead, the dataset can be interrogated as a whole and split by programme location to uncover trends in missingness. These trends suggest that data are missing systematically in some programme locations, which has also been confirmed by the data provider, MoreLife. Although MoreLife collects data in accordance with a protocol, different programme delivery areas are managed by different teams and overseen by programme commissioners. These commissioners are able to select which measures they wish to collect, which is why Sedentary Behaviour is missing in 54% of cases. The missingness is therefore attributable not to the participants but to the programme they attended (Table 2). As such, it is unlikely that cases with missing data differ significantly from complete cases, and so imputation is a plausible method to counter missing data. On further inspection, much of the missing data belongs to participants classified as Non-Initiators. Non-Initiators sign on to the programme and, in doing so, provide some basic information (e.g. gender, age, medical conditions), but they do not attend any of the programme sessions. The MoreLife protocol stipulates that anthropometric measures and questionnaires are completed in the first week; because Non-Initiators do not attend the first week, none of these measures are recorded for them. When working with missing data, it is essential to establish whether the missingness can be explained by observed data in the sample. It is fair to conclude that the missingness here can be explained by the two variables discussed: Completion Status and Programme Area.
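To make the location-level interrogation concrete, the short Python sketch below tabulates the percentage of missing values in one variable within each programme area. It is illustrative only: the function name `missingness_by_group` and the toy values are hypothetical, and the original analysis was carried out in SPSS, not with this code.

```python
import numpy as np


def missingness_by_group(values, groups):
    """Percentage of missing (NaN) entries in `values` within each level of `groups`."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    result = {}
    for g in np.unique(groups):
        in_group = groups == g
        result[str(g)] = 100.0 * np.isnan(values[in_group]).mean()
    return result


# Hypothetical example: sedentary-behaviour scores from three programme areas,
# where area "B" did not collect the measure at all.
sedentary = [2.0, np.nan, 3.5, np.nan, np.nan, 1.0, 4.0, np.nan]
area = ["A", "A", "A", "B", "B", "C", "C", "C"]
print(missingness_by_group(sedentary, area))
```

An area with 100% missingness on a measure, as for "B" here, points to a collection decision at programme level rather than to individual non-response.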
Little's MCAR test was implemented to establish whether data are missing completely at random (MCAR).
The result indicated that the data are not MCAR, given a significant chi-squared statistic (χ² = 868.98, df = 300, p < 0.001). This suggests data are either Missing at Random (MAR) or Missing Not at Random (MNAR). Data were not assumed to be MNAR because the missingness could be rationally and completely explained by observed data [2]. From this point forth, data are treated as MAR.
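For readers unfamiliar with Little's test, the sketch below computes the statistic in Python with NumPy and SciPy. It assumes multivariate-normal data with at least one observed value per row, estimates the mean and covariance by EM, and then sums, over missingness patterns, the scaled squared Mahalanobis distance between each pattern's observed means and the EM means. The statistic is referred to a χ² distribution whose degrees of freedom are the sum of observed variables across patterns minus the total number of variables. Function names are hypothetical, and published implementations differ in small details (e.g. covariance scaling), so this is a sketch rather than the SPSS routine used in the study.

```python
import numpy as np
from scipy.stats import chi2


def em_mvn(Y, max_iter=500, tol=1e-6):
    """EM estimates of the mean vector and covariance matrix of multivariate-normal
    data containing NaNs. Assumes every row has at least one observed value."""
    n, p = Y.shape
    miss = np.isnan(Y)
    mu = np.nanmean(Y, axis=0)
    sigma = np.diag(np.nanvar(Y, axis=0))
    for _ in range(max_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = Y[i].copy()
            extra = np.zeros((p, p))  # conditional covariance of the missing block
            if m.any():
                # E-step: regress the missing entries on the observed ones
                coef = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                x[m] = mu[m] + coef @ (x[o] - mu[o])
                extra[np.ix_(m, m)] = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
            sum_x += x
            sum_xx += np.outer(x, x) + extra
        # M-step: update mean and covariance from the expected sufficient statistics
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        shift = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if shift < tol:
            break
    return mu, sigma


def little_mcar_test(Y):
    """Return Little's chi-squared statistic, its degrees of freedom and p-value."""
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    mu, sigma = em_mvn(Y)
    patterns, labels = np.unique(np.isnan(Y), axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # guard against version-dependent inverse shapes
    d2, df = 0.0, 0
    for k, pattern in enumerate(patterns):
        o = ~pattern  # variables observed in this pattern
        rows = Y[labels == k][:, o]
        diff = rows.mean(axis=0) - mu[o]
        d2 += len(rows) * diff @ np.linalg.solve(sigma[np.ix_(o, o)], diff)
        df += int(o.sum())
    df -= p
    return d2, df, chi2.sf(d2, df)
```

Applied to data in which one variable is missing whenever another observed variable is large (a MAR mechanism), the test rejects MCAR, mirroring the reasoning in the text.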

Criteria Item #3: Removal of Data due to Missing Values
Given that data are assumed to be MAR, a number of processes need to be implemented before data are considered suitable for any form of data filling. Part of this process is to remove data according to the exclusion criteria (see Figure 1: Main Study). Aside from the data exclusions discussed previously, an additional removal criterion was applied to the sample before it was fit for data filling; this is expanded upon here.
Briefly, data were removed due to Influential Outliers, Invalid Measurements, and participants not attending any programme session (Non-Initiators). It is not possible to obtain statistically meaningful results from a group of participants with minimal data; as a result, Non-Initiators (n = 781) were removed from the sample.
The initial sample of participants was reduced to a final figure of 2948 participants, equating to 68.6% of the initial sample. A number of options are available for countering missing data, one of which is to use only participants with complete cases. If such an analysis were conducted, data from 907 participants (30.8%) would be eligible for use. This approach, known as listwise deletion, has been advocated when missingness is present in fewer than 5% of cases [4]. Here, with missingness in 69.2% of participants' cases, this approach would not be a suitable, ethical or valid method of dealing with missing data.
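The sample-flow and listwise-deletion figures in this section are simple proportions. The sketch below (Python with NumPy; helper names are hypothetical) counts complete cases in a data matrix, applies the 5% rule of thumb quoted in the text, and re-derives the percentages from the reported counts, taking the initial sample as 2948 + 1349 participants.

```python
import numpy as np


def complete_case_summary(X):
    """Return (number of complete rows, % of rows with any missing value)."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    n_complete = int(complete.sum())
    pct_incomplete = 100.0 * (1 - n_complete / len(X))
    return n_complete, pct_incomplete


def listwise_deletion_acceptable(pct_incomplete, threshold=5.0):
    # Rule of thumb cited in the text: listwise deletion only when < 5% of cases are incomplete
    return pct_incomplete < threshold


# Arithmetic check using the counts reported in this section
initial = 2948 + 1349                         # participants before exclusions
print(round(100 * 2948 / initial, 1))         # 68.6 (% of initial sample analysed)
print(round(100 * 907 / 2948, 1))             # 30.8 (% complete cases)
print(round(100 * (2948 - 907) / 2948, 1))    # 69.2 (% with any missing value)
```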
A total of 1349 participants were removed from the sample (see Figure 1). All subsequent analyses were conducted using the remainder of the initial sample (n = 2948).

Criteria Item #4: Differences between Complete and Incomplete Cases
When working with missing data, it is important to identify differences between participants with complete and incomplete cases, especially with regard to participant characteristics.
This shows whether an analysis of complete cases alone would misrepresent the characteristics of the total sample. Therefore, all participant variables were assessed for differences. Table 5 and Table 6 demonstrate that significant differences were present between participants with complete and incomplete data. These differences included Ethnicity, IMD variables, Medical Conditions, Age, Attendance and Completion Status, and Waist Circumference. Differences in BMI Classification, BMI SDS, Sedentary Behaviour and Gender were marginally non-significant: a degree of difference was present, although not large enough to reach statistical significance. There were no significant differences in Self-esteem or Body Satisfaction.
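A minimal sketch of how such complete-versus-incomplete comparisons can be run in Python with SciPy. The choice of tests is an assumption, since the section does not state which were used: a Welch t-test for continuous variables and a χ² test on a 2×2 table for binary variables, each using only the observed values of the variable being compared. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind


def compare_continuous(values, is_complete):
    """Welch t-test of a continuous variable between complete and incomplete cases."""
    values = np.asarray(values, dtype=float)
    is_complete = np.asarray(is_complete, dtype=bool)
    ok = ~np.isnan(values)  # use only the observed values of this variable
    res = ttest_ind(values[ok & is_complete], values[ok & ~is_complete], equal_var=False)
    return res.statistic, res.pvalue


def compare_binary(values, is_complete):
    """Chi-squared test of a 0/1 variable against complete-case status (2x2 table)."""
    values = np.asarray(values, dtype=float)  # NaN never equals 0 or 1, so it is skipped
    is_complete = np.asarray(is_complete, dtype=bool)
    table = np.array([[np.sum((values == v) & (is_complete == c)) for v in (0, 1)]
                      for c in (True, False)])
    stat, p, _, _ = chi2_contingency(table)
    return stat, p
```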
These findings are substantive insofar as analysis conducted solely on participants with complete data may be biased, particularly for variables where significant differences were found between complete and incomplete cases. This reinforces the need for an approach to filling missing data. Multiple imputation (MI), a plausible method of dealing with missing data, has previously been used in engagement research [3] and in wider health-related research [7,9]. Other methods are available for working with missing data; however, MI was adopted due to its frequent use in similar research and its availability in SPSS (SPSS Inc., Chicago, IL). The ability to undertake MI in SPSS was introduced in a relatively recent version of the software, and its use is becoming more widespread in the field [7]. Other programmes such as SAS and Stata can also be used for MI.
A fully conditional specification (FCS; also known as Multiple Imputation by Chained Equations, MICE) was used to impute data. This specification uses sequential regression to impute the missing values in each variable conditional on the other specified variables in the model. For example, three variables may contain missing data, and the missingness mechanism may suggest that data are absent because of two other, complete variables. A fully conditional specification accounts for the variables that can explain the missingness mechanism. This is discussed further in the next item.
The fully conditional specification can also work with multiple data types (continuous, nominal and ordinal). This is possible because each variable included in the model is imputed using its own model.
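The chained-equations idea can be sketched in a few lines of NumPy, assuming all variables are continuous. Each incomplete variable is regressed on all the others, the regression parameters are drawn from their approximate posterior, and missing entries are replaced by draws from the resulting predictive distribution. The algorithm cycles over the variables for several iterations and repeats the whole procedure for each of the m data sets. This is a simplified illustration (a Bayesian linear-regression imputation step for every variable), not SPSS's implementation. A binary or nominal variable would instead get its own logistic model, which is what lets FCS mix data types. Function names are hypothetical.

```python
import numpy as np


def draw_regression(design, y, rng):
    """Draw (beta, sigma) from the approximate posterior of a normal linear regression."""
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    resid = y - design @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def fcs_impute(data, m=10, n_iter=10, seed=0):
    """Return m completed copies of `data` (2-D float array with NaNs)."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(data)
    col_means = np.nanmean(data, axis=0)
    completed = []
    for _ in range(m):
        X = data.copy()
        X[miss] = col_means[np.where(miss)[1]]  # start from column means
        for _ in range(n_iter):
            for j in range(data.shape[1]):
                mj = miss[:, j]
                if not mj.any():
                    continue  # fully observed: used as a predictor only
                # Regress variable j on every other (currently completed) variable
                design = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
                beta, sigma = draw_regression(design[~mj], X[~mj, j], rng)
                pred = design[mj] @ beta
                X[mj, j] = pred + rng.normal(0.0, sigma, size=pred.shape)
        completed.append(X)
    return completed
```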

Criteria Item #7: Number of Imputed Data Sets Generated
Ten data sets were imputed in order to reduce the sampling variability from the imputation process.
Although Sterne et al. recommend five data sets, the larger number was preferred for this reason. For sensitivity analysis, models with fewer iterations were run (e.g. 5 data sets, 500 case draws and 5 parameter draws), but these did not yield differences in the descriptive statistics. Table 7 outlines the differences between the descriptive statistics when implementing different imputation methods and including a variety of predictor variables. The model with the larger number of case draws (2500 iteration model) was carried forward as the imputed data set (labelled Final in Table 7).
Table 7 notes: units are relative to each variable. LW: Listwise Deletion.
V1: 2500 iteration model: 10 data sets, 500 case draws, 5 parameter draws.
V2: 2500 iteration model: 5 data sets, 500 case draws, 5 parameter draws.
V3: 1250 iteration model: 10 data sets, 250 case draws, 5 parameter draws.
V4: 500 iteration model: 10 data sets, 250 case draws, 5 parameter draws.
V5: 2500 iteration model: 10 data sets, 500 case draws, 5 parameter draws.
V6: 2500 iteration model: 10 data sets, 500 case draws, 5 parameter draws.
Final: 2500 iteration model: 10 data sets, 500 case draws, 5 parameter draws.
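Once the m data sets have been analysed, estimates are conventionally pooled with Rubin's rules. The benefit of more imputations can be expressed as Rubin's relative efficiency, 1 / (1 + γ/m), where γ is the fraction of missing information. The sketch below is illustrative: γ = 0.5 is an assumed value, not one estimated from this study, and the pooling follows the textbook procedure rather than reproducing SPSS's output. Function names are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                # pooled point estimate
    w = u.mean()                    # within-imputation variance
    b = q.var(ddof=1)               # between-imputation variance
    t = w + (1 + 1 / m) * b         # total variance
    return q_bar, t, w, b


def relative_efficiency(gamma, m):
    """Efficiency of m imputations relative to infinitely many (Rubin's approximation)."""
    return 1 / (1 + gamma / m)


for m in (5, 10):
    print(m, round(relative_efficiency(0.5, m), 3))  # 5 0.909, then 10 0.952
```

Under this assumed γ, moving from five to ten data sets raises efficiency from about 91% to about 95%, which is the kind of gain in reduced sampling variability that motivates generating ten.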

Criteria Item #8: Variables Included in the Final Imputation Model
SPSS (SPSS Inc., Chicago, IL) was used for MI. A total of 21 variables were entered into the imputation model and accounted for when imputing missing data (Table 8).
Predictor Only variables were those that were completely observed, with no missing values.
Importantly, a number of these variables help to explain the missingness mechanism and are therefore required to impute missing data. Of utmost importance is the inclusion of the outcome variables, Completion Status and Percentage of Attendance: omitting the outcome variables could weaken their associations with the predictor variables [2]. Partially observed variables were those with some missing values. These variables were used by the MI model both to impute missing values in other variables and were themselves imputed. All participant-related variables were included in the MI model to facilitate the most reliable imputation of missing data.
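The split between roles can be derived mechanically from the data. The sketch below (Python with NumPy) labels fully observed columns as predictor-only and partially observed columns as both imputed and used as predictors. The function name and the variable names in the example are hypothetical stand-ins for those in Table 8.

```python
import numpy as np


def assign_roles(data, names):
    """Split variables into 'predictor only' (fully observed) and
    'impute and predict' (partially observed) roles."""
    data = np.asarray(data, dtype=float)
    has_missing = np.isnan(data).any(axis=0)
    predictor_only = [n for n, miss in zip(names, has_missing) if not miss]
    impute_and_predict = [n for n, miss in zip(names, has_missing) if miss]
    return predictor_only, impute_and_predict


# Hypothetical columns: gender (0/1), attendance % (outcome), BMI SDS
data = np.array([[0, 55.0, 2.1], [1, 80.0, np.nan], [1, 100.0, 2.6]])
print(assign_roles(data, ["gender", "attendance_pct", "bmi_sds"]))
# (['gender', 'attendance_pct'], ['bmi_sds'])
```

Note that the outcome (`attendance_pct` here) stays in the model as a predictor, in line with the recommendation above.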
For continuous variables with missing data, SPSS allows the researcher to define lower and upper bounds, ensuring that imputed values fall within a specified range. The MI process makes it unlikely that imputed values will exceed these limits. To impute categorical data, all categorical variables had to be binary coded, which required some variables to be collapsed.
This is discussed in Criteria Item #9.
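One way to keep imputed values inside user-defined limits is rejection sampling: redraw a value until it falls in range, and if too many case draws fail, draw fresh model parameters and try again. The sketch below illustrates that general idea. It is an assumption about the mechanism, with defaults echoing the 500 case draws and 5 parameter draws listed in Table 7. The function `draw_within_bounds` and the `draw_params` callback (which supplies the predictive mean and standard deviation) are hypothetical.

```python
import numpy as np


def draw_within_bounds(draw_params, lo, hi, rng, max_case_draws=500, max_param_draws=5):
    """Rejection-sample one imputed value inside [lo, hi].

    draw_params(rng) returns (mean, sd) of the predictive distribution; it is called
    again (a fresh parameter draw) whenever max_case_draws attempts all fail."""
    for _ in range(max_param_draws):
        mean, sd = draw_params(rng)
        for _ in range(max_case_draws):
            value = rng.normal(mean, sd)
            if lo <= value <= hi:
                return value
    raise RuntimeError("no value inside the bounds; widen them or allow more draws")


rng = np.random.default_rng(0)
# Hypothetical BMI-like variable constrained to [15, 40]
bmi = [draw_within_bounds(lambda r: (r.normal(25, 0.5), 5.0), 15, 40, rng) for _ in range(5)]
```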

Criteria Item #9: Handling of Non-Normally Distributed and Categorical Data
All continuous data were normally distributed (and therefore suitable for parametric analysis). For MI to be completed using the SPSS software, categorical variables needed to be collapsed into binary groups. This resulted in Ethnicity and BMI Classification each being reduced to two categories.
The fully conditional specification used in the MI procedure can handle multiple data types. As such, both continuous and categorical (ordered and unordered) variables were used in the MI model.
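Collapsing a categorical variable to binary form is a simple recode. In the sketch below, the function name, the reference category and the labels are hypothetical, since the section does not list the original categories. Missing entries are kept as `None` so they remain available for imputation.

```python
def collapse_to_binary(values, reference):
    """Code a categorical variable as 1 (equals `reference`) or 0 (any other category),
    leaving missing entries (None) missing so they can still be imputed."""
    return [None if v is None else int(v == reference) for v in values]


ethnicity = ["White", "Asian", None, "Black", "White"]
print(collapse_to_binary(ethnicity, "White"))  # [1, 0, None, 0, 1]
```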

Criteria Item #10: Statistical Interaction in the Final Analysis
There were no statistical interactions in the final analysis.

Criteria Item #11: Observed and Imputed Values: Sensitivity Analysis of Frequencies and Descriptives
Although there is no agreed definition of what constitutes a "large fraction", it would be reasonable to conclude that 53.7% is a large fraction of data. With that said, Table 9 and Table 10 demonstrate the differences between the original data set (casewise deletion) and the imputed data set (imputed), reporting the mean/percentage values of variables by casewise deletion (where n varies) and by imputation (where n is the pooled value of 10 data sets). As shown, the values from the imputed and casewise deletion data do not differ greatly. Where missingness in the variable of interest was relatively low (e.g. BMI SDS, WC SDS), the mean value remained unaltered, although the confidence intervals adjusted slightly. Imputed data do not appear to alter the mean/percentage values greatly, and they retain the original sample size. The subsequent criteria items assess the use of imputed data in the main study analysis.   respectively. Although the magnitude of these differences does not appear large, to a weight management interventionist these differences would be practically meaningful.
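The casewise-versus-imputed comparison in Table 9 and Table 10 amounts to comparing the mean of the observed values with the mean pooled across imputed data sets. For a point estimate, Rubin's pooled value is simply the average over data sets. A minimal sketch with a hypothetical function name:

```python
import numpy as np


def casewise_vs_imputed_mean(observed, imputed_sets):
    """Return (casewise mean, mean pooled over imputed data sets, difference)."""
    casewise = np.nanmean(np.asarray(observed, dtype=float))  # observed cases only
    pooled = np.mean([np.mean(np.asarray(s, dtype=float)) for s in imputed_sets])
    return casewise, pooled, pooled - casewise


# Hypothetical variable with one missing value and two imputed versions of it
print(casewise_vs_imputed_mean([1, np.nan, 3], [[1, 2, 3], [1, 4, 3]]))
```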
The final sensitivity analysis assessed the differences between imputed and complete cases in the multivariable regression analyses.