Participants and procedures
The present study analyzed data from the Patterns of Habitual Activity across SEasons (PHASE) study [24]. Primary schools located within 40 km of the Melbourne Central Business District, Australia, and with > 200 pupils, were stratified into tertiles of socioeconomic status (SES) using the Socio-Economic Index for Areas [25]. Schools within each SES stratum were randomly selected and invited to participate. Principals from nine schools (five high, three mid, and one low SES) agreed for their school to participate in the study. All 1270 children in Years 4 and 5 (aged 8–11 years) received an invitation to participate. Informed written parental consent for at least one component of the study was received for 326 children (25.7%). Approvals for the study were granted by the Deakin University Human Ethics Advisory Group (Health), Department of Education and Early Childhood Development, and Catholic Education Office (Melbourne) (approval identifiers: HEAG-H 13_2012 and 2020–265).
Each participant was asked to complete a physical activity assessment (simultaneous wear of the ActiGraph and activPAL) in the winter, spring, summer, and fall, for up to four total measurement periods. Of the 1304 (326 × 4) possible ‘participant-seasons’, 586 (45%) were excluded from the present analyses due to having no valid days on which both accelerometers were simultaneously worn. The final sample comprised 718 participant-seasons from 278 participants. Participants were divided into three sets: a training dataset (n = 156), used to train the candidate CHAP-child models; a model selection dataset (n = 38), used to compare candidate models and select the final model; and a testing dataset (n = 84), used to evaluate the performance of the final CHAP-child model selected. This partitioning was necessary because machine learning models have been shown to yield inflated performance estimates when applied to data from the same participants on which they were trained [26]. The model selection dataset was necessary because several models were evaluated to inform the selection of the final model. No participant was assigned to more than one dataset. Randomization was used to achieve balance on participant age, sex, accelerometer wear time, school SES, school identifier, and season, thereby maximizing variability in potential correlates of sedentary time within each dataset, which improves the generalizability of the results [26]. Specifically, we randomly assigned participants to the training, model selection, and testing datasets, evaluated the distributions of these variables, and repeated the randomization until the datasets were approximately balanced.
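The repeated-randomization procedure described above can be sketched as follows. This is an illustrative re-implementation, not the study's actual code: the function name, the single balance variable (age), and the tolerance are our assumptions, whereas the study balanced several variables at once.

```python
import random

def balanced_split(participants, sizes, key, tol, seed=0):
    """Randomly partition `participants` into groups of the given sizes,
    re-randomizing until the mean of `key` in every group is within `tol`
    of the overall mean (a simplified one-variable stand-in for the
    multi-variable balance check described in the text)."""
    rng = random.Random(seed)
    overall = sum(key(p) for p in participants) / len(participants)
    while True:
        shuffled = participants[:]
        rng.shuffle(shuffled)
        groups, start = [], 0
        for n in sizes:
            groups.append(shuffled[start:start + n])
            start += n
        if all(abs(sum(key(p) for p in g) / len(g) - overall) <= tol
               for g in groups):
            return groups

# Illustrative use with fabricated ages (not study data):
people = [{"id": i, "age": 8 + i % 4} for i in range(278)]
train, select, test = balanced_split(
    people, [156, 38, 84], key=lambda p: p["age"], tol=0.3)
```

In practice the acceptance check would summarize each balancing variable (age, sex, wear time, school SES, school identifier, season) rather than a single mean.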
Measures
Participant characteristics (baseline descriptive information)
Participant age and sex were assessed by questionnaire. Body mass index (BMI, kg·m⁻²) at the first assessment period was calculated using objective height and weight measures collected via standardized protocols, then converted to age- and sex-normed BMIz scores [27].
Accelerometers
Each activity assessment involved concurrent wear of a hip-mounted triaxial ActiGraph GT3X+ (ActiGraph LLC, Pensacola, FL, USA) and thigh-mounted activPAL3 (PAL Technologies Ltd, Glasgow, Scotland) for up to eight consecutive days. Participants were instructed to remove the monitors during water-based activities and sleep. The ActiGraph was worn on an elastic belt and situated on the right side, along the anterior axillary line at the level of the suprailiac crest. The activPAL was enclosed in a small pocket on an adjustable elastic belt and secured at the mid-anterior position on the child’s thigh.
Data format and pre-processing
The activPAL yielded output in an ‘events’ file that was used to label one-second epochs as sitting/lying (i.e., sedentary, referred to hereon as sitting) or standing/stepping (recoded as non-sitting), based on activPAL’s proprietary VANE algorithm set to require ≥ 10 s in a new posture for the posture to be registered [28]. The ActiGraph yielded two types of output: (1) raw acceleration values for the three accelerometer axes, collected at 30 Hz and downsampled to 10 Hz (10 values per second), chosen because these values rarely vary at higher frequencies; (2) acceleration counts values, extracted at one-minute epochs (one value per minute) based on ActiGraph’s counts algorithm. The raw acceleration values were used for training the CHAP-child deep learning models, and the counts values were used to determine ActiGraph non-wear time and to compare CHAP-child to current practice (i.e., the 100 cpm cut point method) in the final statistical analyses. Non-wear time was determined separately for each monitor, and sleep time was determined for the activPAL, since sleep may have been included in the data if a participant failed to remove the activPAL at bedtime. Data were only included for periods that registered as wear time for both monitors and as non-sleep for the activPAL. ActiGraph non-wear was determined by the Choi algorithm (90-min window, 30-min streamframe, and 2-min tolerance) [29, 30]. activPAL non-wear and sleep were determined by ProcessingPAL using default settings that were shown to have good validity in this population [31,32,33]. No additional wear time criteria were employed for the training and model selection datasets. The final pre-processing step for creating the deep learning model inputs involved aggregating the activPAL data to 10-s epochs. An epoch was considered sedentary if ≥ 6 s were labeled sitting/lying; otherwise it was labeled as non-sitting.
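The majority-vote aggregation rule at the end of this step (≥ 6 of 10 s sitting) can be expressed compactly. This is an illustrative sketch only; the function name is ours and the study's pre-processing code may differ in detail:

```python
def aggregate_to_10s(second_labels):
    """Collapse per-second activPAL posture labels (1 = sitting/lying,
    0 = non-sitting) into 10-s epochs: an epoch is sedentary (1) when
    >= 6 of its 10 seconds are sitting, per the rule in the text.
    Any incomplete trailing epoch is dropped."""
    epochs = []
    n_full = len(second_labels) - len(second_labels) % 10
    for i in range(0, n_full, 10):
        window = second_labels[i:i + 10]
        epochs.append(1 if sum(window) >= 6 else 0)
    return epochs
```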
10-s epochs were chosen over longer time intervals when developing CHAP-child to provide the highest possible resolution of information that may be desired in some circumstances, such as for determining the precise timing of a sit-to-stand transition. Shorter time intervals were considered but believed to be less appropriate due to (1) the previously mentioned 10-s minimum requirement in a new posture for the posture to be registered by the activPAL, a commonly used threshold used to define postural transitions [28, 34], and (2) the potential for small amounts (e.g., several seconds) of time drift between two sensors over the wear period [35].
CHAP-child model development
Details of the machine learning architecture and training procedures have been previously published [22, 36], with additional information available at https://github.com/ADALabUCSD/DeepPostures. The Python-based TensorFlow platform was used. Using the training dataset (n = 156 participants), a deep learning CNN was developed and applied to the 10 Hz raw ActiGraph data to generate features whose values varied every 10 s. This epoch length was chosen to match the timing of the 10-s epochs in the activPAL data. The raw triaxial acceleration data provide information on monitor positioning and rotation that are used by the CNN. The CNN aimed to automatically identify features within each 10-s epoch that could differentiate between sitting (including lying) and non-sitting.
This convolution-based approach contrasts with traditional feature engineering, which requires researchers to pre-define features in the data (e.g., mean and variance) that are expected to have predictive utility. The CNN output features were then fed into a bi-directional long short-term memory network to learn the temporal dynamics of how sitting and non-sitting epochs occur in sequence, capturing the timing of the beginning and end of periods of sitting. Lastly, a softmax layer was used to determine the probability of sitting versus non-sitting for each 10-s epoch. The predicted label for the 10-s epoch was the posture with the higher predicted probability (i.e., probability > 0.5). Transitions can be inferred from the beginning and end of each sitting period/bout (i.e., a sit-to-stand transition at the bout's end), but were not directly labelled by the CHAP-child model. Numerous models were trained, differing in hyperparameters (e.g., window size, number of layers and neurons). Their performance was compared in the model selection dataset, with the best-performing model being selected as the final ‘CHAP-child’ model. Selection was based on a combination of balanced accuracy for predicting (a) sitting vs. non-sitting and (b) sit-to-stand transitions, with the model that maximized both values being selected.
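The CNN → bidirectional LSTM → softmax pipeline described above can be sketched in TensorFlow/Keras. This is a simplified illustration assuming hypothetical layer sizes, window length, and filter counts; it reproduces the architecture's shape, not the published CHAP-child hyperparameters (see the DeepPostures repository for those):

```python
import tensorflow as tf

def build_chap_like_model(window_epochs=42, samples_per_epoch=100):
    """Sketch of a CHAP-style architecture: a small CNN extracts features
    from each 10-s epoch (100 samples at 10 Hz x 3 axes), a bidirectional
    LSTM models the sequence of epochs, and a softmax head outputs
    P(sitting) vs P(non-sitting) per epoch. All sizes are illustrative."""
    inputs = tf.keras.Input(shape=(window_epochs, samples_per_epoch, 3))
    # The same CNN is applied to every 10-s epoch independently.
    cnn = tf.keras.Sequential([
        tf.keras.layers.Conv1D(32, 9, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 9, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
    ])
    x = tf.keras.layers.TimeDistributed(cnn)(inputs)
    # Bi-directional LSTM captures how postures unfold over the window.
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True))(x)
    # Per-epoch softmax over {non-sitting, sitting}.
    outputs = tf.keras.layers.Dense(2, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

The predicted label for each epoch is then the class with probability > 0.5, and sit-to-stand transitions are inferred afterward from the ends of predicted sitting bouts.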
Post-processing and variable derivation
In the test dataset, which was used in the statistical analyses for evaluating CHAP-child, participant-level sedentary variables were scored by aggregating the (1) one-second epoch activPAL labels, (2) 10-s epoch CHAP-child labels, and (3) one-minute epoch ActiGraph counts data within each participant-season. The resolution of input data for the activPAL and ActiGraph counts reflected usual practice, and the counts data were not used in shorter epochs because previous research has shown shorter epochs (e.g., 15 s) lead to an even greater overestimation of sit-to-stand transitions [11]. For these counts data, a minute was considered sedentary if the vertical axis value was < 100 cpm [37].
Sedentary bouts were defined as periods of sedentary time lasting ≥ 1 epoch, meaning that the shortest possible bout duration was 10 s for activPAL (due to the requirement of ≥ 10 s in a new posture for the posture to be registered), 10 s for CHAP-child, and one minute for the ActiGraph cpm method. A break in sedentary time, which was synonymous with a sit-to-stand transition, was always defined as any time a sedentary epoch was followed by a non-sitting epoch (no allowance for interruptions, i.e., no tolerance).
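The break (sit-to-stand transition) and bout definitions above map directly onto epoch-label sequences. A minimal sketch, with function names of our own choosing:

```python
def count_breaks(epoch_labels):
    """Count breaks in sedentary time (synonymous with sit-to-stand
    transitions): every place a sedentary epoch (1) is immediately
    followed by a non-sitting epoch (0), with no tolerance."""
    return sum(1 for a, b in zip(epoch_labels, epoch_labels[1:])
               if a == 1 and b == 0)

def bout_lengths(epoch_labels, epoch_sec=10):
    """Durations (seconds) of sedentary bouts, i.e., runs of >= 1
    consecutive sedentary epoch. For the 100 cpm method, epoch_sec
    would be 60."""
    lengths, run = [], 0
    for lab in epoch_labels:
        if lab == 1:
            run += 1
        elif run:
            lengths.append(run * epoch_sec)
            run = 0
    if run:
        lengths.append(run * epoch_sec)
    return lengths
```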
Standard participant-level sedentary time and bout pattern variables were then calculated based on the activPAL data, CHAP-child data, and 100 cpm data [37,38,39]. These variables included total sedentary time (minutes/day), breaks from sedentary time (number/day), time spent in sedentary bouts lasting ≥ 30 min (minutes), mean sedentary bout duration (mean of all sedentary bouts; minutes), usual bout duration (the bout duration at which 50% of sedentary time was accumulated; minutes [40]), and alpha (the slope of an individual’s distribution of sedentary bout lengths based on a power law function; unitless, with lower values reflecting more time in prolonged bouts [41]).
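The two less common variables, usual bout duration and alpha, can be computed from a list of bout durations. These are illustrative implementations under stated assumptions: the usual bout duration is computed here as the bout length at which the cumulative (ascending-sorted) bout time first reaches 50% of the total, whereas [40] describes a curve-fitting variant; alpha uses the standard maximum-likelihood power-law estimator, which we assume matches [41].

```python
import math

def usual_bout_duration(bouts_min):
    """Bout duration (min) at which half of total sedentary time has
    accumulated: sort bouts ascending and find where the cumulative
    sum first reaches 50% of the total."""
    bouts = sorted(bouts_min)
    half, cum = sum(bouts) / 2, 0.0
    for b in bouts:
        cum += b
        if cum >= half:
            return b

def alpha(bouts_min, x_min=None):
    """Maximum-likelihood power-law exponent of the bout-length
    distribution: alpha = 1 + n / sum(ln(x_i / x_min)). Lower values
    indicate more time accumulated in prolonged bouts."""
    x_min = x_min if x_min is not None else min(bouts_min)
    return 1 + len(bouts_min) / sum(math.log(b / x_min) for b in bouts_min)
```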
Statistical analyses
The statistical analyses aimed to evaluate the CHAP-child model in the testing dataset. Participant characteristics were summarized using descriptive statistics and compared between study samples (training, model selection, and testing) using two-sample t-tests for continuous variables and chi-square tests for categorical variables.
The epoch-level analyses involved the full testing dataset of 84 participants and did not employ additional wear time criteria. To assess epoch-level agreement (i.e., 10-s labels of sitting or non-sitting), CHAP-child was compared against activPAL using sensitivity, specificity, balanced accuracy (mean of sensitivity and specificity), positive predictive value (PPV), and negative predictive value (NPV). Each metric was calculated for each participant-season, and means and standard deviations (SD) were computed across participant-seasons. To assess agreement between CHAP-child and activPAL for classifying sit-to-stand transitions, sensitivity and PPV were calculated using the transition pairing method with a 1-min lag time tolerance [42]. This approach was used because sit-to-stand transitions occur rarely relative to sitting and non-sitting epochs. The 1-min lag threshold was selected to still give credit to CHAP-child predictions that were within 1 min of the true transition as measured by activPAL, as we believed most investigations would not require transition timing accuracy finer than 1 min. All epoch-level classification metrics were compared between sexes and across seasons.
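The epoch-level metrics and the transition pairing approach can be sketched as follows. The confusion-matrix metrics are standard; the pairing function is our simplified reading of the method in [42] (greedy nearest-neighbor matching within the tolerance), so the published method may differ in matching details:

```python
def confusion_metrics(truth, pred):
    """Epoch-level agreement for binary sitting (1) vs non-sitting (0)
    labels: sensitivity, specificity, balanced accuracy, PPV, NPV."""
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    return {"sensitivity": sens, "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2,
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}

def transition_agreement(true_times, pred_times, tol_sec=60):
    """Pair each true sit-to-stand transition (seconds) with the nearest
    unused predicted transition within tol_sec; returns sensitivity
    (matched true / all true) and PPV (matched pred / all pred)."""
    used, matched = set(), 0
    for t in true_times:
        candidates = [p for p in pred_times
                      if p not in used and abs(p - t) <= tol_sec]
        if candidates:
            used.add(min(candidates, key=lambda p: abs(p - t)))
            matched += 1
    return {"sensitivity": matched / len(true_times),
            "ppv": matched / len(pred_times)}
```

Each metric would be computed per participant-season, then averaged across participant-seasons as described above.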
For analyses of participant-season level sedentary pattern variables, inclusion was limited to days with ≥ 8 h of simultaneous monitor wear and participant-seasons with ≥ 3 such days. This was done to reflect data exclusion approaches commonly used in applied studies of device-measured physical activity and sedentary time, which aim to capture a reliable representation of the participant’s activity [23, 43]. Sixty-five of the 84 participants in the testing dataset met these inclusion criteria, contributing 127 participant-seasons. To assess participant-season level agreement, the sedentary variables based on CHAP-child and the 100 cpm cut-point were compared against activPAL. Performance evaluations focused on bias (i.e., mean difference), mean absolute error (MAE), mean absolute percent error (MAPE), Spearman correlation coefficients, and concordance correlation coefficients (CCC) [44]. All correlation coefficients were interpreted as small (≤ 0.40), moderate (0.41–0.60), large (0.61–0.80), or very large (0.81–1.0) [45]. MAPEs < 25% were judged as minimally acceptable, though there are no clear guidelines for judging these values and lower values are desirable. All statistical analyses were performed in R [46].
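The agreement statistics named above are defined as follows; since the study's analyses were run in R, this Python sketch is illustrative only (function name ours), with CCC computed via Lin's formula:

```python
def agreement_stats(criterion, estimate):
    """Agreement between an estimate (e.g., CHAP-child or 100 cpm) and a
    criterion (activPAL) across participant-seasons: bias (mean of
    estimate - criterion), MAE, MAPE (%), and Lin's concordance
    correlation coefficient (CCC)."""
    n = len(criterion)
    diffs = [e - c for c, e in zip(criterion, estimate)]
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    mape = 100 * sum(abs(d) / c for c, d in zip(criterion, diffs)) / n
    mc, me = sum(criterion) / n, sum(estimate) / n
    vc = sum((c - mc) ** 2 for c in criterion) / n
    ve = sum((e - me) ** 2 for e in estimate) / n
    cov = sum((c - mc) * (e - me) for c, e in zip(criterion, estimate)) / n
    ccc = 2 * cov / (vc + ve + (mc - me) ** 2)
    return {"bias": bias, "mae": mae, "mape": mape, "ccc": ccc}
```

Perfect agreement yields bias = 0, MAE = 0, MAPE = 0%, and CCC = 1; CCC penalizes both poor correlation and systematic shifts in mean or scale.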