
Validation of actigraphy sleep metrics in children aged 8 to 16 years: considerations for device type, placement and algorithms

Abstract

Background

Actigraphy is often used to measure sleep in pediatric populations, despite little confirmatory evidence of the accuracy of existing sleep/wake algorithms. The aim of this study was to determine the performance of 11 sleep algorithms in relation to overnight polysomnography in children and adolescents.

Methods

One hundred thirty-seven participants aged 8–16 years wore two Actigraph wGT3X-BT (wrist, waist) and three Axivity AX3 (wrist, back, thigh) accelerometers for 24 h. Gold standard measures of sleep were obtained overnight using polysomnography (PSG; Embletta MPRPG, ST + Proxy and TX Proxy) in the home environment. Epoch-by-epoch comparisons of the Sadeh (two algorithms), Cole-Kripke (three algorithms), Tudor-Locke (four algorithms), Count-Scaled (CS), and HDCZA algorithms were undertaken. Mean differences from PSG values were calculated for various sleep outcomes.

Results

Overall, sensitivities were high (mean ± SD: 91.8% ± 5.6%) and specificities moderate (63.8% ± 13.8%), with the HDCZA algorithm performing best overall in terms of specificity (87.5% ± 1.3%) and accuracy (86.4% ± 0.9%). Sleep outcomes were more accurately measured by devices worn at the wrist than at the hip, thigh or lower back, with the exception of sleep efficiency, for which the reverse was true. The CS algorithm provided consistently accurate measures of sleep onset: the mean (95% CI) difference at the wrist with Axivity was 2 min (-6; -14,), and for sleep offset it was 10 min (5, -19). Several algorithms provided accurate measures of sleep quantity at the wrist, with differences from PSG of just 1–18 min a night for sleep period time and 5–22 min for total sleep time. Accuracy was generally higher for sleep efficiency than for frequency of night wakings or wake after sleep onset. The CS algorithm was more accurate than HDCZA at assessing sleep period time, with narrower 95% limits of agreement (CS: -165 to 172 min; HDCZA: -212 to 250 min).

Conclusion

Although the performance of existing count-based sleep algorithms varies markedly, wrist-worn devices provide more accurate estimates of most sleep outcomes than devices worn at other sites. Overall, the HDCZA algorithm showed the greatest accuracy, although the most appropriate algorithm depends on the sleep outcome of interest.

Background

A large body of evidence has emerged linking characteristics of children’s sleep, such as short duration, timing, poor quality, and high variability, with a wide range of adverse health outcomes [1]. However, most studies rely on retrospective self- or parent-reports of sleep, which may be unreliable and subject to recall bias [2, 3]. Although polysomnography (PSG) is considered the gold-standard measure of sleep, it is obtrusive and impractical for large-scale studies. Actigraphy is therefore increasingly used as a practical method of measuring sleep objectively, particularly over longer time frames than is possible with PSG. To estimate sleep outcomes, actigraphy data are analysed with algorithms that classify sleep and wake on the assumption that movement indicates wakefulness and the absence of movement indicates sleep. Algorithms typically differ in the population, device and placement site (e.g. wrist, ankle, waist) for which they were developed, but most work in a similar way: each minute of recorded activity is classified as either sleep or wake.

However, these existing algorithms have several limitations. First, although various algorithms have been developed [4,5,6,7,8,9], few [7, 10] have been validated against the gold standard, PSG, in paediatric populations; the remainder were validated against sleep diaries or visual inspection. Second, the choice of algorithm influences sleep–wake time estimates, suggesting that sleep variables derived from different algorithms might not be comparable [11]. Third, although currently available sleep algorithms provide reasonable estimates of sleep, most require participants to record their sleep onset and waking times, which are used to guide the algorithm in detecting sustained nocturnal bouts of inactivity. However, sleep diaries are often inaccurate, add to participant burden, and are time consuming for researchers in large-scale studies [12]. To overcome these limitations, fully automated algorithms that score sleep without diaries have been developed for use in children [5,6,7,8,9], but evidence of their accuracy against PSG is limited [10, 13].

With the growing availability of accelerometry data from large studies, often without sleep diaries, it is necessary to establish whether sleep outcomes are comparable between brands and across various wear sites. It is also important to evaluate sleep outcome estimates between the most widely used sleep–wake algorithms, with and without the use of sleep diaries to guide the algorithm. Therefore, the aim of this study is to compare the accuracy of the most widely used sleep algorithms against overnight PSG in children and adolescents.

Methods

Participants

Children and adolescents were recruited via social media (e.g. Facebook), schools, and word of mouth. Children aged 8 to 16 years at the time of recruitment with no history of sleep disturbance (see below) were eligible for the study. Ethical approval was obtained from the University of Otago Human Ethics Committee (ref H18/073).

Data collection overview

During a visit to each participant’s home, height and weight were measured and five accelerometers were attached to the child (two on the wrist, one around the waist, one on the lower back, and one on the upper thigh). These devices were worn for one 24-h period. Participants were also fitted with a portable polysomnography (PSG) device one hour before bedtime to measure sleep overnight in the home environment. Children were asked to complete a basic activity log the next day. The same computer was used to program the accelerometers and the PSG recording device so that their clocks were synchronized.

Sleep Disturbances Scale for Children (SDSC)

Parents completed the SDSC, a 27-item questionnaire assessing sleep behaviour and disturbances in children over the previous six months [14]. A total sleep problem score is derived from six sleep disturbance factors. A total score greater than 39 is indicative of a clinical disturbance. Children identified as having a sleep disorder, and those with any chronic medical condition or physical disability that impeded their ability to participate in physical activity, were excluded.
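The screening rule can be sketched as a small eligibility check. This is a minimal illustration, not the study's code: it assumes each SDSC item is scored on a 1–5 scale and that the total score is the plain sum of item scores. The function names and the `chronic_condition` flag are hypothetical.

```python
SDSC_N_ITEMS = 27          # number of items, as reported above
SDSC_CLINICAL_CUTOFF = 39  # total scores above this indicate a clinical disturbance


def sdsc_total(item_scores: list[int]) -> int:
    """Total SDSC score as the sum of item scores (each assumed to be 1-5)."""
    if len(item_scores) != SDSC_N_ITEMS:
        raise ValueError(f"expected {SDSC_N_ITEMS} items, got {len(item_scores)}")
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("item scores must lie between 1 and 5")
    return sum(item_scores)


def is_eligible(age_years: int, item_scores: list[int],
                chronic_condition: bool = False) -> bool:
    """Hypothetical screen: aged 8-16 (completed years), SDSC total <= 39,
    and no chronic condition or disability limiting physical activity."""
    return (8 <= age_years <= 16
            and sdsc_total(item_scores) <= SDSC_CLINICAL_CUTOFF
            and not chronic_condition)


print(is_eligible(12, [1] * 27))  # True: total 27
print(is_eligible(12, [2] * 27))  # False: total 54
```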

Demographic and anthropometric data

Information was collected on participants’ age, sex, date of birth, and ethnicity, using New Zealand census questions [15]. Participants’ addresses were used to determine area-based socio-economic status with the New Zealand Deprivation Index (NZDep Index, 2018) [16]. Trained research assistants obtained duplicate measures of height (Model 213, Seca, Germany) and weight (Tanita HD-351). A third measure was taken if duplicate height measures differed by more than 0.5 cm, or if duplicate weight measures differed by more than 0.5 kg. Body mass index (BMI) was calculated as weight (kg) / height (m)², and BMI z-scores were derived using the WHO growth reference [17]. Overweight was defined as a BMI at or above the 85th but below the 95th percentile, and obesity as a BMI at or above the 95th percentile.
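The anthropometric calculations can be sketched as follows, assuming height in cm and weight in kg. The BMI z-score itself requires the WHO reference tables and is taken here as an input. Mapping the z-score to a percentile with the standard normal distribution is our assumption about how the percentile cut-offs relate to z-scores. All function names are hypothetical.

```python
from statistics import NormalDist

DUPLICATE_TOLERANCE_CM = 0.5  # height
DUPLICATE_TOLERANCE_KG = 0.5  # weight


def needs_third_measure(first: float, second: float, tolerance: float) -> bool:
    """True when two duplicate measures differ by more than the tolerance."""
    return abs(first - second) > tolerance


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def weight_status(bmi_z: float) -> str:
    """Classify a BMI z-score (computed elsewhere from the WHO reference)
    using the 85th and 95th percentile cut-offs."""
    percentile = 100 * NormalDist().cdf(bmi_z)
    if percentile >= 95:
        return "obese"
    if percentile >= 85:
        return "overweight"
    return "not overweight"


print(round(bmi(40.0, 150.0), 1))  # 17.8
print(weight_status(1.2))          # overweight (about the 88th percentile)
```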

Home-based polysomnography

A home-based PSG sleep study was conducted. Overnight PSG data were recorded in participants’ homes using a digital portable monitor (Embletta MPRPG, ST + Proxy and TX Proxy, Natus, California, USA) at a sampling rate of 500 Hz, following American Academy of Sleep Medicine guidelines [18]. The researcher began the PSG setup approximately one hour before bedtime. The PSG included right and left electro-oculograms (EOG), four electroencephalograms (EEG) (C4/M1, C3/M2, O2/M1, O1/M2), chin electromyogram, nasal airflow, snoring, thoracic and abdominal respiratory effort (Xact Trace Respiratory Effort Sensor) and ECG. Oxygen saturation was measured with pulse oximetry. Data were downloaded and analysed using RemLogic software (Version 3.4, Embla Systems, Broomfield, CO, USA). For EEG signals, the low-frequency filter was set at 0.3 Hz and the high-frequency filter at 35 Hz.

One trained sleep technician scored sleep stages visually in 30-s epochs using the American Academy of Sleep Medicine (AASM) sleep staging criteria for children [18]. To allow comparison with actigraphy, the 30-s PSG epochs were collapsed into one-minute epochs. If either 30-s epoch within a minute was scored as wake, the whole minute was considered wake. For PSG, sleep onset was the first epoch of sleep after lights out. Total sleep time (TST) was defined as the number of minutes from sleep onset to sleep offset minus the number of minutes awake. Wake after sleep onset (WASO) was the time spent awake after initially falling asleep. Sleep efficiency (SE) was defined in two ways: 1) Sleep efficiency (TIB), a commonly referenced metric, calculated as the ratio of TST to time spent in bed (TIB); and 2) Sleep efficiency (SPT), calculated as TST (from sleep onset to offset, minus any WASO) expressed as a percentage of sleep period time (SPT; from sleep onset to offset, inclusive of any WASO).
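The 30-s to 1-min collapsing rule can be sketched as follows. The stage labels ("W", "N1", "N2", "N3", "R") and the choice to reject an odd number of epochs are assumptions made for illustration. The text does not say how an incomplete final minute was handled.

```python
WAKE = "W"  # assumed label for a wake epoch; sleep stages are "N1", "N2", "N3", "R"


def collapse_to_minutes(stages_30s: list[str]) -> list[bool]:
    """Merge consecutive pairs of 30-s PSG epochs into 1-min epochs.

    Returns True for a sleep minute. A minute is wake if EITHER half is wake.
    """
    if len(stages_30s) % 2:
        raise ValueError("expected an even number of 30-s epochs")
    return [WAKE not in stages_30s[i:i + 2]
            for i in range(0, len(stages_30s), 2)]


print(collapse_to_minutes(["W", "N1", "N2", "N2", "N3", "W"]))  # [False, True, False]
```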
We used sleep period time in our definition of Sleep efficiency (SPT), alongside the more traditional TIB-based definition, because one of our aims was to compare the accuracy of algorithms that required sleep diaries with those that did not. Furthermore, the TIB-based definition of SE necessarily includes non-sleep activity (e.g. reading, texting, mobile phone use) both before attempting sleep and after the final awakening. This activity does not reflect the construct of SE, namely TST relative to the time spent trying to fall asleep and to sleep discontinuity. Number of awakenings was the number of overnight awakenings between sleep onset and sleep offset. Different researchers analysed the PSG and actigraphy data independently.
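These outcomes can be derived from a minute-by-minute sleep/wake series, as sketched below. The sketch assumes the series covers the whole time in bed (lights out to lights on) and that sleep offset is the end of the last sleep minute, since the text does not define offset explicitly. It also assumes each run of consecutive wake minutes between onset and offset counts as one awakening. All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class SleepOutcomes:
    onset: int       # minutes from lights out to the first sleep minute
    offset: int      # minutes from lights out to the end of the last sleep minute
    spt: int         # sleep period time: onset to offset, including WASO
    waso: int        # wake minutes between onset and offset
    tst: int         # total sleep time: SPT minus WASO
    se_spt: float    # TST as a percentage of SPT
    se_tib: float    # TST as a percentage of time in bed
    awakenings: int  # runs of consecutive wake minutes between onset and offset


def sleep_outcomes(asleep: list[bool]) -> SleepOutcomes:
    """Derive sleep outcomes from 1-min epochs (True = sleep) covering time in bed."""
    if not any(asleep):
        raise ValueError("record contains no sleep")
    onset = asleep.index(True)
    offset = len(asleep) - asleep[::-1].index(True)  # exclusive end of last sleep minute
    period = asleep[onset:offset]
    spt = offset - onset
    waso = period.count(False)
    tst = spt - waso
    awakenings = sum(1 for is_sleep, _ in groupby(period) if not is_sleep)
    return SleepOutcomes(onset, offset, spt, waso, tst,
                         100 * tst / spt, 100 * tst / len(asleep), awakenings)


night = [False] * 2 + [True] * 2 + [False] * 2 + [True] * 3 + [False]
print(sleep_outcomes(night))
```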

Actigraphy

Two types of accelerometer were worn: the Axivity AX3 (Axivity Ltd, Newcastle, UK) and the Actigraph wGT3X-BT (ActiGraph, Pensacola, FL, USA). Both accelerometers are triaxial. They were configured to record at 100 Hz and initialised using the same personal computer as the PSG. The compact size (32.5 × 23 × 8.9 mm), light weight (11 g) and waterproof design of the Axivity AX3 contribute to higher compliance among children, while its temperature sensor assists in non-wear detection. The Actigraph wGT3X-BT is currently the most widely used research-grade device. It is larger (46 × 33 × 15 mm, 19 g) than the Axivity AX3 and lacks a temperature sensor. The three Axivity accelerometers were fitted to the right side of the lower back (waist level), the middle of the right thigh, and the non-dominant wrist using custom-designed hypoallergenic tape. The two Actigraph accelerometers were fitted at the non-dominant wrist using an elastic wrist strap and over the right hip using custom-designed hypoallergenic tape. Axivity devices were set up, and their data downloaded, with OmGui software version 1.0.0.30 (Open Movement, Newcastle, UK). ActiGraph wGT3X-BT devices were initialised and downloaded using ActiLife version 6.13.3. Their data were saved in raw format as .gt3x files and then converted for processing. Raw acceleration data from the Actigraph and Axivity were calibrated and processed using the open-access Pampro package v0.5 [19] and converted to HDF5 format for further processing. All algorithms except HDCZA were implemented in Python (Python Software Foundation, https://www.python.org/), and outputs were computed with these implementations rather than with proprietary device software. Data analysed using the HDCZA algorithm were processed with the R package GGIR version 1.2-0 (http://cran.r-project.org) [20].
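Unlike the other algorithms, HDCZA works on raw acceleration rather than counts, and the discussion below notes that changes in z-angle are one of the features such algorithms use. The sketch below is a simplified illustration only. It computes the z-angle (the angle between the device's z-axis and the horizontal plane) for each 100-Hz sample, then the absolute change in mean angle between consecutive 5-s blocks. GGIR's actual preprocessing and HDCZA's thresholding steps are not reproduced. The block length and function names are assumptions.

```python
import math
from statistics import fmean

SAMPLE_RATE_HZ = 100  # devices were configured to record at 100 Hz
BLOCK_S = 5           # block length used in this sketch (assumption)


def z_angle_deg(x: float, y: float, z: float) -> float:
    """Angle (degrees) between the device z-axis and the horizontal plane."""
    return math.degrees(math.atan2(z, math.hypot(x, y)))


def z_angle_changes(samples: list[tuple[float, float, float]]) -> list[float]:
    """Absolute change in mean z-angle between consecutive 5-s blocks.

    `samples` holds (x, y, z) accelerations in g at SAMPLE_RATE_HZ;
    an incomplete trailing block is dropped.
    """
    block = SAMPLE_RATE_HZ * BLOCK_S
    angles = [z_angle_deg(*s) for s in samples]
    means = [fmean(angles[i:i + block])
             for i in range(0, len(angles) - block + 1, block)]
    return [abs(b - a) for a, b in zip(means, means[1:])]


# One 5-s block with the z-axis vertical, then one with it horizontal.
print(z_angle_changes([(0.0, 0.0, 1.0)] * 500 + [(1.0, 0.0, 0.0)] * 500))  # [90.0]
```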

Algorithms

Algorithms were selected on the basis of a review of the literature on count-based actigraphy methods commonly used to estimate sleep in paediatric populations. We also considered the algorithms integrated within the proprietary software accompanying Actigraph GT3X+ devices. Table 1 details how each algorithm scores sleep and wake and calculates each sleep outcome. Briefly, we included three versions of the Cole-Kripke algorithm [5], two versions of the Sadeh algorithm [13], four versions of the Tudor-Locke algorithm [4, 8], the count-scaled (CS) algorithm [6], and the HDCZA algorithm [9]. Versions of the same algorithm differed mainly in whether they required diaries to estimate sleep onset and offset, and whether they included adjustments for differences in sensitivity between older and newer accelerometer models.
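The sketch below makes the count-based approach concrete by scoring 1-min epochs with two of the algorithm families listed above, as they are commonly described. The first is the Cole-Kripke weighted moving window, using the coefficients and count scaling (counts divided by 100, capped at 300) of the ActiLife adaptation. The second is the Sadeh logistic model, which combines the 11-min mean, the number of epochs with 50–99 counts, the SD of the last six epochs, and the log of the current count. Table 1 gives the exact rules used in this study, and the coefficients here should be checked against it and the original publications. The following are our assumptions: the 300-count cap in the Sadeh function, the edge handling (missing neighbours treated as zero, truncated windows at the ends of the record), and the use of a population SD.

```python
import math
from statistics import fmean, pstdev


def cole_kripke(counts: list[float]) -> list[bool]:
    """Score 1-min epochs as sleep (True) or wake (False), Cole-Kripke style."""
    weights = (106, 54, 58, 76, 230, 74, 67)      # applied to epochs t-4 ... t+2
    scaled = [min(c / 100, 300) for c in counts]  # counts/100, capped at 300
    n = len(scaled)
    scored = []
    for t in range(n):
        index = sum(w * scaled[t + lag]
                    for w, lag in zip(weights, range(-4, 3))
                    if 0 <= t + lag < n)          # missing neighbours count as 0
        scored.append(0.001 * index < 1)
    return scored


def sadeh(counts: list[float]) -> list[bool]:
    """Score 1-min epochs as sleep (True) or wake (False), Sadeh style."""
    capped = [min(c, 300) for c in counts]
    scored = []
    for t in range(len(capped)):
        window = capped[max(0, t - 5):t + 6]           # up to 11 epochs centred on t
        mean_w = fmean(window)
        nat = sum(1 for c in window if 50 <= c < 100)  # epochs with 50-99 counts
        sd_6 = pstdev(capped[max(0, t - 5):t + 1])     # current and 5 previous epochs
        lg = math.log(capped[t] + 1)
        ps = 7.601 - 0.065 * mean_w - 1.08 * nat - 0.056 * sd_6 - 0.703 * lg
        scored.append(ps > -4)
    return scored


counts = [0] * 10 + [800] * 5 + [0] * 10
print(cole_kripke(counts))
print(sadeh(counts))
```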

Table 1 Scoring for each algorithm

Statistical analyses

Epoch-by-epoch comparison

One-minute epochs from the Axivity (thigh, wrist, lower back) and Actigraph (waist, wrist) were aligned with the corresponding PSG epochs. Agreement of each device at each placement site with PSG (the gold standard) was examined by calculating overall agreement (accuracy, %), sensitivity (% sleep agreement), and specificity (% wake agreement).
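The three agreement statistics for one aligned pair of epoch series can be computed as sketched below, treating sleep as the positive class (True = sleep). The sketch assumes the series are already aligned minute by minute and that the PSG record contains both sleep and wake. How values were pooled across participants is not shown, and the function name is hypothetical.

```python
def agreement(actigraphy: list[bool], psg: list[bool]) -> dict[str, float]:
    """Epoch-by-epoch agreement with PSG (True = sleep; sleep is the positive class)."""
    if len(actigraphy) != len(psg):
        raise ValueError("series must be aligned and of equal length")
    pairs = list(zip(actigraphy, psg))
    both_sleep = sum(1 for a, p in pairs if a and p)
    both_wake = sum(1 for a, p in pairs if not a and not p)
    psg_sleep = sum(psg)
    psg_wake = len(psg) - psg_sleep
    return {
        "sensitivity": 100 * both_sleep / psg_sleep,  # % of PSG sleep scored as sleep
        "specificity": 100 * both_wake / psg_wake,    # % of PSG wake scored as wake
        "accuracy": 100 * (both_sleep + both_wake) / len(psg),
    }


print(agreement([True, True, True, False, True, False],
                [True, True, True, True, False, False]))
```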

Sleep outcomes were organised into three categories: sleep timing (sleep onset and offset), sleep quantity (sleep period time and total sleep time), and sleep quality (WASO, sleep efficiency, and number of night wakings). These were described with means and standard deviations and compared to PSG by calculating the mean difference and 95% confidence interval. Only participants with data for all outcomes were included for each device and placement.
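The mean difference and 95% CI for one outcome can be sketched as follows. The sketch uses a large-sample normal approximation (z ≈ 1.96) from Python's standard library. Stata typically uses the t distribution for such intervals, which for about 137 participants gives slightly wider limits; the paper does not state the exact command used. Missing values are dropped pairwise here, a simplification of the complete-case rule described above. The function name is hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def mean_difference_ci(device: list, psg: list, level: float = 0.95):
    """Mean (device - PSG) difference with a normal-approximation CI.

    Pairs where either value is None are dropped.
    """
    diffs = [d - p for d, p in zip(device, psg) if d is not None and p is not None]
    if len(diffs) < 2:
        raise ValueError("need at least two complete pairs")
    md = fmean(diffs)
    se = stdev(diffs) / len(diffs) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    return md, md - z * se, md + z * se


print(mean_difference_ci([420, 455, None, 470], [410, 450, 430, 440]))
```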

Bland–Altman plots were used to explore agreement with PSG for SPT (a measure that depends on sleep onset and sleep offset but not on WASO) and for WASO. Plots were produced for two algorithms: the “overall best performing” algorithm by % accuracy, regardless of placement site or device, and the “best performing” algorithm by mean difference from PSG at the site and device that performed best for each outcome. Mean differences and 95% limits of agreement were calculated. Stata 17.0 (StataCorp, Texas) was used for all analyses.
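The limits of agreement are the mean difference (bias) ± 1.96 standard deviations of the paired differences. Each point on a Bland–Altman plot is a pair's mean (x-axis) plotted against its difference (y-axis). A minimal sketch, with hypothetical names and no plotting:

```python
from statistics import fmean, stdev


def bland_altman(device: list[float], psg: list[float]) -> dict:
    """Bias and 95% limits of agreement for paired measurements."""
    diffs = [d - p for d, p in zip(device, psg)]
    means = [(d + p) / 2 for d, p in zip(device, psg)]  # x-axis values of the plot
    bias = fmean(diffs)
    sd = stdev(diffs)
    return {"bias": bias,
            "loa": (bias - 1.96 * sd, bias + 1.96 * sd),
            "means": means,
            "diffs": diffs}


result = bland_altman([500, 510, 520], [490, 510, 530])
print(result["bias"], result["loa"])
```

The reported limits for the CS algorithm and SPT (-165 to 172 min) can be worked backwards. They imply a bias of (-165 + 172)/2 = 3.5 min and an SD of differences of about (172 + 165)/(2 × 1.96) ≈ 86 min.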

Results

Study participants

In total, 384 children completed the screening questionnaire. Of these, 202 were ineligible because of age (n = 4), residence outside the Dunedin area (n = 12) or a sleep disturbance score greater than 39 (n = 186). Of the 182 eligible children, 151 expressed further interest in the study. PSG was conducted in 138 participants. One recording was terminated early because of technical failure, leaving 137 participants in the final analyses (see Supplementary Table 1 for details on missing data). The characteristics of the participants are shown in Table 2. Most participants were of New Zealand European ethnicity, slightly more boys than girls participated, and 37% of the sample were overweight or obese.
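The participant flow can be checked arithmetically. The counts below are those reported above, and the variable names are illustrative.

```python
screened = 384
ineligible = {"age": 4, "outside Dunedin area": 12, "SDSC score > 39": 186}
eligible = screened - sum(ineligible.values())
interested = 151
psg_conducted = 138
psg_terminated_early = 1
analysed = psg_conducted - psg_terminated_early

print(sum(ineligible.values()), eligible, analysed)  # 202 182 137
```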

Table 2 Characteristics of the study population

Epoch by epoch analyses

Placement and device

Actigraph vs Axivity at the wrist vs waist, lower back, thigh

Table 3 shows that overall accuracy tended to be higher for both devices when placed at the wrist (mostly above 80%) than when placed close to the centre of mass (waist, thigh, and lower back, where accuracy was generally below 80%). However, sensitivity and specificity showed different patterns. Sensitivity, the ability to detect sleep, was generally higher for both the Actigraph and the Axivity when placed closer to the centre of mass than at the wrist. By contrast, specificity (% wake agreement) was considerably better for both devices at the wrist than at the waist.

Table 3 Sensitivity, specificity, and accuracy of epoch-by-epoch comparisons with PSG for sleep

Algorithms vs placement

Placement site did not appear to affect the overall accuracy or sensitivity of each algorithm to a great extent. Most algorithms performed similarly whether placed close to the centre of mass (thigh, lower back, waist) or at the wrist, varying by less than 10% (Table 3). However, placement site had a large effect on specificity for most algorithms, and only the HDCZA algorithm varied by less than 10% between placements. Regardless of placement, total accuracy was similar across the HDCZA, CS, Sadeh 1, Sadeh 2, Cole-Kripke 1, Tudor-Locke 3, and Tudor-Locke 4 algorithms, but lower for the Cole-Kripke 2, Cole-Kripke 3, Tudor-Locke 1, and Tudor-Locke 2 algorithms. Given how difficult it is for actigraphy to detect wakefulness during sleep, the considerably higher specificity of the HDCZA algorithm (85.9% to 89.6%) is noteworthy. By comparison, the other algorithms had specificities as low as 41.2%, and many were below 60%.

A sensitivity analysis (Supplementary Table 2) was undertaken to determine the effect of the rule used to merge 30-s PSG epochs into 60-s epochs. In the main analyses, a minute was scored as wake if either of its 30-s epochs was scored as wake. In the sensitivity analysis, a minute was scored as sleep if either 30-s epoch was scored as sleep. For most algorithms (apart from a few placed at the wrist), this produced marginal increases in accuracy (< 2%). The gains came from increases in specificity (the ability to detect wake time), at the expense of decreases in sensitivity (the ability to detect sleep time).
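The two merge rules differ only for minutes whose two 30-s halves disagree. The sketch below implements both rules on 30-s epochs (True = sleep) and counts the minutes that change label. Names are hypothetical, and an unpaired trailing epoch is silently dropped by `zip`, which is an assumption.

```python
def collapse(asleep_30s: list[bool], rule: str = "wake") -> list[bool]:
    """Merge pairs of 30-s epochs (True = sleep) into 1-min epochs.

    rule="wake":  the minute is wake if either half is wake (main analysis)
    rule="sleep": the minute is sleep if either half is sleep (sensitivity analysis)
    """
    pairs = list(zip(asleep_30s[0::2], asleep_30s[1::2]))
    if rule == "wake":
        return [a and b for a, b in pairs]
    if rule == "sleep":
        return [a or b for a, b in pairs]
    raise ValueError(f"unknown rule: {rule!r}")


halves = [True, False, True, True, False, False, False, True]
main = collapse(halves, "wake")          # [False, True, False, False]
sensitivity = collapse(halves, "sleep")  # [True, True, False, True]
relabelled = sum(m != s for m, s in zip(main, sensitivity))
print(relabelled)  # 2 mixed minutes change from wake to sleep
```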

Sleep outcomes

Tables 4 (Actigraph) and 5 (Axivity) report differences between each algorithm and PSG for sleep outcomes in three broad categories: sleep timing (sleep onset and offset), sleep quantity (sleep period time and total sleep time), and sleep quality (sleep efficiency, WASO, and number of night wakings).

Table 4 Comparison of PSG and Actigraph GT3x measured sleep outcomes using different algorithms and at each site
Table 5 Comparison of PSG and Axivity measured sleep outcomes using different algorithms and at each site

Sleep timing

Almost all algorithms detected sleep onset significantly earlier than PSG. Differences ranged from 2 min to as much as 149 min for the Actigraph and from 1 min to 144 min for the Axivity. Differences in sleep onset were generally smaller for either device when placed at the wrist. Several algorithms provided accurate estimates of sleep onset, differing from PSG by just 1–15 min (Actigraph hip HDCZA; Actigraph wrist CS, Sadeh 1, Sadeh 2, Tudor-Locke 3; Axivity wrist CS, Sadeh 1, Cole-Kripke 1, Tudor-Locke 3). For sleep offset, differences were smaller for Actigraphs placed at the wrist than at the hip, and all algorithms except Tudor-Locke 2 showed small differences from PSG. Differences for the Axivity were generally smaller at the wrist than at the thigh or back. Overall, however, the Axivity placed on the thigh and, to a lesser extent, on the back performed better than the Actigraph at the hip. With the Axivity, 8 (thigh) and 4 (back) of the 11 algorithms showed only small, non-significant differences from PSG, whereas with the Actigraph at the hip only one algorithm (HDCZA) did so.

Sleep quantity

Tables 4 and 5 show that many of the algorithms differ substantially from PSG, in some cases overestimating sleep by more than two hours, whether measured as sleep period time or total sleep time. However, wrist placement clearly provided substantially more accurate estimates of sleep quantity for both devices. For example, differences (95% CI) for the Actigraph at the wrist ranged from 1 (-12, 15) to 54 (27, 81) minutes for sleep period time, whereas differences for hip placement were as large as 243 (203, 283) minutes. The Axivity showed a similar pattern (Table 5). Several algorithms performed well, but only a few (Sadeh 1, Cole-Kripke 1, HDCZA) did so consistently for both devices and placement sites. Only the count-scaled algorithm showed a difference from PSG of less than 30 min for all eight measures examined (total sleep time and sleep period time at both wrist and hip for both devices).

Sleep quality

For WASO, Tables 4 and 5 show that actigraphy produced lower values than PSG for almost all sites, devices and algorithms tested. However, estimates generally matched PSG values more closely when the device was placed at the wrist, particularly for the Actigraph, where 7 of the 11 algorithms showed small differences (5 to 22 min). Conversely, devices placed on the hip (Actigraph), thigh or back (Axivity) gave better estimates of sleep efficiency. Regardless of device or placement, the algorithms tested showed only small differences in sleep efficiency compared with PSG. Sleep efficiency defined using TIB was lower than sleep efficiency defined using SPT and showed larger differences from PSG. Lastly, estimates of the number of night wakings differed considerably from PSG for most of the algorithms examined. Only 1 of the 20 Actigraph algorithms tested (Cole-Kripke 1 at the wrist) and 2 of the 30 Axivity algorithms tested (Sadeh 1 at the back and Cole-Kripke 1 at the wrist) did not produce large differences in waking frequency (Tables 4 and 5).

Bland–Altman

Figure 1 shows Bland–Altman plots of agreement in SPT (a metric of sleep duration) and WASO (a metric of sleep quality) for two algorithms. The first is the ‘overall best performing algorithm’ (HDCZA with the Axivity at the wrist). The second is the CS algorithm, the ‘best performing’ algorithm for SPT with the Axivity at the wrist. These plots show that the CS algorithm assesses SPT more accurately than HDCZA, with narrower 95% limits of agreement (LOA) (-165 to 172 min, compared with -212 to 250 min for HDCZA). Both algorithms performed similarly for WASO, with slightly narrower 95% LOA for HDCZA (CS: -279 to 260 min; HDCZA: -251 to 245 min). Both, however, were considerably inaccurate at higher levels of WASO.

Fig. 1

Bland–Altman plots for sleep period time (SPT) and wake after sleep onset (WASO) for the HDCZA and CS algorithms using the Axivity at the wrist compared with PSG. Red dashed lines indicate 95% limits of agreement.

Discussion

Our study demonstrates that current count-based sleep algorithms show higher total accuracy and specificity when devices are placed at the wrist than at other sites of wear, regardless of actigraphy brand or algorithm tested. Overall, the HDCZA algorithm demonstrated high sensitivity, specificity and thus accuracy regardless of device brand or placement. Results for individual sleep outcomes were more variable and differed by outcome, algorithm and site of wear. Researchers may therefore choose an algorithm according to their primary sleep outcome of interest. For example, studies of sleep timing may prefer the CS algorithm with wrist-worn devices, whereas studies focussed on sleep quality may prefer the HDCZA algorithm. Poor detection of wakefulness (low specificity) by many algorithms and sites of wear continues to plague actigraphy estimates of sleep and wake in paediatric studies [21]. Yet specificity values are not always reported [22], despite their potential to influence data interpretation. This is also an issue in adult research [23].

Several studies have assessed the agreement between research grade devices and PSG in healthy children, but many have been in small samples and utilised single sites of wear, devices or algorithms to detect sleep and wake states and derive sleep estimates [10, 20, 22]. Most of the sleep detection algorithms used in the present study have been previously developed and validated against PSG in healthy adults [5, 9, 13], and only a few have been validated against PSG in children [7, 10] albeit in small samples (n < 40). The findings from this much larger and more comprehensive study are broadly consistent with the original validation studies and a review of previous validation studies in children, which show that accuracy (0.84–0.92) and sensitivity (0.82–0.96) are generally good, whereas specificity (0.20–0.65) is considerably lower [20].

However, both previous research [20] and the current study show that the specificity of most algorithms, i.e. their ability to detect wakefulness within the sleep period, is better when the device is worn at the wrist. Previous estimates were 54–77% [20], and estimates in the present study were 67–90%. These figures are considerably higher than those observed in adult studies, which have reported specificities of 34–46% for the HDCZA, Sadeh and Cole algorithms [9, 11, 21]. These discrepancies may arise from differences in sleep characteristics between children and adults. In our study, most children had long periods of sleep without wakefulness during the night. Although immobility is generally taken to indicate sleep in accelerometry-based assessment, people can be immobile while awake, and actigraphy can misclassify such periods as sleep. This likely occurs more often in adults, who have more periods of conscious nocturnal awakening than children [11, 19]. Our Bland–Altman plots also revealed some bias between actigraphy-measured sleep period time and PSG, with larger differences apparent as sleep period time decreased. The greater wakefulness and shorter sleep times of adults likely contribute to greater misclassification of WASO and thus poorer overall specificity compared with children.

For most algorithms, the wrist placement was also superior to the thigh, lower back and hip for estimates of sleep onset, offset, quantity (TST and SPT) and WASO. Prior research has also indicated that hip-worn accelerometers tend to overestimate total sleep time and sleep efficiency while underestimating WASO, resulting in lower specificity than wrist-worn devices [21, 24]. This reduced specificity for hip-worn devices can be attributed to most algorithms having been designed around wrist-specific acceleration features, which are more attuned to nocturnal movements indicative of wakefulness. Devices positioned closer to the body’s centre of mass, such as at the waist or lower back, are likely to register less movement during the night and may therefore miss periods of wakefulness. Differences in feature selection (y-axis acceleration, inclinometer data, rolling-window size, changes in z-angle, etc.) may also explain why some algorithms outperformed others when devices were worn at the same site. We previously reported better estimates of sleep onset using the count-scaled algorithm when devices were worn at the hip [10]. However, that was a much smaller study in younger children, and the very small differences observed (-3 min versus 2 min) may reflect device-specific differences or age-related differences in sleep settling habits. Only sleep efficiency (both definitions) was consistently better estimated when devices were worn at the hip. When worn at the hip, most algorithms overestimated sleep offset (i.e. identified later waking) and underestimated WASO, so sleep efficiency was higher. When determining the optimal placement, device and algorithm, systematic variation should be an important consideration. Systematic variation is more tractable than random variation because the direction of bias is known.
In this study, the HDCZA, Sadeh 1, CS, and Cole-Kripke 1 algorithms performed well for estimates of sleep onset, offset, total sleep time and sleep period time. Importantly, these estimates did not vary randomly when different devices or placements were used. Suppose an algorithm consistently identifies sleep onset earlier than PSG, regardless of placement site or device type. Researchers can then anticipate that it will overestimate total sleep time and, in turn, sleep efficiency.
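The worked example below shows how systematic errors propagate, using hypothetical values in minutes after lights out. Suppose actigraphy places sleep onset 15 min earlier than PSG and underestimates WASO by 20 min, while offset and time in bed are unchanged. TST is then overestimated by 35 min, and both sleep efficiency measures rise. The function name is hypothetical.

```python
def derived_outcomes(onset: int, offset: int, waso: int, tib: int) -> dict[str, float]:
    """SPT, TST and both sleep efficiency measures from timing values in minutes."""
    spt = offset - onset
    tst = spt - waso
    return {"SPT": spt, "TST": tst,
            "SE_TIB": 100 * tst / tib, "SE_SPT": 100 * tst / spt}


# Hypothetical night: 600 min in bed, times in minutes after lights out.
psg = derived_outcomes(onset=30, offset=580, waso=40, tib=600)
actigraphy = derived_outcomes(onset=15, offset=580, waso=20, tib=600)
for key in psg:
    print(key, round(psg[key], 1), round(actigraphy[key], 1))
```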

Many current algorithms are disadvantaged by requiring sleep onset and offset times from diaries, which pose both respondent and analysis burden. Therefore, we specifically compared sleep estimates from three different algorithms (Sadeh 1, Cole-Kripke 2, Tudor-Locke 1) with PSG using diary recorded sleep onset and offset timings to guide the algorithm. Overall, the use of a sleep diary did not improve the level of agreement of sleep estimates between accelerometers and PSG. Although the children were asked about their sleep onset and waking times not long after awakening, it appears that estimating these timings by self-report is challenging, particularly estimating timing of sleep onset, and especially when more than one day of data are collected. These findings lend further support for using automated algorithms for detecting sleep and wake states, especially in large sample sizes.

A limitation of our study is that the accuracy of these algorithms in clinical populations or in children with any significant sleep disturbance is unknown. It is also unknown whether these results would be similar in other age groups or in people with irregular sleep patterns. We did not include a direct measure of sleep latency (an important sleep metric). However, “in-bed” time remained the same across placement sites, devices and algorithms, which suggests that later sleep onsets would result in longer sleep latency.

The strengths of this study include the simultaneous comparison of two research-grade accelerometers worn at several sites (wrist, hip, thigh, lower back) with PSG, the rigorous reporting of actigraphy data according to recommendations for children [22], and a larger sample than most previous validation studies [22]. Importantly, sleep data were generated using 11 automated sleep detection algorithms commonly reported in the literature but not previously compared with PSG in a large sample of children and adolescents. Comparison with the “gold-standard” PSG is a strength, but the two techniques measure very different signals. Actigraphy sleep scoring rules, particularly for WASO, are not entirely comparable with PSG. This likely explains the discrepancies, alongside the fact that actigraphy can wrongly infer sleep when children are lying awake and relatively motionless. This is particularly relevant as children settle to sleep but are still awake, and likely explains the earlier sleep onset detected by actigraphy. PSG detects sleep from changes in brain-wave signals, which can occur within a 30-s epoch. This rapid change may also explain why PSG detected wakings more frequently than actigraphy.

Differences between PSG and actigraphy methodology may also explain the large discrepancies between algorithms for estimates such as WASO and number of awakenings. Many of the algorithms define WASO as any transition from sleep to wake after sleep onset and before sleep offset, similar to PSG scoring. However, the CS algorithm aims to minimise artefactual movements detected during sleep by actigraphy. It therefore counts as WASO only wake lasting over 5 continuous minutes. This definition makes disagreements between PSG and actigraphy considerably greater. It is not clear, however, whether the estimated relationships between sleep and various aspects of health are affected by differences in how WASO is defined. To our knowledge, this has not been examined in the literature. In future validation studies, researchers may need to consider a different reference measure of sleep, such as videosomnography, that captures constructs of sleep more similar to those measured by actigraphy. Accurately discriminating between “awake” time and movement during sleep is important if the true relationships between sleep and health are of interest. Future studies should also evaluate whether relationships between sleep and health differ depending on the algorithm used to derive sleep estimates. Likewise, the accelerometer brand and placement site best suited to assessing sleep may not be the best choice for assessing other movement behaviours during the day (such as physical activity and sedentary behaviour). Researchers investigating 24-h movement behaviours will have to consider these results in the context of their objectives.
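The sketch below contrasts the two WASO definitions on a 1-min sleep/wake series (True = sleep). With `min_bout=1`, every wake minute between sleep onset and offset is counted. With `min_bout=5`, shorter wake bouts are ignored, approximating the CS rule. Whether the CS cut-off is inclusive (≥ 5 min) or strict (> 5 min) is an assumption here, as are the names.

```python
from itertools import groupby


def waso_minutes(asleep: list[bool], min_bout: int = 1) -> int:
    """WASO from 1-min epochs, counting only wake bouts of at least `min_bout` minutes."""
    if not any(asleep):
        return 0
    onset = asleep.index(True)
    offset = len(asleep) - asleep[::-1].index(True)  # exclusive end of last sleep minute
    total = 0
    for is_sleep, run in groupby(asleep[onset:offset]):
        length = sum(1 for _ in run)
        if not is_sleep and length >= min_bout:
            total += length
    return total


night = ([False] * 3 + [True] * 10 + [False] * 2 + [True] * 10
         + [False] * 6 + [True] * 10 + [False] * 4)
print(waso_minutes(night), waso_minutes(night, min_bout=5))  # 8 6
```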

Conclusion

In conclusion, our study suggests that automated sleep detection algorithms applied to Actigraph and Axivity accelerometers worn at the lower back, hip or thigh provide measures moderately comparable with PSG. However, estimates of sleep outcomes including sleep quantity, sleep onset, sleep offset and WASO improve markedly when accelerometers are worn at the wrist. Accelerometry should be used cautiously in studies where estimates of sleep quality, such as sleep efficiency and number of awakenings during the sleep period, are important. The same caution applies to samples of participants who experience frequent periods of wake after sleep onset.

Availability of data and materials

Data used in the current study are available and may be obtained from the corresponding author upon reasonable request.

Abbreviations

PSG: Polysomnography

SDSC: Sleep Disturbance Scale for Children

BMI: Body mass index

EOG: Electro-oculograms

EEG: Electroencephalograms

AASM: American Academy of Sleep Medicine

TST: Total sleep time

SPT: Sleep period time

WASO: Wake after sleep onset

SE: Sleep efficiency

LOA: Limits of agreement

CS: Count-Scaled

References

  1. Matricciani L, Paquet C, Galland B, Short M, Olds T. Children’s sleep and health: a meta-review. Sleep Med Rev. 2019;46:136–50. https://doi.org/10.1016/j.smrv.2019.04.011. Epub Apr 23.

  2. Barreira TV, Schuna JM Jr, Mire EF, et al. Identifying children’s nocturnal sleep using 24-h waist accelerometry. Med Sci Sports Exerc. 2015;47(5):937–43. https://doi.org/10.1249/MSS.0000000000000486.

  3. Cole RJ, Kripke DF, Gruen W, Mullaney DJ, Gillin JC. Automatic sleep/wake identification from wrist activity. Sleep. 1992;15(5):461–9.

  4. Galland BC, Kennedy GJ, Mitchell EA, Taylor BJ. Algorithms for using an activity-based accelerometer for identification of infant sleep-wake states during nap studies. Sleep Med. 2012;13(6):743–51.

  5. Sadeh A, Lavie P, Scher A, Tirosh E, Epstein R. Actigraphic home-monitoring sleep-disturbed and control infants and young children: a new method for pediatric assessment of sleep-wake patterns. Pediatrics. 1991;87(4):494–9.

  6. Tudor-Locke C, Barreira TV, Schuna JM Jr, Mire EF, Katzmarzyk PT. Fully automated waist-worn accelerometer algorithm for detecting children’s sleep-period time separate from 24-h physical activity or sedentary behaviors. Appl Physiol Nutr Metab. 2014;39(1):53–7. https://doi.org/10.1139/apnm-2013-0173. Epub 2013 Jun 26.

  7. van Hees VT, Sabia S, Jones SE, et al. Estimating sleep parameters using an accelerometer without sleep diary. Sci Rep. 2018;8(1):12975.

  8. Smith C, Galland B, Taylor R, Meredith-Jones K. ActiGraph GT3X+ and Actical wrist and hip worn accelerometers for sleep and wake indices in young children using an automated algorithm: validation with polysomnography. Front Psychiatry. 2019;10:958.

  9. Quante M, Kaplan ER, Cailler M, et al. Actigraphy-based sleep estimation in adolescents and adults: a comparison with polysomnography using two scoring algorithms. Nat Sci Sleep. 2018;10:13–20. https://doi.org/10.2147/NSS.S151085. eCollection 2018.

  10. Girschik J, Fritschi L, Heyworth J, Waters F. Validation of self-reported sleep against actigraphy. J Epidemiol. 2012;22(5):462–8.

  11. Bruni O, Ottaviano S, Guidetti V, et al. The Sleep Disturbance Scale for Children (SDSC). Construction and validation of an instrument to evaluate sleep disturbances in childhood and adolescence. J Sleep Res. 1996;5(4):251–61.

  12. Health Information Standards Organisation. HISO 10001:2017 Ethnicity data protocols. 2017.

  13. Atkinson J, Salmond C, Crampton P. NZDep2013 index of deprivation. Wellington: University of Otago; 2014.

  14. World Health Organisation. WHO child growth standards based on length/height, weight and age. Acta Paediatr Suppl. 2006;450:76–85.

  15. Berry RB, Brooks R, Gamaldo CE, Harding SM, Marcus C, Vaughn BV. The AASM manual for the scoring of sleep and associated events. Rules, Terminology and Technical Specifications. Darien: American Academy of Sleep Medicine; 2012. p. 176.

  16. Sadeh A, Sharkey KM, Carskadon MA. Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep. 1994;17(3):201–7. https://doi.org/10.1093/sleep/17.3.201.

  17. Bunce C. Correlation, agreement, and Bland-Altman analysis: statistical analysis of method comparison studies. Am J Ophthalmol. 2009;148(1):4–6. https://doi.org/10.1016/j.ajo.2008.09.032.

  18. Hyde M, O’Driscoll DM, Binette S, et al. Validation of actigraphy for determining sleep and wake in children with sleep disordered breathing. J Sleep Res. 2007;16(2):213–6.

  19. Meltzer LJ, Wong P, Biggs SN, et al. Validation of actigraphy in middle childhood. Sleep. 2016;39(6):1219–24.

  20. Meltzer LJ, Montgomery-Downs HE, Insana SP, Walsh CM. Use of actigraphy for assessment in pediatric sleep research. Sleep Med Rev. 2012;16(5):463–75.

  21. Slater JA, Botsis T, Walsh J, King S, Straker LM, Eastwood PR. Assessing sleep using hip and wrist actigraphy. Sleep Biol Rhythms. 2015;13(2):172–80.

  22. Lee YJ, Lee JY, Cho JH, Choi JH. Interrater reliability of sleep stage scoring: a meta-analysis. J Clin Sleep Med. 2022;18(1):193–202.

  23. Smith MT, McCrae CS, Cheung J, Martin JL, Harrod CG, Heald JL, et al. Use of actigraphy for the evaluation of sleep disorders and circadian rhythm sleep-wake disorders: an American Academy of Sleep Medicine systematic review, meta-analysis, and GRADE assessment. J Clin Sleep Med. 2018;14(7):1209–30.

  24. Zinkhan M, Berger K, Hense S, Nagel M, Obst A, Koch B, et al. Agreement of different methods for assessing sleep characteristics: a comparison of two actigraphs, wrist and hip placement, and self-report with polysomnography. Sleep Med. 2014;15(9):1107–14.

Acknowledgements

The authors wish to thank the participants and their families for their participation in this study.

Funding

This research was supported by University of Otago, Department of Medicine funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

KMJ, RT, SD and TS were responsible for study inception and design. GK consolidated and cleaned data for analysis. AG, AC and AC were responsible for data collection. GK wrote the Python code and TS analysed all data using GGIR. JH was responsible for developing and conducting the statistical analysis plan. KMJ led manuscript writing and all authors conducted a critical revision and provided substantive feedback on the manuscript, in addition to approval of the final version.

Corresponding author

Correspondence to K. A. Meredith-Jones.

Ethics declarations

Ethics approval and consent to participate

Ethics approval was provided by the University of Otago Human Ethics Committee (ref H18/073) and participants provided written, informed consent prior to commencing participation.

Consent for publication

Not applicable.

Competing interests

The authors report no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table 1. Number of participants with missing data from each algorithm (n = 131). Supplementary Table 2. Sensitivity analysis for sensitivity, specificity, and accuracy of epoch-by-epoch comparisons with PSG for sleep, with half PSG epochs assigned as sleep (rather than wake).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Meredith-Jones, K.A., Haszard, J.J., Graham-DeMello, A. et al. Validation of actigraphy sleep metrics in children aged 8 to 16 years: considerations for device type, placement and algorithms. Int J Behav Nutr Phys Act 21, 40 (2024). https://doi.org/10.1186/s12966-024-01590-x

  • DOI: https://doi.org/10.1186/s12966-024-01590-x

Keywords