This study presents a comparative validation of self-report measures of sedentary time and pattern against an objective measure of sitting, using a systematic process. Using the TASST framework to construct a set of 18 self-report tools enables generalisation of the results to most existing self-report tools and allows recommendations to be made on the future development of such tools. Overall, self-report tools of total sedentary time show poor accuracy, with large bias and wide limits of agreement. With the exception of composite measures based on the sum of time spent in different SBs, all self-report measures under-reported sedentary time. This will affect surveillance systems and studies obtaining population estimates of average sedentary time. It also makes comparisons between surveys and studies using different self-report tools difficult. Using the correction factors shown in Table 2 to remove the systematic part of the error will provide more accurate estimates of population average sedentary time and enable better comparison between studies using different tools.
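As an illustration of the quantities discussed above, the following minimal sketch computes the bias and 95% limits of agreement between a self-report tool and the objective measure, then removes the systematic part of the error from the population average. The data are hypothetical, the function names are illustrative, and the correction is assumed to be additive; the actual form and values of the correction factors are those given in Table 2, which is not reproduced here.

```python
from statistics import mean, stdev


def bland_altman(self_report, objective):
    """Return (bias, lower limit, upper limit) for paired values in h/day."""
    diffs = [s - o for s, o in zip(self_report, objective)]
    bias = mean(diffs)   # systematic error; negative means under-reporting
    sd = stdev(diffs)    # spread of the random error
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def corrected_population_mean(self_report, bias):
    """Remove an additive systematic error from the population average.

    Assumption: the correction is a simple subtraction of the mean bias.
    """
    return mean(self_report) - bias


# Hypothetical paired measurements (hours of sedentary time per day).
self_report = [6.0, 5.0, 7.5, 4.0, 8.0]
objective = [9.0, 8.5, 9.5, 8.0, 10.0]

bias, lower, upper = bland_altman(self_report, objective)
corrected = corrected_population_mean(self_report, bias)
print(f"bias = {bias:.2f} h, limits of agreement = [{lower:.2f}, {upper:.2f}] h")
print(f"corrected population mean = {corrected:.2f} h")
```

A correction of this kind only shifts the population average. It leaves the width of the limits of agreement, and therefore the random error of individual answers, unchanged.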
All self-report tools showed low correlation (0.38 in the best case) with objective data and low precision, with random error generally larger than 2.5 h. This will affect epidemiological studies that try to ascertain relationships between sedentary time and health outcomes or potential determinants. Surprisingly, proxy measures such as TV time performed the best in this respect. This might be because TV time is a ubiquitous SB that often follows a specific schedule, making it easier to recall. The only other measure of total sedentary time with measurement characteristics comparable to those of TV time was an assessment of total sedentary time using a visual analogue scale of the proportion of the day spent sitting. Generally, composite measures were subject to more random error, which grows with the complexity of the measure. For example, recollecting time spent in thirteen different SBs leads to larger random error than recollecting time spent sedentary in four domains. Similarly, composite measures based on patterns of SB require recollecting the number of bouts and the average duration of a sedentary bout. This appears both very imprecise and difficult for participants to complete (high rate of data loss through missing data).
Finally, data loss due to the self-report measures either not being completed or not providing exploitable data is another important factor to consider. In this respect, most self-report tools had less than 5% data loss (Fig. 4). The tools most affected by data loss were composite measures, especially those that require recollecting more subscales or complex constructs such as pattern, which a participant might find difficult to consider.
The biggest influence on measurement characteristics appears to be the type of question asked, and not the recall period used. Recent reports [16,17,18] saw reducing the recall period as a positive way to improve the validity and accuracy of self-report measures of SB. In this study, recall period appears to have little, and no systematic, influence on the accuracy, precision, criterion validity or data loss of self-report measures of SB. To improve measurement characteristics, the type of assessment seems a more promising feature to change.
The results show that assessment of pattern is the least valid type of self-report of SB. Self-report assessment of the number of SB bouts is prone to very large systematic and random error and does not correlate with objective assessment. Studies using objective monitors suggest that the pattern in which sedentary time is accumulated may influence health, as well as the total time spent sedentary [19, 20]. However, it appears that self-report is not a valid measure of pattern of SB, which might preclude large-scale studies of the impact of pattern of SB and the effect of “breaks” using self-report measures.
Recommendations
This comparative validation study clearly shows that no self-report tool provides a measurement of sedentary time with the same accuracy, precision and validity as objective SB measures. Therefore, when possible, objective measures should be used instead of self-report tools. Using the results of this study, in conjunction with the TASST framework, some recommendations can be made about choosing the best possible self-report tool to measure SB if an objective measure is not possible. These are summarised as a flow chart in Fig. 5 and will depend primarily on whether a survey or study already uses a pre-existing measure, and on the main aim of the study [21]. Recommendations are expressed here in terms of taxa within the TASST framework, but these can be translated to specific self-report tools using Additional file 3, which provides a table mapping the existing tools identified in a previous review [8] against the TASST domains assessed in the current validation study. For an existing survey that already includes assessment of sedentary time, it is probably only worth changing this assessment if the tool used does not fall within the type of assessment covered by a single item (direct taxon 1.1.1 or proxy taxon 1.1.2) with any recall period, as any gain in precision or reduction in data loss is likely to be small (Fig. 4). In this case, the continuity of data collected with previous samples/studies is probably more important than moderate improvements in future data collected, especially in terms of population surveillance. In the case of a new survey or study, or the first introduction of a sedentary time assessment, the choice should be guided by the aim of the survey or tool.
Surveillance
If the primary aim is surveillance of total sedentary time, then using a visual analogue scale and either a previous day or previous week recall period would give the most accurate and precise results. However, if the aim of the surveillance is to look in more detail at a specific SB or a specific domain where SB occurs, but an estimate of total sedentary time is still required, then composite measures should be adopted. A composite measure based on the sum of domains would be preferable to one based on the sum of behaviours for surveillance of total sedentary time, and should be used unless the aim of the survey or study is to monitor the distribution of time between SBs. For this type of assessment, however, gains can be made in terms of reduced data loss by adopting short (previous day) or unanchored recall periods.
Epidemiology
If the primary aim is epidemiological research, then strong correlation and low random error are the most important measurement characteristics to consider. Using a proxy measure with an unanchored recall period, or adopting a visual analogue scale assessment of the proportion of the day spent sitting, provides the most valid measures with the lowest data loss. These should be used in preference to other types of assessment and recall periods. However, a recent consensus highlights that understanding the context of sedentary time is a research priority [22]. In this case, a composite measure would be more appropriate. Choosing the appropriate recall period might avoid unwanted data loss (Fig. 4).
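The recommendations above can be read as a small decision procedure. The sketch below encodes them as a function; it is only an illustrative paraphrase of the text, not a reproduction of the flow chart in Fig. 5, and the function name, argument names and returned wording are hypothetical.

```python
# Single-item taxa named in the text: direct (1.1.1) and proxy (1.1.2).
SINGLE_ITEM_TAXA = {"1.1.1", "1.1.2"}


def recommend_tool(objective_possible, existing_taxon=None,
                   aim="surveillance", needs_context=False,
                   monitor_behaviour_split=False):
    """Return a recommendation string following the text above.

    All argument names are illustrative assumptions:
    existing_taxon          TASST taxon of a tool already in use, or None
    aim                     "surveillance" or "epidemiology"
    needs_context           detail on specific SBs or domains is required
    monitor_behaviour_split the split of time between SBs is itself of interest
    """
    if objective_possible:
        return "objective measure"
    if existing_taxon in SINGLE_ITEM_TAXA:
        # Continuity with earlier data outweighs small gains in precision.
        return "keep existing single-item tool"
    if aim == "surveillance":
        if not needs_context:
            return "visual analogue scale, previous day or previous week recall"
        if monitor_behaviour_split:
            return "composite (sum of behaviours), previous day or unanchored recall"
        return "composite (sum of domains), previous day or unanchored recall"
    if aim == "epidemiology":
        if needs_context:
            return "composite measure, recall period chosen to limit data loss"
        return "proxy with unanchored recall, or visual analogue scale"
    raise ValueError(f"unknown aim: {aim!r}")


print(recommend_tool(False, aim="epidemiology"))
```

An existing tool outside the single-item taxa falls through to the aim-based branches, which mirrors the statement that a change is probably only worthwhile in that case.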
Strengths and limitations
The strengths of this study are:
- the use of a well-established, validated and accurate objective measure of sitting (activPAL) as the reference measure, in contrast to most previous research which used measures of low movement (such as the ActiGraph) rather than postural sensors [8];
- the use of taxonomic systems [3, 8] to provide comparative validation between measures within one validation study, allowing extrapolation of the results to all self-report measures of sedentary time;
- the sample size (n = 700), which is larger than many validation studies published to date [8], and high compliance within the study (92% of participants agreeing to take part provided data for a full 7-day period).
The main limitation of this study is the lack of objective detection of waking time. The analysis relies on waking day data, which were ascertained using a sleep diary. While sleep diaries are generally considered reliable and valid [23], they are not free of bias and error. Consequently, they may have degraded the quality and accuracy of the reference measure. Although automated methods to detect sleep show promise [24, 25], they do not currently offer a sufficient advantage over a sleep diary.
Finally, the results should be interpreted with care because they are based on a sample of older adults from three ongoing cohorts, only some of whom were employed, and might not be directly generalisable to self-report assessment in children or adult populations, in different cultural contexts, or in those perhaps less interested in their health than those who volunteer for repeat data collections within an ongoing research cohort. While the systematic approach taken in this comparative validation process should provide generalisable information for all self-report tools, replication of this process would provide definite proof of consistency of the findings.
Future
The results show that it is unlikely that great improvements in accuracy can be gained by developing new questionnaires or adapting existing ones. The heteroscedasticity (the variability of a variable is unequal across the range of values of a second variable that predicts it) present in several of the self-report measures suggests that part of the error is not entirely random and might have some deterministic sources. This suggests that individual answers could be corrected with some calibration equations using respondent characteristics. However, a recent study found that calibrating a single-item direct measure of total sitting measured on 183 blue-collar workers based on a prediction model using standard demographic information only improved accuracy by 10 to 30% [26]. Future research should certainly consider exploring calibration of data; however, this may lead to overfitting the data or increased burden. Another potential route for improvement might be using adaptive testing and presenting a type of assessment and recall period tailored to the individual respondent.
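To make the idea of a calibration equation concrete, the sketch below fits a least-squares line that predicts objective sedentary time from a self-report answer, and computes a crude heteroscedasticity signal as the correlation between the absolute paired difference and the pair mean. It is a simplified assumption-laden example: the data are hypothetical, only one predictor is used (no respondent characteristics), and it is not the prediction model used in [26].

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_calibration(self_report, objective):
    """Least-squares fit of objective ~ intercept + slope * self_report."""
    mx, my = mean(self_report), mean(objective)
    slope = (sum((x - mx) * (y - my) for x, y in zip(self_report, objective))
             / sum((x - mx) ** 2 for x in self_report))
    return my - slope * mx, slope


def heteroscedasticity_signal(self_report, objective):
    """Correlation between |difference| and pair mean.

    A value far from zero hints that the error spread is not constant.
    """
    abs_diff = [abs(s - o) for s, o in zip(self_report, objective)]
    pair_mean = [(s + o) / 2 for s, o in zip(self_report, objective)]
    return pearson(pair_mean, abs_diff)


# Hypothetical calibration sample (hours per day).
self_report = [2.0, 4.0, 6.0, 8.0]
objective = [7.0, 8.0, 9.0, 10.0]

intercept, slope = fit_calibration(self_report, objective)
signal = heteroscedasticity_signal(self_report, objective)
calibrated = intercept + slope * 5.0  # calibrate a new answer of 5 h
print(intercept, slope, signal, calibrated)
```

Because a calibration equation can overfit the sample it was derived from, its benefit would need to be checked on respondents who were not used to fit it.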
There is increasing interest in studying SBs in more detail, and context is seen as key [11], so more questionnaires using composite assessment are appearing [27]. These composite measures are a trade-off. They provide information about time spent sedentary in specific behaviours or domains and still enable an estimate of total sedentary time to be made. However, the subscales within these composite measures are never validated, so the quality of the information on time spent in specific domains or behaviours is unknown. Additionally, as seen within this study, the measurement characteristics of these sums as assessments of total sedentary time are inferior to those of other types of assessment. In the future, specific validation of subscales to ascertain their individual validity should be performed against an appropriate reference measure other than total sedentary time.