Study design and regulatory information
CADENCE-Adults was a multi-year, laboratory-based, cross-sectional study registered with ClinicalTrials.gov (NCT02650258). It was designed to determine cadence (steps/min) thresholds associated with physical activity intensity across the adult lifespan (i.e., 21–85 years of age) [6,7,8]. Data were collected at the University of Massachusetts Amherst in three phases: January to October 2016 for 21–40-year-olds [8], January to October 2017 for 41–60-year-olds [7], and November 2018 to August 2019 for 61–85-year-olds [6].
CADENCE-Adults was approved by the University of Massachusetts Amherst Institutional Review Board. After an initial phone screening to identify eligible participants, we scheduled an in-person screening visit. At this visit, participants provided signed informed consent before any data collection began.
Participants
A sex- and age-balanced sample of 10 men and 10 women was recruited for each 5-year age category between 21 and 85 years (i.e., 21–25, 26–30, 31–35 years, …) [6,7,8], and a final sample of 260 individuals participated. This recruitment strategy was intended to minimize sources of bias, improve the generalizability of findings, and ensure an equal distribution of participants across the age range of this study. Recruitment strategies included newspaper and radio advertisements, e-mails, electronic postings, flyers, general recruitment events (e.g., at retirement villages, assisted living centers, and community centers), and word-of-mouth. Interested individuals contacted us via telephone or email. We then screened them by phone to determine eligibility based on our inclusion/exclusion criteria. Eligibility was confirmed again during an in-person visit, before informed consent was obtained and before any data collection began. Exclusion criteria included:

- use of a wheelchair or walking aid, or any impairment of normal ambulation;
- hospitalization for mental illness in the 5 years preceding data collection;
- pregnancy;
- current tobacco use;
- stroke or any other cardiovascular disease;
- a body mass index (BMI) indicating underweight or severe obesity (BMI < 18.5 kg/m2 or > 40 kg/m2);
- stage 2 hypertension (≥ 160 mmHg systolic or ≥ 100 mmHg diastolic blood pressure);
- a pacemaker or similar implanted medical device; or
- any condition and/or medication that could alter the physiological response to exercise.

Our medical investigator reviewed a resting electrocardiogram to approve higher-risk participants for exercise testing.
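The age stratification and the quantitative exclusion thresholds above can be expressed as a short screening helper. This is a minimal sketch and not the study's screening software. The function names (`age_categories`, `age_category`, `bmi`, `passes_quantitative_screen`) are hypothetical. Only the BMI and blood pressure criteria are encoded, because the remaining criteria required clinical judgement.

```python
# Hypothetical helpers illustrating the sampling frame and the quantitative
# exclusion thresholds; not the study's actual screening tool.

AGE_MIN, AGE_MAX, BIN_WIDTH = 21, 85, 5
PER_SEX_PER_BIN = 10  # 10 men and 10 women per 5-year category


def age_categories():
    """Return the 5-year age categories (inclusive bounds) from 21 to 85."""
    return [(lo, lo + BIN_WIDTH - 1) for lo in range(AGE_MIN, AGE_MAX + 1, BIN_WIDTH)]


def age_category(age):
    """Map an age in years to its 5-year category, or None if out of range."""
    for lo, hi in age_categories():
        if lo <= age <= hi:
            return (lo, hi)
    return None


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def passes_quantitative_screen(bmi_value, systolic_mmhg, diastolic_mmhg):
    """True unless BMI or resting blood pressure meets an exclusion criterion."""
    if bmi_value < 18.5 or bmi_value > 40:  # underweight or severe obesity
        return False
    if systolic_mmhg >= 160 or diastolic_mmhg >= 100:  # stage 2 hypertension
        return False
    return True


# Planned sample: 13 categories x (10 men + 10 women) = 260 participants.
planned_n = len(age_categories()) * 2 * PER_SEX_PER_BIN
```

The last category is 81–85 years. Thirteen categories of 20 participants reproduce the recruited total of 260.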
Treadmill testing procedure
Full details on treadmill testing procedures have been provided elsewhere [6,7,8]. Briefly, participants were asked to complete a series of up to twelve 5-min walking bouts at incrementally increasing speeds on a level (0% grade) Cybex 751 T treadmill (Cybex International Inc, MA, USA). Each bout was separated by a 2-min rest to facilitate the collection and counting of speed-specific steps from the tested wearable technologies. Speed was verified using a tachometer. It started at 0.8 km/h (0.5 mph) and increased by 0.8 km/h (0.5 mph) per bout to a maximum of 9.7 km/h (6.0 mph). The treadmill protocol was terminated at the end of the 5-min bout in which the participant did any of the following:

- naturally transitioned from walking to jogging/running;
- reached ≥ 75% of age-predicted maximum heart rate; or
- reported a Borg rating of perceived exertion > 13 [10].

The protocol was also terminated whenever either the research staff or the participant decided not to continue for any reason (e.g., perceived fatigue or safety concerns).
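The bout schedule and stopping rules can be sketched as follows. This is an illustrative sketch only. It generates the schedule in mph and converts it to km/h, so one rounded value (5.5 mph ≈ 8.9 km/h) differs by 0.1 km/h from the nominal 0.8 km/h steps. The `should_stop` helper is hypothetical. The formula 220 − age for age-predicted maximum heart rate is an assumption, because the text does not state which prediction equation was used.

```python
KM_PER_MILE = 1.609344
N_BOUTS = 12
BOUT_MIN, REST_MIN = 5, 2


def bout_speeds_kmh():
    """Speeds for bouts 1..12: 0.5 mph steps up to 6.0 mph, in km/h (1 decimal)."""
    return [round(0.5 * k * KM_PER_MILE, 1) for k in range(1, N_BOUTS + 1)]


def should_stop(age, heart_rate, rpe, transitioned_to_running, stop_requested):
    """Evaluate the termination criteria at the end of a bout.

    ASSUMPTION: age-predicted maximum heart rate = 220 - age.
    """
    hr_max_pred = 220 - age
    return (
        transitioned_to_running
        or heart_rate >= 0.75 * hr_max_pred
        or rpe > 13
        or stop_requested
    )


# Longest possible session: 12 bouts of 5 min separated by 11 rests of 2 min.
max_protocol_min = N_BOUTS * BOUT_MIN + (N_BOUTS - 1) * REST_MIN  # 82 min
```

For example, a 40-year-old reaches the heart rate criterion at 135 beats/min under the assumed formula (0.75 × 180).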
Measures
Participant characteristics and anthropometric measures
Biological sex, age, and race/ethnicity were self-reported. Participants’ weight, height, leg length, and waist circumference were measured using standardized protocols as detailed previously [8]. BMI was calculated as weight (kg) divided by height squared (m2).
Step counting
The criterion measure of steps taken was directly observed and hand-tally counted. This method of assessing treadmill stepping was rarely problematic, for three likely reasons:

- counting steps was the sole task of one research technician during the treadmill test;
- stepping was largely rhythmic and predictable, except for the very few steps taken at the beginning and end of a bout; and
- the observed movements were reinforced by the audible sound of the foot striking the treadmill belt.

We also aimed a video camera at the participant's feet during the test to provide a redundant record for verification as needed. Verification was triggered when the responsible research technician self-disclosed a miscount, or when the reported value was immediately identified as unusual or unexpected (i.e., higher or lower than expected given the preceding bout and/or the recorded bout speed). In these cases, the step count for that bout was checked against the video file immediately after the testing session and corrected as needed. During analysis, we again examined the rare anomalous values by recounting steps on the video. These included results that were questionable relative to the outputs of the multiple wearable technologies. If the original logged value and the second viewing of the video disagreed, a third viewing was used to finalize the criterion value. We emphasize that this process was rarely required.
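The recount rule in the last sentences of the paragraph above reduces to a small decision function. This is a minimal sketch; the function name and arguments are hypothetical. Whenever the logged and recounted values disagree, the value from the third viewing is treated as final.

```python
def finalize_criterion(logged, recount=None, third_view=None):
    """Return the criterion step count for one bout (hypothetical helper).

    logged     -- hand-tally value recorded during the test
    recount    -- value from a second (video) viewing, if one was done
    third_view -- value from a third viewing, required only on disagreement
    """
    if recount is None or recount == logged:
        return logged
    if third_view is None:
        raise ValueError("logged and recounted values disagree; third viewing needed")
    return third_view
```

Raising an error when the third viewing is missing makes an unresolved disagreement visible instead of silently keeping either value.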
As mentioned above, data were collected over multiple years. During this time, some wearable technologies were discontinued, while others were updated or newly released. As a result, the exact number and type of devices differ somewhat between age groups. In total, 21 different wearable technologies were evaluated over the full period of data collection. The ActiGraph GT9X and GENEActiv were each worn at both the waist and the wrist, and each location is counted separately. Devices were worn as follows:

- right ankle: a StepWatch (OrthoCare Innovations, Seattle, WA, USA);
- right thigh: an activPAL (PAL Technologies Ltd, Glasgow, UK);
- right waist: an Actical (Philips Respironics, Murrysville, PA, USA), ActiGraph GT9X (ActiGraph, Pensacola, FL, USA), GENEActiv (Activinsights Ltd, Cambridgeshire, UK), New Lifestyles NL-1000 (New Lifestyles Inc., Lee’s Summit, MO, USA), and Fitbit One (Fitbit Inc, San Francisco, CA, USA);
- left waist: a Digi-Walker SW-200 (Yamax Corporation, Tokyo, Japan), Fitbit Zip, and PiezoRx (StepsCount, Ontario, Canada);
- non-dominant wrist: an ActiGraph GT9X, Garmin vivoactive 3 (Garmin International Inc., Olathe, KS, USA), Garmin vivoactive HR, Garmin vivofit 2, Garmin vivofit 3, GENEActiv, and Polar M600 (Polar Electro Oy, Kempele, Finland);
- dominant wrist: an Apple Watch Series 1 (Apple Inc., Cupertino, CA, USA), Fitbit Ionic (Fitbit Inc, San Francisco, CA, USA), Samsung Gear Fit2 (Samsung Electronics America Inc., Ridgefield Park, NJ, USA), and Samsung Gear Fit2 Pro.

See Additional file 2: Suppl Fig. 1 and Suppl Table 1 for a visual and tabular description of device locations, settings, distribution among age groups, and initialization and data extraction procedures.
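For analysis, it can help to treat each device-location combination as one wearable technology. The mapping below is a hypothetical data structure that transcribes the device list above. It omits settings and age-group availability, which are given in Additional file 2. It confirms that the dual-location ActiGraph GT9X and GENEActiv account for the difference between 19 device models and 21 wearable technologies.

```python
# Hypothetical transcription of wear locations; not an official study file.
DEVICES_BY_LOCATION = {
    "right ankle": ["StepWatch"],
    "right thigh": ["activPAL"],
    "right waist": ["Actical", "ActiGraph GT9X", "GENEActiv", "NL-1000", "Fitbit One"],
    "left waist": ["Digi-Walker SW-200", "Fitbit Zip", "PiezoRx"],
    "non-dominant wrist": [
        "ActiGraph GT9X", "Garmin vivoactive 3", "Garmin vivoactive HR",
        "Garmin vivofit 2", "Garmin vivofit 3", "GENEActiv", "Polar M600",
    ],
    "dominant wrist": [
        "Apple Watch Series 1", "Fitbit Ionic", "Samsung Gear Fit2",
        "Samsung Gear Fit2 Pro",
    ],
}

# Each (device, location) pair is one wearable technology.
technologies = [(d, loc) for loc, devs in DEVICES_BY_LOCATION.items() for d in devs]
device_models = {d for d, _ in technologies}
```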
Data processing and aggregation
The Apple Watch Series 1, Digi-Walker SW-200, NL-1000, PiezoRx, Polar M600, and all Fitbit, Garmin, and Samsung devices displayed step counts in real time. These counts were recorded manually at the end of each bout. For the waist- and wrist-worn GENEActiv, we used the step detection algorithm that we recently published [11]. The Actical, ActiGraph GT9X, activPAL, GENEActiv, and StepWatch automatically recorded time-stamped step data according to their internal clocks. These data were downloaded according to the manufacturers’ specifications, as detailed in Additional file 2: Suppl Table 1. The time-stamped step count data were then synchronized to the study protocol’s digital timing record to facilitate post-processing of bout-specific step counts. In this way, a total number of steps per bout was obtained for each wearable technology. These totals, along with the directly observed step data, were merged into a single comma-delimited flat file for further analysis.
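The synchronization step amounts to summing each device's time-stamped step counts within the bout windows taken from the protocol's timing record. The sketch below makes two hypothetical assumptions:

- device epochs and bout boundaries are already on a common clock in seconds; and
- an epoch belongs to a bout when its start time falls within [start, end).

Real exports differ by device, and epochs that straddle a bout boundary would need extra handling. The function names and example values are illustrative.

```python
import csv
import io


def steps_per_bout(epochs, bouts):
    """Sum time-stamped epoch step counts inside each bout window.

    epochs -- list of (epoch_start_s, steps) from one device
    bouts  -- list of (bout_id, start_s, end_s) from the timing record
    """
    totals = {}
    for bout_id, start, end in bouts:
        totals[bout_id] = sum(steps for t, steps in epochs if start <= t < end)
    return totals


def to_flat_csv(rows, fieldnames):
    """Write merged per-bout rows (dicts) to a comma-delimited string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical epoch records (only non-empty epochs listed):
# bout 1 = 0-300 s, rest = 300-420 s, bout 2 = 420-720 s.
epochs = [(0, 10), (60, 12), (300, 5), (420, 20), (480, 22)]
bouts = [(1, 0, 300), (2, 420, 720)]
device = steps_per_bout(epochs, bouts)  # {1: 22, 2: 42}
observed = {1: 23, 2: 41}  # hypothetical hand-tally counts
rows = [{"bout": b, "observed": observed[b], "device": device[b]} for b in device]
flat = to_flat_csv(rows, ["bout", "observed", "device"])
```

The epoch at 300 s falls in the rest interval, so it is excluded from both bouts.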
Analytic sample
The final analytic data set included 258 of the 260 originally recruited participants. Data from two women (84.5 ± 0.7 years of age) were removed because their participation was terminated due to safety concerns (unsteadiness during treadmill ambulation). Sample sizes for each wearable technology varied. Some devices were worn by all age groups, whereas others were only available (and thus worn) in specific age groups over the multiple years of data collection. In addition, some individual devices malfunctioned, and their data were therefore lost. Additional file 3 gives a full description, by age group, of sample sizes and of the number of steps derived from direct observation and from each tested wearable technology at each treadmill speed.
In total, the 258 participants provided 1,842 treadmill bouts, 30 of which were running bouts. Following the procedures established in the previous catalog based on the CADENCE-Kids study [12], we excluded running bouts from this analysis for three reasons: 1) the small number of these bouts (running bouts represented only 1.6% of all bouts); 2) the speeds at which participants actually ran varied from 4.8 to 8.8 km/h, which made it difficult to draw conclusions about any specific speed; and 3) the well-known biomechanical differences between running and walking [13]. Thus, the final analytic data set of 258 participants comprised a total of 1,812 treadmill walking bouts ranging from slow to fast speeds. The data set and the corresponding data dictionary were formatted in accordance with the previously published catalog [9] and are available in Additional file 4.
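The bout counts reported above can be checked with simple arithmetic, and the exclusion itself is a filter on a per-bout gait label. The record layout (a `gait` field) and the `walking_only` function are hypothetical. The counts are those stated in the text.

```python
TOTAL_BOUTS = 1842
RUNNING_BOUTS = 30

walking_bouts = TOTAL_BOUTS - RUNNING_BOUTS  # 1812
running_share_pct = 100 * RUNNING_BOUTS / TOTAL_BOUTS  # about 1.63%


def walking_only(bouts):
    """Keep bouts labelled as walking (hypothetical 'gait' field)."""
    return [b for b in bouts if b["gait"] == "walking"]
```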
Statistical analysis
Descriptive statistics
Sample characteristics are presented as means and SDs or as percentages (%), as appropriate. We previously defined and justified validity indices related to accuracy, bias, and precision [9]; they are briefly reviewed here. Accuracy was determined using the mean absolute percentage error (MAPE), calculated as follows [4]:
$${E}_{j}={W}_{j}- \, {C}_{j}$$
$$\mathrm{MAPE}= \frac{100\%}{n}{\sum }_{j=1}^{n}\frac{\left|{E}_{j}\right|}{{C}_{j}}$$
where Wj is the number of steps recorded by the wearable technology being tested in the jth person-bout (j = 1, 2, …, n), Cj is the criterion measure of directly observed steps in that same person-bout, and Ej is the corresponding signed step count error, expressed in steps.
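As a worked illustration of the MAPE formula, the sketch below computes it from paired device and criterion counts. The function names and the example counts are hypothetical.

```python
def step_errors(device, criterion):
    """E_j = W_j - C_j for each person-bout."""
    return [w - c for w, c in zip(device, criterion)]


def mape(device, criterion):
    """Mean absolute percentage error (%), as in the MAPE equation."""
    errors = step_errors(device, criterion)
    return 100 * sum(abs(e) / c for e, c in zip(errors, criterion)) / len(criterion)


# Three hypothetical person-bouts with 100 observed steps each.
W = [98, 103, 100]
C = [100, 100, 100]
# Errors are -2, +3, and 0 steps, so MAPE = (2% + 3% + 0%) / 3, about 1.67%.
```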
Bias was represented as the mean percentage error (MPE), calculated as follows [14]:
$$\mathrm{MPE}= \frac{100\%}{n}{\sum }_{j=1}^{n}\frac{{E}_{j}}{{C}_{j}}$$
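Dividing the difference between the wearable-derived and directly observed steps (Ej) by the directly observed steps (Cj) yields a scaled index. This index expresses the error relative to the number of steps actually taken, regardless of the total number of steps in a bout. Because the sign of Ej is retained, positive MPE values indicate overcounting and negative values indicate undercounting.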
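In the hypothetical worked example below, MPE keeps the sign of each error, so overcounts and undercounts partly cancel. This is why MPE is read as bias rather than accuracy. The function name and counts are illustrative.

```python
def mpe(device, criterion):
    """Mean percentage error (%); positive = overcount, negative = undercount."""
    return 100 * sum((w - c) / c for w, c in zip(device, criterion)) / len(criterion)


W = [98, 103, 100]  # hypothetical device counts
C = [100, 100, 100]  # hypothetical directly observed counts
# Errors of -2, +3, and 0 steps give MPE = (-2% + 3% + 0%) / 3, about +0.33%,
# although the mean absolute percentage error for the same bouts is about 1.67%.
```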
Precision indices were the SD and the coefficient of variation (CoV) of the errors, and the correlation coefficient (r) [15]. The SD of the error values (E) was calculated as follows:
$$\mathrm{SD}=\sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(E_{j}-\overline{E}\right)^{2}}$$
CoV was calculated as:
$$\mathrm{CoV}=\left(\frac{\mathrm{SD}}{\overline{E} }\right)\times 100\%$$
where SD is the standard deviation of the step count errors and \(\overline{E}\) is their mean. Finally, the Pearson correlation coefficient (r) was computed as follows to represent the strength of the relationship between directly observed steps and steps derived from the wearable technology:
$$r=\frac{\sum_{j=1}^{n}\left(W_{j}-\overline{W}\right)\left(C_{j}-\overline{C}\right)}{\sqrt{\left[\sum_{j=1}^{n}\left(W_{j}-\overline{W}\right)^{2}\right]\left[\sum_{j=1}^{n}\left(C_{j}-\overline{C}\right)^{2}\right]}}$$
where Wj is the number of steps recorded by the wearable technology being tested in the jth person-bout (j = 1, 2, …, n), Cj is the number of directly observed steps in that same person-bout, and \(\overline{W}\) and \(\overline{C}\) are their respective means.
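The three precision indices can be computed together from paired counts, as sketched below under hypothetical names. The SD uses the 1/n (population) form shown in the SD equation, which matches Python's `statistics.pstdev`. As defined, CoV takes the sign of the mean error and is undefined when the mean error is zero. The correlation is written out term by term from the equation rather than taken from a library function.

```python
import math
import statistics


def precision_indices(device, criterion):
    """Return (SD, CoV %, Pearson r) for paired device/criterion step counts."""
    errors = [w - c for w, c in zip(device, criterion)]
    sd = statistics.pstdev(errors)  # 1/n form, as in the SD equation
    cov = sd / statistics.fmean(errors) * 100  # CoV = (SD / mean error) x 100%

    w_bar = statistics.fmean(device)
    c_bar = statistics.fmean(criterion)
    num = sum((w - w_bar) * (c - c_bar) for w, c in zip(device, criterion))
    den = math.sqrt(
        sum((w - w_bar) ** 2 for w in device) * sum((c - c_bar) ** 2 for c in criterion)
    )
    return sd, cov, num / den


# Hypothetical bouts: errors are +1 and +3 steps (mean 2, SD 1, CoV 50%).
sd, cov, r = precision_indices([101, 203], [100, 200])
```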
Again following the procedures established in our previously published children/youth catalog [9], MAPE (accuracy) and MPE (bias) values, with their associated SD and CoV (precision) values, were averaged across the available samples for each wearable technology. They are presented by:

- walking speed;
- speed level (i.e., slow = 0.8, 1.6, 2.4, and 3.2 km/h; normal = 4.0, 4.8, 5.6, and 6.4 km/h; fast = 7.2 and 8.0 km/h);
- wear location (ankle, thigh, waist, and wrist); and
- age group (young adults, 21–40 years; middle-aged adults, 41–60 years; older adults, 61–85 years).

Correlation coefficients (r) were computed for the whole sample and reported across all walking bouts. Meaningful correlations require a wider range of step counts than is available within a single speed. To classify speed levels, we adopted the Consumer Technology Association (CTA) description of a normal speed range [5] and defined slow and fast speeds relative to it. The validity indices were interpreted according to accepted conventions. Lower MAPE indicated better accuracy, and MPE values closer to 0% indicated less bias. Lower SD and CoV values, and correlation coefficients closer to 1, indicated better precision.
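The grouping used for reporting can be sketched as two lookup helpers plus an averaging step. This is an illustrative sketch. The function names, the record fields (`speed`, `age`, `ape`), and the example values are hypothetical. Speeds are matched after rounding to one decimal to avoid floating-point mismatches.

```python
from collections import defaultdict
from statistics import fmean

SPEED_LEVELS = {
    **{s: "slow" for s in (0.8, 1.6, 2.4, 3.2)},
    **{s: "normal" for s in (4.0, 4.8, 5.6, 6.4)},
    **{s: "fast" for s in (7.2, 8.0)},
}


def speed_level(speed_kmh):
    """Map a nominal walking speed (km/h) to slow, normal, or fast."""
    return SPEED_LEVELS[round(speed_kmh, 1)]


def age_group(age):
    """Map age in years to the three reporting age groups."""
    if 21 <= age <= 40:
        return "young"
    if 41 <= age <= 60:
        return "middle-aged"
    if 61 <= age <= 85:
        return "older"
    raise ValueError("age outside 21-85 years")


def mean_ape_by(records, key):
    """Average per-bout absolute percentage errors within groups given by key()."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec["ape"])
    return {k: fmean(v) for k, v in groups.items()}


records = [  # hypothetical per-bout rows for one wearable technology
    {"speed": 0.8, "age": 30, "ape": 40.0},
    {"speed": 1.6, "age": 70, "ape": 20.0},
    {"speed": 4.8, "age": 50, "ape": 2.0},
]
by_level = mean_ape_by(records, lambda r: speed_level(r["speed"]))
```

Passing `lambda r: age_group(r["age"])` as the key groups the same rows by age group instead.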
Inferential analysis
The effects of speed, wear location, and age group on accuracy, bias, and precision were tested using mixed effects models. First, we tested the effect of speed on MAPE by fitting 21 separate mixed effects models, one for each of the 21 tested wearable technologies. Each model gave the absolute percentage error for participant i = 1, 2, …, N at speed j = 1, 2, …, q, conditional on the participant-specific deviation. Speed was entered as a categorical variable, and the model for each device was:
$$E[\mathbf{Y}_{i}\mid b_{i}]=\mathbf{X}_{i}\boldsymbol{\beta}+b_{i}\mathbf{1}_{q}$$
The terms are defined as follows:

- Yi is a q × 1 vector of absolute percentage error values;
- Xi is a q × q diagonal matrix of dummy variables (i.e., equal to 0 or 1) indexing the corresponding speed;
- β is a q × 1 vector of regression coefficients for the fixed effect (i.e., speed as a categorical variable);
- bi is the random intercept for participant i; and
- 1q is a q × 1 vector of ones, so that bi shifts all of participant i's values equally.

To test the effect of speed (β) on MAPE, likelihood ratio tests (α = 0.05) were used for each wearable technology-specific model. We also estimated 95% CIs of MAPE at each speed. Consistent with our previously published approach [9] and with previous recommendations [16], 95% CIs were interpreted as significantly different when they did not overlap with another point estimate. When the CIs overlapped, statistical significance was considered unclear. Another valid approach would be to construct CIs around the differences. However, we chose not to do so because the statistically unclear differences were small in practical terms, irrespective of statistical significance.

We used the same mixed model analysis to examine the effects of wear location and age group. To do so, we redefined Xi accordingly and refitted the model separately for each of the three speed levels (i.e., slow, normal, and fast). For example, to test the effect of age group on MAPE at each speed level, we defined Xi as a diagonal matrix of dummy variables (equal to 0 or 1) corresponding to age-speed combinations. The main analyses focused on MAPE, because accuracy reflects both bias and precision and therefore captures the overall performance of a step counting device [15]. The same mixed model analyses were also used to examine the effects of speed, wear location, and age group on bias (MPE) and precision (r); these results are presented as supplementary material. All analyses were performed using R (version 3.0.2; R Foundation for Statistical Computing, Vienna, Austria) in RStudio.
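Under the cell-means coding described above, each element of the model reduces to a speed-specific mean plus a participant shift. The likelihood ratio test compares this model with an intercept-only model that keeps the same random intercept. The derivation below is a sketch. It assumes normally distributed random intercepts and residuals, and that both models are fitted by maximum likelihood rather than REML, because REML likelihoods are not comparable across models with different fixed effects. The text does not state these details.

$$E[Y_{ij}\mid b_{i}]=\beta_{j}+b_{i},\qquad b_{i}\sim N(0,\sigma_{b}^{2})\;\Rightarrow\;E[Y_{ij}]=\beta_{j}$$
$$H_{0}:\beta_{1}=\beta_{2}=\cdots=\beta_{q},\qquad \Lambda=2\left(\ell_{\mathrm{full}}-\ell_{0}\right)\;\dot{\sim}\;\chi^{2}_{q-1}$$

Here Yij is the absolute percentage error of participant i at speed j, and ℓfull and ℓ0 are the maximized log-likelihoods of the speed model and the intercept-only model. βj is therefore the population MAPE at speed j. The speed effect is significant at α = 0.05 when Λ exceeds the 0.95 quantile of the χ² distribution with q − 1 degrees of freedom. For instance, with q = 10 walking speeds (0.8 to 8.0 km/h), the critical value is about 16.92.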