Study design and population
We conducted a cluster-randomized controlled trial with a total of 40 elementary schools (20 intervention; 20 control) from a large, suburban school district in the US state of Georgia. The study aimed to follow students over a two-year intervention period including Grade 4 Fall (Fall 2018; “T1”), Grade 4 Spring (Spring 2019; “T2”), Grade 5 Fall (Fall 2019; “T3”), and Grade 5 Spring (Spring 2020; “T4”), though study activities ended midway through T4 in March 2020 due to the onset of the COVID-19 pandemic. The evidence-based Health Empowers You! intervention [22, 23] was implemented across the entire study period from September 2018 to March 2020 with the goal of sustainably elevating students’ school-day MVPA. The intervention also ensured some students experienced higher MVPA levels closer to the recommended 30 minutes of school-day MVPA, allowing for more rigorous assessment of the relationship between school-day MVPA and academic achievement. Health Empowers You! is a multi-level intervention designed to shift school practices and culture to increase elementary school students’ levels of school-day PA. Trained Physical Activity Specialists (PASs) provided training and technical assistance to teachers to implement the PA intervention. Teachers received various resources to increase school-day PA, including web content, weekly calendars outlining PA resources and strategies, monthly training webinars, and exercise equipment.
Power was calculated using simulation and a Bonferroni correction to the alpha level of 0.05 given multiple hypotheses, yielding an adjusted alpha of approximately 0.0003. Specifying an unconditional intraclass correlation coefficient (ICC) of 0.25 (across the school and teacher levels) and a standardized effect size of 0.25 between PA and academic achievement based on meta-analytic reviews of the relationship, 40 schools with 6 teachers per school and 20 students per teacher gave a power of 100%.
For school recruitment, the school district provided demographic data for all the district’s elementary schools, including number of Grade 3 classes, mean number of students per Grade 3 class, racial/ethnic composition of the student body, and socioeconomic status (SES) of students’ families, which was proxied by the percentage of students who were eligible for free or reduced-price lunch (FRL). The amount of monthly PE time at each school, based on information from school district administrators on PE class scheduling, was also accounted for in randomization.
To ensure that both higher SES and lower SES schools were sampled, 20 schools were randomly selected from each of the district’s two SES strata: the higher SES stratum (less than 50% of students eligible for FRL) and the lower SES stratum (50% or more of students eligible for FRL). Within each stratum, it was confirmed that the demographics of the 20 selected schools were comparable to the demographics of all schools in the stratum.
The 40 selected schools were then randomized to intervention or control using an urn procedure that adjusted the probabilities of allocation based on two key school-level characteristics: SES (based on FRL) and number of monthly minutes of PE scheduled for Grade 4 students. Once 20 schools were allocated to each condition, demographic characteristics of the intervention and control groups were compared to confirm that there were no statistically significant differences between them. All 40 approached schools agreed to participate in the project and accepted their randomized condition in January 2018.
All Grade 4 students not enrolled in a full-time special education classroom at participating schools at the beginning of the 2018-2019 school year were eligible for enrollment in the study. In the intervention schools, special education teachers participated in training and received resources for implementing the intervention at their discretion, but students in special education classrooms were not included in data collection because these classes included multiple grade levels and required complex additional supports.
Information about the study was distributed to parents in August 2018 with facilitation by the principal and office staff at all participating schools. Student informed consent agreements (available in English, Spanish, Vietnamese, and several other languages) were required from participating students’ parents/guardians. Enrollment in the study included providing parental consent and student assent for participation in PA measurement via accelerometry and authorizing the school district to share archival records on standardized test scores, teacher-assigned grades, attendance, and tardiness as part of the analytic data set provided to the research team. Of 6525 Grade 4 students across the 40 schools, 4936 (76%) were enrolled in the study. Of the 4936 students, 4320 (87.5%) had a valid accelerometer measure in T1, 3800 (77.0%) in T2, and 3588 (72.7%) in T3.
The school district administration, district IRB, and Emory University IRB (IRB00095600) approved this study. School district leadership, school leadership, and teachers were extensively involved in the study’s implementation process. The school district research department reviewed and approved the proposal and study design, principals were engaged in recruitment and scheduling trainings, and district-level administration provided data and ensured smooth implementation that would not overburden schools. The Health Empowers You! intervention also has teachers and school administrators design a unique school activity plan that meets their school’s specific needs.
Additional details about the study’s protocol are provided in a previous manuscript [24].
Data sources
Data sources included: (1) school district records of student academic and demographic data, and (2) ActiGraph wGT3X-BT 3-axis accelerometers (ActiGraph LLC, Pensacola, FL), worn on a waist belt. Students put on their assigned accelerometers at the beginning of the school day and removed them before leaving school. ActiLife software was used to download the data, filter them to school-day minutes only, and score them. Non-wear time was defined as 60 consecutive minutes of zero counts, allowing for up to 2 minutes of counts between 0 and 100 [25]. Data were collected in 15-second epochs and scored using Evenson activity threshold cut points [26].
Measures
Exposure
Accelerometer-measured PA was the primary exposure. Criteria for a valid day required students to wear the accelerometer for at least 80% of the school day. Students needed at least 3 valid days of wear time during the 5-day measurement period each semester to be included in analyses for that semester. A single measure of mean daily MVPA minutes was calculated in each semester for students who met the 3-day criterion. After excluding students with insufficient accelerometer data, students included in the analysis had an average of 4.58, 4.23, and 4.52 valid days of accelerometer wear (range 3-5 days) in T1, T2, and T3, respectively. Their mean daily wear time averaged 98%, 96%, and 98% (range 82-100%, 82-100%, and 84-100%) in T1, T2, and T3, respectively.
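The valid-day and 3-day inclusion rules described above can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the study's processing code (which used ActiLife); the function name, the tuple layout, and the example minutes are hypothetical.

```python
from statistics import mean

MIN_WEAR_FRACTION = 0.80  # valid day: worn for at least 80% of the school day
MIN_VALID_DAYS = 3        # at least 3 valid days in the 5-day measurement period


def semester_mean_mvpa(days):
    """Return mean daily MVPA minutes over valid days, or None if excluded.

    `days` is a list of (wear_minutes, school_day_minutes, mvpa_minutes)
    tuples, one per measured school day (hypothetical layout).
    """
    valid_mvpa = [
        mvpa
        for wear, school_day, mvpa in days
        if school_day > 0 and wear / school_day >= MIN_WEAR_FRACTION
    ]
    if len(valid_mvpa) < MIN_VALID_DAYS:
        return None  # student is excluded from this semester's analyses
    return mean(valid_mvpa)


# Made-up week: three of the five days meet the 80% wear threshold.
example_week = [(400, 420, 20), (300, 420, 50), (420, 420, 30),
                (350, 420, 10), (100, 420, 5)]
print(semester_mean_mvpa(example_week))
```

Only MVPA from valid days enters the mean, so a day with little wear time does not pull the semester estimate downward.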
Primary outcomes
Course grades and Grade 4 standardized test scores were examined as outcomes. Teachers assigned course grades each semester (T1, T2, and T3) for math, reading, spelling, and writing on a 100-point scale. The Georgia Milestones standardized test in English Language Arts (ELA) and math is administered each spring to students in Grades 3 to 8. The test was first used in Georgia in 2015 and is designed to measure students’ knowledge and skills related to state-adopted content standards for each academic subject [27]. Participating students’ results from the Spring 2019 Grade 4 Georgia Milestones tests were used; participant math scale scores ranged from 394 to 715, ELA scale scores ranged from 357 to 775, and Lexile scores ranged from 190 to 1300. Course grades were not assigned and Georgia Milestones tests were not conducted in Spring 2020 due to the COVID-19 pandemic.
Covariates
School district data were used for student-level and school-level covariates. Student sex, race/ethnicity, physical/learning disability status, participation in special education courses, English language learner (ELL) status, FRL status, departmentalized teacher status, prior academic achievement, prior absenteeism, and prior tardiness were controlled for in all models. Student sex was either “male” or “female,” and student race/ethnicity was categorized as “Asian,” “Black,” “Hispanic,” “Mixed,” or “White.” Physical or learning disability, ELL, and FRL status were dichotomized as “yes” or “no.” Student FRL status was used as a proxy for SES. Students were eligible for FRL if their family household income was at most 185% of the federal poverty level [28]. Special education participation was incorporated as a variable ranging from zero to four based on the number of special education courses students were enrolled in across math, reading, spelling, and writing. Student prior achievement was defined as the previous year’s course grade or standardized test score, in accordance with the outcome assessed in analyses; for example, the analysis using Grade 4 Georgia Milestones math standardized test scores controlled for each student’s Grade 3 Georgia Milestones math standardized test score. Finally, a student’s prior absenteeism and prior tardiness were measured as the percentage of days absent and tardy in Grade 3. Some teachers were departmentalized, meaning students rotated between them and other teachers for core classes. The teacher level was not included in multi-level analyses because of student rotation across departmentalized teachers; departmentalization instead entered the model as a student characteristic.
At the school level, analyses controlled for percentage of students who were female, Black, Hispanic, and receiving FRL, along with intervention or control status.
Statistical analyses
Two-level random-intercepts models [29, 30] were used to estimate the associations of interest while accounting for the non-independence of observations that arises when lower-level units (e.g., students) are observed within higher-level units (e.g., schools). After running models with MVPA measured continuously, models were run with students’ MVPA grouped into three categories of mean daily school-day MVPA: less than or equal to 15 minutes, greater than 15 and less than or equal to 30 minutes, and greater than 30 minutes. These categories allowed assessment related to the recommendation that students attain at least 30 minutes of daily MVPA during school hours [5]. Specifically, they allowed evaluation of the difference in achievement between the low-daily-MVPA group (the <= 15 minutes category) and the group approaching the recommended 30 minutes (the 15 to 30 minute category), and between the low-daily-MVPA group and the group exceeding the 30-minute recommendation (the 30+ minutes category).
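As a small illustration of the three-category grouping, the boundaries stated above can be written as follows. The function name and the category labels are hypothetical; only the cut points (15 and 30 minutes, with both upper bounds inclusive) come from the text.

```python
def mvpa_category(mean_daily_mvpa_minutes):
    """Assign a student's mean daily school-day MVPA to one of three groups."""
    if mean_daily_mvpa_minutes <= 15:
        return "<=15 min"        # low-daily-MVPA reference group
    if mean_daily_mvpa_minutes <= 30:
        return ">15 to 30 min"   # approaching the 30-minute recommendation
    return ">30 min"             # exceeding the 30-minute recommendation


for minutes in (12.0, 15.0, 22.5, 30.0, 41.3):
    print(minutes, mvpa_category(minutes))
```

Note that a student with exactly 30 minutes falls in the middle category under these boundaries.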
The unconditional multilevel model was used to estimate the ICC. Generally, ICC values above 0.05 suggest violations of the independent observations assumption and justify multilevel procedures [31]. The conditional random-intercepts model was used for estimating associations of interest. Study outcomes’ ICCs ranged from 0.117 to 0.212, which indicated a lack of independence of observations within schools and confirmed the need for multilevel modeling.
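For a two-level random-intercepts model, the ICC is conventionally the share of total outcome variance that lies between schools: the between-school (intercept) variance divided by the sum of the between-school and within-school (residual) variances. The sketch below assumes those two variance components have already been estimated from the unconditional model; the function name and the example variances are hypothetical and are not the study's estimates.

```python
def intraclass_correlation(between_school_var, within_school_var):
    """ICC for a two-level model: between / (between + within)."""
    total = between_school_var + within_school_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_school_var / total


# Made-up variance components for illustration only.
icc = intraclass_correlation(between_school_var=12.0, within_school_var=68.0)
print(round(icc, 3))
print(icc > 0.05)  # compare against the 0.05 rule of thumb cited above
```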
Depending on the cross-sectional analysis, the outcome was (1) the teacher-assigned semester course grade, (2) the mean of Grade 4 teacher-assigned grades from fall and spring semesters (T1 and T2), or (3) Grade 4 standardized test scores (from T2). For longitudinal analyses, a residualized change score approach was used wherein the outcome was Grade 5 teacher-assigned grades (from T3) while Grade 4 teacher-assigned grades (averaged from T1 and T2) were included as a covariate. Prior achievement entered each model grand-mean centered. For categorical variables, the reference group was White males not eligible for FRL. Standardized estimates of MVPA’s coefficient were obtained by multiplying the coefficient by MVPA’s standard deviation and dividing by the outcome’s standard deviation [32]. To account for multiple tests, a Bonferroni-adjusted critical p-value of 0.00271 was used for statistical significance.
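The standardization step is simple arithmetic: the unstandardized coefficient is rescaled by the ratio of the predictor's standard deviation to the outcome's standard deviation. A minimal sketch follows, alongside the generic Bonferroni calculation (alpha divided by the number of tests). All names and numbers are hypothetical; in particular, the number of tests used here is illustrative and is not taken from the study.

```python
def standardized_coefficient(coef, sd_predictor, sd_outcome):
    """Rescale an unstandardized coefficient to standard-deviation units."""
    return coef * sd_predictor / sd_outcome


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Made-up values: a 0.10-point grade difference per MVPA minute,
# MVPA SD of 8 minutes, grade SD of 10 points.
print(standardized_coefficient(0.10, sd_predictor=8.0, sd_outcome=10.0))
print(bonferroni_alpha(0.05, n_tests=10))
```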
Data were missing either because students were not enrolled in the participating schools for the entire study or because their observations did not meet criteria for inclusion (e.g., their accelerometer wear time did not meet the threshold to count as a valid accelerometer measurement). Multiple imputation was used to account for missing data. Twenty imputed datasets were created using the multilevel multiple imputation program Blimp [33]. Implausible imputed values were set to variables’ upper or lower bounds. Descriptive statistics were run on the non-imputed data. Final estimates of fixed and random effects were calculated using Rubin’s rules [34].
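Rubin's rules combine the per-dataset results into one estimate and one standard error: the pooled estimate is the mean of the estimates, and the total variance is the mean within-imputation variance plus the between-imputation variance inflated by (1 + 1/m), where m is the number of imputed datasets. The sketch below illustrates that pooling for a single coefficient. It is not the study's code, the function name is hypothetical, and the inputs are made up (three datasets rather than the twenty used in the study).

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least 2 imputed datasets")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across datasets
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Made-up estimates and standard errors from three imputed datasets.
estimate, se = pool_rubin([0.21, 0.25, 0.23], [0.05, 0.06, 0.05])
print(round(estimate, 3), round(se, 3))
```

The between-imputation term is what carries the extra uncertainty due to missing data; with no variation across imputed datasets, the pooled standard error reduces to the average within-dataset value.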