
Table 1 Description of terms for validity and reliability in PA and SB measurement

From: Should we reframe how we think about physical activity and sedentary behaviour measurement? Validity and reliability reconsidered

Validity – the extent to which a measurement is representative of the true scientific value; taking “true” to mean an exact representation of what happened, free from all possible sources of error or bias.
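The distinction between validity and reliability can be made concrete with a simplified classical measurement-error model. This model is an illustrative assumption, not the framework proposed in this table. It splits an observed value into a true value, a systematic bias and random error:

$$
X_{ij} = T_i + \beta + \varepsilon_{ij}, \qquad \mathbb{E}[\varepsilon_{ij}] = 0, \quad \operatorname{Var}(\varepsilon_{ij}) = \sigma_\varepsilon^2
$$

$$
R = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_\varepsilon^2}
$$

Here $X_{ij}$ is the observed value for participant $i$ on occasion $j$, $T_i$ is that participant's true value, $\beta$ is a constant systematic bias, and $\sigma_T^2$ is the between-participant variance of the true values. Under this model a measure can be highly reliable ($R$ close to 1) and still lack validity (large $\beta$), which is one reason the two properties are assessed separately.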

Test validity (or Construct validity) – a combined assessment of face, content, and [concurrent, convergent or criterion] validity for your measure within the desired or utilised study population.

 Face validity

The extent to which a measure looks like it will, or appears to, provide the desired information. Assessed by expert consensus and theoretical consideration.

The same applies to the proposed data processing and generation of outcome variables, again assessed by expert consensus and theoretical consideration.

 Content validity

The extent to which a measure covers all aspects of the intended behavioural or physiological domains or dimensions (see Fig. 1). Assessed by examination of domain or dimension of interest.

The same applies to the proposed data processing and generation of outcome variables.

 Convergent validity

The extent of agreement with another (non-criterion) measure that, on the basis of face and content validity, should assess the same PA or SB parameter. Assessed quantitatively.

Useful when the criterion measure is very resource-intensive.

This approach also allows assessment of whether the measures can be used interchangeably, or the data from the two measures pooled or otherwise compared.
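The table says convergent validity is assessed quantitatively but does not name a statistic. One widely used option for judging whether two measures could be used interchangeably is Bland–Altman analysis: the mean difference (bias) and 95% limits of agreement. The sketch below assumes paired values for the same participants over the same period; the function name and the example numbers are hypothetical.

```python
import statistics


def bland_altman(measure_a, measure_b):
    """Return (bias, lower_limit, upper_limit) for two paired measures.

    measure_a, measure_b: paired values for the same participants
    (hypothetical example: daily MVPA minutes from two devices worn together).
    Limits of agreement are bias +/- 1.96 * SD of the paired differences.
    """
    if len(measure_a) != len(measure_b) or len(measure_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    device_a = [30, 45, 20, 60, 35]  # hypothetical minutes, device A
    device_b = [28, 40, 25, 55, 30]  # hypothetical minutes, device B
    bias, lower, upper = bland_altman(device_a, device_b)
    print(f"bias={bias:.1f}, limits of agreement=({lower:.1f}, {upper:.1f})")
```

Narrow limits around a small bias would support interchangeable use or pooling; what counts as "narrow" is a subject-matter judgement that the statistic itself does not decide.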

 Criterion validity

The extent of agreement between a measure and another measure already regarded as a criterion or gold standard. Assessed quantitatively. Called absolute validity when the comparison is with a measure known to provide perfectly true values.
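When the comparison is with a criterion measure, simple error summaries can separate systematic error from total error. The sketch below computes mean bias, root-mean-square error (RMSE) and mean absolute percentage error (MAPE) of a measure against criterion values. These statistics are an illustrative choice rather than ones prescribed here, and the function name and data are hypothetical.

```python
import math


def criterion_agreement(measure, criterion):
    """Return (mean_bias, rmse, mape_percent) of `measure` against `criterion`.

    mean_bias: average signed error (systematic over- or under-estimation).
    rmse: root-mean-square error (total error, both systematic and random).
    mape_percent: mean absolute error relative to the criterion, in percent.
    """
    if len(measure) != len(criterion) or not measure:
        raise ValueError("need equal-length, non-empty paired lists")
    if any(c == 0 for c in criterion):
        raise ValueError("MAPE is undefined when a criterion value is zero")
    errors = [m - c for m, c in zip(measure, criterion)]
    n = len(errors)
    mean_bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e) / abs(c) for e, c in zip(errors, criterion)) / n
    return mean_bias, rmse, mape


if __name__ == "__main__":
    estimate = [10, 20, 30]       # hypothetical values from the measure under test
    criterion_vals = [12, 18, 30]  # hypothetical criterion values
    print(criterion_agreement(estimate, criterion_vals))
```

In this example the mean bias is zero while the RMSE is not, because over- and under-estimates cancel; bias alone is therefore rarely sufficient evidence of agreement.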

 Concurrent validity

Assessment of convergent or criterion validity when the measures are taken at the same time.

 Predictive validity

Assessment of convergent or criterion validity when the measures are taken at different times.

Experimental validity – a combined assessment of internal and external validity to determine whether conclusions drawn from the data are free from bias and generalizable to wider populations.

 Internal validity

The extent to which conclusions drawn from the experimental data are free from confounding issues that cause bias, such as reactivity and missing data; similar to methodological quality. Assessed by examination of relevant issues.

 External validity

The extent to which conclusions drawn from the data are generalizable to wider populations. Assessed by examination of the age, sex, ethnic origin, socio-economic status, etc., of the study sample.

This could be assessed by a theoretical justification or empirical demonstration such as field testing and small-scale "proof of concept" studies. These should assess participant feedback (e.g. satisfaction and burden) as well as data issues (e.g. can meaningful information be produced in reasonable time frames?).

Reliability – the extent to which a tool gives measurements that are consistent, stable, and repeatable.

 Test-retest reliability

The extent to which test scores are consistent from one test administration to the next; keeping as many conditions (e.g. researcher, timing, preparation, etc.) as possible unchanged. Assessed quantitatively.

This estimate incorporates any factors that cannot be controlled, e.g. intra-rater reliability and behaviour change.
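Test-retest reliability of continuous scores is often summarised with an intraclass correlation coefficient (ICC). The sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement) from a participants × occasions table using only the standard library. Which ICC form is appropriate depends on the study design, so treat this choice, the function name and the example data as assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores: list of rows, one per participant; each row holds that
    participant's value on each of k occasions (no missing values).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with at least two occasions")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual variation
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


if __name__ == "__main__":
    # Hypothetical daily MVPA minutes for four participants on two occasions.
    data = [[10, 12], [20, 19], [30, 33], [40, 38]]
    print(round(icc_2_1(data), 3))
```

Because this form penalises systematic shifts between occasions (the `jms` term), it suits test-retest questions where the same absolute values are expected each time.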

 Inter/intra-rater reliability

The extent to which test scores are consistent when measurements are taken by different people using the same methods (inter-rater) or at different times by the same person (intra-rater). Assessed quantitatively.
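For categorical judgements, such as two observers coding the same video into sitting, standing or stepping, Cohen's kappa corrects raw percentage agreement for the agreement expected by chance. The same function serves intra-rater reliability if the two lists are one rater's codes from two occasions. The category labels and data below are hypothetical; continuous rater outputs would instead call for a statistic such as an ICC.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of codes")
    n = len(rater_a)
    # Observed proportion of items on which the two ratings agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    observer_1 = ["sit", "sit", "stand", "step", "sit", "stand"]  # hypothetical codes
    observer_2 = ["sit", "stand", "stand", "step", "sit", "sit"]
    print(round(cohens_kappa(observer_1, observer_2), 3))
```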

 Inter/intra-instrument reliability

The extent to which test scores are consistent when measurements of the same thing are taken by different versions of the same instrument (inter-instrument) or repeatedly by the same version of an instrument (intra-instrument). Assessed quantitatively.
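For devices, inter- and intra-instrument reliability are sometimes summarised as coefficients of variation (CV), for example from several units of the same monitor exposed repeatedly to the same controlled movement. Under that assumed design, the sketch below reports the average within-unit CV (intra-instrument) and the CV of the unit means (inter-instrument). The data layout, unit names and readings are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) expressed as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100 * statistics.stdev(values) / mean


def instrument_cvs(readings):
    """Return (intra_cv, inter_cv) in percent.

    readings: dict mapping a unit id to that unit's repeated readings of the
    same stimulus (at least two units, at least two readings per unit).
    """
    # Intra-instrument: variability of repeated readings within each unit, averaged.
    intra = statistics.mean(cv_percent(r) for r in readings.values())
    # Inter-instrument: variability between the units' mean readings.
    inter = cv_percent([statistics.mean(r) for r in readings.values()])
    return intra, inter


if __name__ == "__main__":
    shaker_trials = {"unit_1": [100, 102, 98], "unit_2": [110, 108, 112]}  # hypothetical
    print(instrument_cvs(shaker_trials))
```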

 Behavioural reliability

The extent to which stability in behaviour has been considered when assessing other aspects of reliability.
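One common way of taking behavioural stability into account in device-based studies is the Spearman–Brown prophecy formula, which estimates how many monitoring days must be averaged to reach a target reliability given the reliability of a single day (for instance a single-day ICC). The sketch assumes that days are parallel, exchangeable measurements, which real behaviour (e.g. weekdays versus weekends) may not satisfy; the function names and the 0.8 default target are illustrative choices.

```python
import math


def reliability_of_mean(single_day_r, days):
    """Spearman-Brown: reliability of the mean of `days` parallel days."""
    return days * single_day_r / (1 + (days - 1) * single_day_r)


def days_needed(single_day_r, target_r=0.8):
    """Smallest whole number of days whose mean is predicted to reach `target_r`."""
    if not (0 < single_day_r < 1 and 0 < target_r < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target_r * (1 - single_day_r) / (single_day_r * (1 - target_r))
    # Round first so floating-point noise (e.g. 4.000000000000001) does not add a day.
    return math.ceil(round(n, 9))


if __name__ == "__main__":
    print(days_needed(0.5))  # -> 4 (hypothetical single-day reliability of 0.5)
```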

  1. Note: We are not attempting to deliberately redefine any term here; if we use a term that you think we have described incorrectly, we suggest this is further evidence of the non-standard use of terms and further justification for the need for this framework. Multiple sources were used.