‘Reliability’ generally refers to the extent to which a test can be expected to give the same results when administered on a different occasion (test-retest reliability) or to which the components of a test give consistent results (internal consistency).

Internal consistency is a measure of whether each item in a test measures the same concept. There are several methods of calculating this, although the most commonly used is Cronbach’s alpha, which is based on the ratio of the sum of the individual item variances to the overall subtest score variance. However, Cronbach’s alpha presumes a complete set of responses to the items, since all items need to contribute to the factor score equally, which is not the case with all the CoPS subtests. Therefore, the formula used is the standardised Cronbach’s alpha (shown below), which is based on the average of the non-redundant inter-item correlations.
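
As an illustration of the calculation described above, the standardised Cronbach’s alpha is conventionally written as alpha = k × r̄ / (1 + (k − 1) × r̄), where k is the number of items and r̄ is the mean of the non-redundant inter-item correlations. The sketch below is not the CoPS scoring code: the function names are hypothetical, and it assumes that missing responses are marked as None and handled by pairwise deletion, which the manual does not specify.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardised_alpha(items):
    """Standardised Cronbach's alpha from a list of item-score columns.

    `items` holds one list of scores per item, in the same child order.
    None marks a missing response (an assumption of this sketch); each
    pair of items is correlated only over children who answered both.
    """
    k = len(items)
    correlations = []
    for col_a, col_b in combinations(items, 2):
        pairs = [(a, b) for a, b in zip(col_a, col_b)
                 if a is not None and b is not None]
        xs, ys = zip(*pairs)
        correlations.append(pearson_r(xs, ys))
    mean_r = sum(correlations) / len(correlations)  # average inter-item correlation
    return k * mean_r / (1 + (k - 1) * mean_r)
```

Because the estimate depends only on the number of items and their average correlation, a child who skips some items still contributes to every item pair they completed. Items with no variance in their scores would cause a division by zero and would need to be excluded beforehand.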

Table 12 shows the standardised Cronbach’s alpha estimates for 4–6-year-olds and 7-year-olds; these are given separately because different test items are delivered to the two age groups (except for Clown). An internal consistency of > .7 is generally considered adequate, > .8 good, and > .9 excellent. It can be seen from Table 12 that Toybox shows an excellent level of internal consistency, with the majority of the remaining subtests showing a good level, and a few an adequate level. Letter names shows a lower level of internal consistency due to the limited number of items on this particular subtest.
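
The thresholds quoted above can be expressed as a simple lookup. This is only a sketch: the function name is hypothetical, and the label returned for values of .7 or below is an assumption, since the text does not name that band.

```python
def consistency_band(alpha):
    """Map an internal consistency estimate to the bands quoted in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "adequate"
    return "below adequate"  # label assumed; not named in the text
```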


Table 12. Internal consistency

Test-retest reliability estimates the degree to which a test provides stable measurements over time. A small subset of the CoPS standardisation sample (n = 80) repeated the CoPS subtests 4–6 weeks after the first administration. Correlations (using Pearson’s r) between scores on the two sittings are given in Table 13. A correlation of .60 is considered to be an adequate level of test-retest reliability, and .70 is considered good. As can be seen in Table 13, Rhymes shows a good level of test-retest reliability. The remaining subtests are mostly within or around the adequate level, although Letter names and Wock are a little below. For Letter names, this may be due to the limited number of items and an enhanced practice effect (due to prior exposure to the symbol-name pairings); a slight ceiling effect on both of these subtests may also contribute.
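
The test-retest calculation described above can be sketched as follows. The function names are hypothetical and the code is illustrative rather than the procedure actually used for Table 13; it assumes one score per child at each sitting, listed in the same child order, and treats the .60 and .70 thresholds as inclusive.

```python
from math import sqrt


def retest_correlation(first_sitting, second_sitting):
    """Pearson's r between scores from the first and second administration."""
    if len(first_sitting) != len(second_sitting):
        raise ValueError("each child needs a score from both sittings")
    n = len(first_sitting)
    mean_1 = sum(first_sitting) / n
    mean_2 = sum(second_sitting) / n
    cross = sum((a - mean_1) * (b - mean_2)
                for a, b in zip(first_sitting, second_sitting))
    ss_1 = sum((a - mean_1) ** 2 for a in first_sitting)
    ss_2 = sum((b - mean_2) ** 2 for b in second_sitting)
    return cross / sqrt(ss_1 * ss_2)


def retest_band(r):
    """Apply the .60 (adequate) and .70 (good) thresholds quoted in the text."""
    if r >= 0.70:
        return "good"
    if r >= 0.60:
        return "adequate"
    return "below adequate"  # label assumed; not named in the text
```

A ceiling effect of the kind mentioned for Letter names and Wock restricts the spread of scores, which tends to lower r even when children's relative standing is stable.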


Table 13. Test-retest reliability