‘Reliability’ generally refers to the extent to which a test can be expected to give the same results when administered on a different occasion (test-retest reliability) or to which the components of a test give consistent results (internal consistency).
Internal consistency is a measure of whether each item in a test measures the same concept. There are several methods of calculating this, although the most commonly used is Cronbach’s alpha, which is based on the ratio of the sum of the individual item variances to the overall subtest score variance. However, Cronbach’s alpha presumes a complete set of responses to the items, since all items need to contribute to the factor score equally, which is not the case with all the LASS 8–11 subtests. An alternative is the standardised Cronbach’s alpha, which is based on the average non-redundant inter-item correlation: alpha = k × r̄ / (1 + (k − 1) × r̄), where k is the number of items and r̄ is the mean correlation across all distinct pairs of items.
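The two coefficients described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used to produce the published estimates: the function names and the data layout (one list of scores per item, with a complete set of responses) are assumptions made for the example.

```python
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists.

    Assumes both lists have non-zero variance (otherwise this divides by zero).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha from item variances and total-score variance.

    items: list of k lists (hypothetical layout), one per item,
    each holding one score per student, with no missing responses.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per student
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


def standardised_alpha(items):
    """Standardised alpha from the mean of the non-redundant item correlations."""
    k = len(items)
    rs = [pearson_r(items[i], items[j])
          for i in range(k) for j in range(i + 1, k)]  # each pair counted once
    r_bar = mean(rs)
    return (k * r_bar) / (1 + (k - 1) * r_bar)
```

When all items have equal variances the two coefficients coincide. They diverge as item variances become more unequal, because the standardised version depends only on the correlations between items.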
Table 10 shows the standardised Cronbach’s alpha estimates (note that for Mobile phone, Funny words and Word chopping, the calculations are based on 8–10-year-olds only, as 11-year-olds are included in the LASS 11–15 sample for these subtests). An internal consistency above .7 is generally considered adequate, above .8 good, and above .9 excellent. Table 10 shows that Spelling has an excellent level of internal consistency, with the majority of the remaining subtests showing a good level and a few an adequate level. Mobile phone shows a lower level of internal consistency because of the strict discontinuation rule on this particular subtest, whereby the test stops when the student fails both items at a level (as in other digit span tests). However, a normal Cronbach’s alpha calculation, which treats the remaining, more difficult items as failed after discontinuation, estimates the internal consistency of this subtest as .831.
Table 10. Internal consistency
Test-retest reliability estimates the degree to which a test provides stable measurements over time. A small subset (n = 120) of the LASS 8–11 standardisation sample repeated the LASS 8–11 subtests 4–6 weeks after the first administration. Correlations (Pearson’s r) between scores on the two sittings are given in Table 11. A correlation of .60 is considered an adequate level of test-retest reliability, .70 good, and .80 excellent. As can be seen in Table 11, Spelling shows an excellent level of test-retest reliability, with Sentence reading and Single word reading showing good levels. The remaining subtests are mostly at or around the adequate level, although Sea creatures (visual memory) is a little below it. Earlier research on LASS 11–15 also found lower correlations on the memory subtests than on the literacy subtests. This appeared to be due to the greater susceptibility of these tasks to practice effects arising from enhanced motivation and the application of strategic thinking at the retest.
Table 11. Test-retest reliability
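As a rough illustration of how a test-retest coefficient is obtained and interpreted, the sketch below computes Pearson’s r between two sittings and assigns it to the bands quoted above (.60 adequate, .70 good, .80 excellent). The function names are hypothetical, and treating each threshold as inclusive is an assumption made for the example. No scores from the standardisation sample are reproduced here.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson's r between scores from the first and second sitting.

    Both lists must be in the same student order and have non-zero variance.
    """
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    ss1 = sum((a - m1) ** 2 for a in first)
    ss2 = sum((b - m2) ** 2 for b in second)
    return cov / sqrt(ss1 * ss2)


def retest_band(r):
    """Map a correlation to the bands quoted in the text.

    Assumption: thresholds are inclusive (r >= .60 counts as adequate, etc.).
    """
    if r >= 0.80:
        return "excellent"
    if r >= 0.70:
        return "good"
    if r >= 0.60:
        return "adequate"
    return "below adequate"
```

For one subtest, a call such as `retest_band(pearson_r(first_sitting, second_sitting))` would take the two lists of scores and return the band for that subtest.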