What is standardisation?

Technically, ‘standardisation’ is the process used in psychometric test development to create norms, so that the performance of students of different ages can be represented by scores that are independent of age. However, the term ‘standardised’ is sometimes used in a non-technical sense to refer to the consistent administration of a test – i.e. that test instructions and methods of administration are the same for all who take the test. Because this non-technical usage can be misleading (e.g. users may assume that a test has standardised norms when in fact it has not), Lucid uses the terms ‘standardisation’ and ‘standardised’ only in strict accordance with technical psychometric usage.

The most common normative scores are standard scores and centile scores. Standard scores have a mean (average) of 100 and a standard deviation (abbreviated to SD; see footnote 2) of 15. Centile scores (sometimes known as percentile scores) place individuals on a ‘ladder’ of attainment from 1 to 100 compared with the population of that age; e.g. a centile score of 70 means that 70% of people would have lower raw scores and 30% would have higher raw scores. (For further information about standard scores and centile scores, see Section 3.2.)
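The relationship between raw scores, standard scores and centile scores can be illustrated with a minimal sketch. It assumes that raw scores within an age band are approximately normally distributed, and the norm values used (a raw mean of 20 and a raw SD of 4) are hypothetical figures chosen for illustration, not values from the Lucid Recall norms. The function names are likewise illustrative only. Published norm tables may derive centiles directly from the observed distribution rather than from a normal curve.

```python
from statistics import NormalDist


def standard_score(raw, norm_mean, norm_sd):
    """Rescale a raw score to the standard-score metric (mean 100, SD 15)."""
    z = (raw - norm_mean) / norm_sd  # distance from the age-band mean in SD units
    return 100 + 15 * z


def centile_from_standard(score):
    """Centile implied by a standard score, assuming a normal distribution."""
    return 100 * NormalDist(mu=100, sigma=15).cdf(score)


# Hypothetical age-band norms: raw-score mean 20, raw-score SD 4.
ss = standard_score(24, norm_mean=20, norm_sd=4)  # raw score one SD above the mean
print(round(ss))                         # 115
print(round(centile_from_standard(ss)))  # 84
```

Under these assumptions, a student scoring one standard deviation above the mean for their age obtains a standard score of 115, which corresponds to roughly the 84th centile.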

Eleven schools were recruited for the standardisation process. The schools were selected to cover the age range 7-16 years and to include both urban and rural schools representing a range of socio-economic backgrounds. In their most recent Ofsted reports, three of the schools had been rated as outstanding and six had been rated as good or satisfactory. The remaining two schools had been in special measures within the last three years. The proportion of students eligible for free school meals was at or below the national average in four of the schools, and above the national average in seven. The proportion of students with special educational needs was above the national average in four of the schools, and at or below the national average in the remaining seven. Students were taken on an unselected basis from entire classes in the participating schools. No students were excluded from taking part on any basis.

2 The standard deviation is the most common statistic for expressing variability in a set of scores. It is calculated as the square root of the average squared deviation of the scores from the mean, and can be read as the typical amount by which the scores in the set differ from the mean.

Standardisation sample

The standardisation sample comprised 1087 students aged 7-16 years (502 males and 585 females) [see Table 1].

Standardisation results

All raw data from the three tests and also the two derived measures (Working Memory Composite and Processing Speed) approximated to normal distributions (symmetrical bell-shaped curves), with skewness (the degree of asymmetry of the distribution) and kurtosis (the degree of flatness or peakedness of the distribution) below the critical threshold of 1.0. Descriptive statistics for each of the core tests are given in Table 2, and for the two derived measures in Table 3. For the three core tests and Working Memory Composite, the developmental progression in raw score means from the youngest to the oldest age group is approximately linear, with the exception of the 16:0-16:11 age group. From Table 1 it can be seen that the number of students in the 16:0-16:11 age group was considerably smaller than in the other groups, and this is the most likely explanation for the divergent pattern of results found in this group. For Processing Speed, the curve is approximately linear in the range 7:0-12:11 but plateaus thereafter, as might be expected with a speed measure.
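The normality check described above can be sketched as follows. This is an illustration only: it assumes the moment-based definitions of skewness and kurtosis, with kurtosis expressed as excess kurtosis (so that a normal distribution scores 0), which the manual does not state explicitly. The function names and the sample scores are hypothetical; only the threshold of 1.0 comes from the text.

```python
from statistics import fmean


def _central_moment(scores, order):
    """Average of (score - mean) raised to the given power."""
    m = fmean(scores)
    return fmean([(x - m) ** order for x in scores])


def skewness(scores):
    """Moment-based skewness: 0 for a perfectly symmetrical distribution."""
    return _central_moment(scores, 3) / _central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3: 0 for a normal distribution (assumed convention)."""
    return _central_moment(scores, 4) / _central_moment(scores, 2) ** 2 - 3.0


def approximately_normal(scores, threshold=1.0):
    """True if both statistics fall below the threshold in absolute value."""
    return abs(skewness(scores)) < threshold and abs(excess_kurtosis(scores)) < threshold


# Hypothetical raw scores, not data from the standardisation sample.
bell_like = [1, 2, 2, 3, 3, 3, 4, 4, 5]
long_tailed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
print(approximately_normal(bell_like))    # True
print(approximately_normal(long_tailed))  # False
```

The second sample fails the check because one extreme score produces a long right-hand tail, which is the kind of pattern that would rule out the calculation of standardised scores.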


Table 1. Number of students in the standardisation sample by age.

Table 2. Raw score means and standard deviations for the tests in Lucid Recall by age.

The overall breakdown of data was considered appropriate for standardisation in 6-month age bands. However, norms in the age range 16:0-16:11 should be regarded as provisional, because the number of students in this age range fell below the number conventionally required for psychometric norms. The norms for this age range were adjusted using scores extrapolated from the development curve for ages 7:0-15:11. Further standardisation data are being collected with a view to revising the norms for ages 16:0-16:11 as soon as possible.
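The idea of extrapolating from the development curve can be sketched as below. The manual does not specify the method used, so this example simply assumes a straight-line (least-squares) fit, in keeping with the approximately linear progression reported for the core tests. The function name, the age-group midpoints and the raw-score means are all hypothetical and are not taken from Table 2.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL raw-score means for age-group midpoints 7.5 to 15.5 years.
ages = [7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5]
means = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]

slope, intercept = fit_line(ages, means)

# Project the fitted line forward to the midpoint of the 16:0-16:11 group.
extrapolated_16 = slope * 16.5 + intercept
print(round(extrapolated_16, 1))
```

A projected mean of this kind could then stand in for the observed mean of an under-sampled age group until further data become available.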

The distributions of raw scores for memory span and for average time did not permit calculation of standardised scores, because kurtosis (in the former) and skewness (in the latter) exceeded acceptable limits. Given the nature of these particular measures (i.e. memory span and average time), these statistical findings are entirely to be expected and the overall psychometric integrity of Lucid Recall is not affected. Consequently, comparative results for these measures are provided instead, as already explained in Section 1.2.2.


Table 3. Means and standard deviations for the two derived measures in Lucid Recall.

Gender differences

Gender differences were examined for each working memory task. There was a significant effect of gender on scores on pattern recall in favour of females [F(1,865)=7.87, p=0.005], but this had a small effect size (partial eta squared=0.009). On the word recall test a gender difference favouring females almost reached significance [F(1,832)=3.83, p=0.051] and again the effect size was small (partial eta squared=0.005). There were no significant effects of gender on scores on counting recall. Overall, it was concluded that gender differences on these tests are small.
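The effect sizes quoted above can be recovered from the reported F statistics and degrees of freedom using the standard identity for partial eta squared, namely (F × df effect) / (F × df effect + df error). The short sketch below applies it to the two reported results; the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Pattern recall: F(1,865) = 7.87
print(round(partial_eta_squared(7.87, 1, 865), 3))  # 0.009

# Word recall: F(1,832) = 3.83
print(round(partial_eta_squared(3.83, 1, 832), 3))  # 0.005
```

Both values indicate that gender accounts for less than 1% of the variance in scores, which is consistent with the conclusion that the differences are small.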