Understanding CAT4 results

If at any stage you have a question about CAT4 data or your CAT4 results, please contact our Assessment Insights team at [email protected]. Our Assessment Insights team are assessment data experts with over 35 years of combined classroom experience, here to help you make the best use of your data to inform in-school actions. You are also welcome to book a free one-to-one data consultation, where the team will review your data in advance, provide an objective summary of key findings and help you to identify next steps.

A student’s CAT4 results provide a detailed and objective analysis of their reasoning abilities. The results can identify strengths and weaknesses, what these might reveal about the student’s learning and then indicate which learning strategies might be most effective. When teachers have an awareness of both the strengths of an individual and the abilities demanded by a particular task, learning will be most successful.

The Standard Age Score (SAS) is the most important piece of information derived from CAT4. The SAS is based on the student’s raw score, which is adjusted for age and placed on a scale that allows comparison with a nationally representative sample of students of the same age across the UK. The average score is 100. The SAS is key to benchmarking and tracking progress and is the fairest way to compare the performance of different students within a year group or across year groups.

Schools that are based outside of the UK use SAS that are based on UK standardisation. Many schools follow a UK curriculum and their students take external assessments such as the GCSE or IGCSE, and for these it is important to know how their cohorts compare to students taught the same curriculum within the UK. The skills assessed by CAT4 are independent of a taught curriculum and can therefore be applied to an international context.

The SAS for each of the four batteries is given separately in the Individual Report, and the four scores are averaged to give the mean CAT4 score. When thinking about learning strategies, it is particularly important to focus on students’ scores in the four CAT4 batteries rather than on their mean CAT4 score.

Understanding the difference between CAT4 Level X & Y standard age score reporting and the other CAT4 levels

When comparing standard age scores (SAS) between CAT4 Level X & Y and the other levels of CAT4, it is important to note that the score ranges differ: scores for levels A to G range from 59 to 141, whereas scores for levels X & Y range from 69 to 131.

There is a simple explanation for this difference. CAT4 levels X and Y are designed to test younger students, who typically have a shorter attention span than older students. As a result, they are administered in two parts of 30 minutes rather than in three parts of 40 minutes. This change in design allows us to create a test that is appropriate for these younger students. However, with fewer questions and less data, it is not possible to differentiate reliably among extremely strong scores above 131 or among extremely weak scores below 69. Note that only the bottom 2% and top 2% of children’s results will have standard age scores that are affected by the different limits. In addition, these children will receive the same Stanine and almost all will receive the same National Percentile Rank (NPR).
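The effect of the narrower reporting range can be sketched in a few lines of Python. This is an illustration only, not the publisher's scoring method. It assumes that SAS follows a normal distribution with a mean of 100 and a standard deviation of 15 (the standard deviation is not stated in this section), and the names `SAS_LIMITS` and `reported_sas` are hypothetical.

```python
from statistics import NormalDist

# Reporting limits taken from the text above.
SAS_LIMITS = {"A-G": (59, 141), "X-Y": (69, 131)}

def reported_sas(score: float, levels: str) -> float:
    """Clamp a score to the reporting range for the given group of levels."""
    low, high = SAS_LIMITS[levels]
    return max(low, min(high, score))

# Assumption: SAS is normally distributed with mean 100 and SD 15.
sas_dist = NormalDist(mu=100, sigma=15)
share_below_69 = sas_dist.cdf(69)
share_above_131 = 1 - sas_dist.cdf(131)

print(reported_sas(135, "X-Y"))   # capped at the Level X & Y ceiling
print(reported_sas(135, "A-G"))   # unchanged, inside the A to G range
print(f"{share_below_69:.1%} below 69, {share_above_131:.1%} above 131")
```

Under that assumption, roughly 2% of scores fall beyond each limit, which is in line with the figure quoted above.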

Figure 2 Distribution of standard age score, percentile rank (PR) & standard deviation from mean

Explaining the Standard Age Score

Explaining National Percentile Rank (NPR)

The National Percentile Rank (NPR) relates to the SAS and indicates the percentage of students in the national sample who obtained a given score or below. An NPR of 5 means that the student’s score is within the lowest 5% of the national sample; an NPR of 95 means that the student’s score is within the highest 5% of the national sample; an NPR of 50 is average.
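The relationship between SAS and percentile rank can be approximated as follows. This is a rough sketch under stated assumptions, not the published conversion: it assumes a normal distribution with a mean of 100 and a standard deviation of 15, and it keeps results on a 1 to 99 scale. The function name `approx_npr` is hypothetical, and values from the official norm tables may differ from this approximation by a point.

```python
from statistics import NormalDist

def approx_npr(sas: float, mean: float = 100.0, sd: float = 15.0) -> int:
    """Approximate percentile rank: share of the sample at or below `sas`."""
    percentile = 100 * NormalDist(mu=mean, sigma=sd).cdf(sas)
    # Assumption: percentile ranks are reported on a 1 to 99 scale.
    return max(1, min(99, round(percentile)))

for sas in (70, 100, 115):
    print(sas, approx_npr(sas))
```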

Explaining Stanines

The Stanine (ST) places the student’s score on a scale of 1 (low) to 9 (high) and offers a broad overview of his or her performance.
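As an illustration, a stanine can be derived from a SAS using the standard definition of the stanine scale, in which stanine 5 is centred on the mean and each band is half a standard deviation wide. The sketch below assumes a mean of 100 and a standard deviation of 15; the function name `approx_stanine` is hypothetical, and the published CAT4 conversion tables remain the authority.

```python
import math

def approx_stanine(sas: float, mean: float = 100.0, sd: float = 15.0) -> int:
    """Map a SAS to a stanine using half-standard-deviation bands."""
    z = (sas - mean) / sd
    # Stanine 5 is centred on the mean; each band is 0.5 SD wide.
    band = math.floor(2 * z + 5.5)
    # Stanines 1 and 9 absorb everything beyond the outer boundaries.
    return max(1, min(9, band))

print([approx_stanine(s) for s in (95, 101, 115, 116)])  # [4, 5, 7, 7]
```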

The Group Rank (GR) shows how each student has performed in comparison to those in the defined group, such as the class or year group. The symbol ‘=’ represents joint ranking with one or more other students.
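Joint ranking can be sketched as below. The tie convention is an assumption: tied students share the highest position they occupy, the following position is skipped, and the ‘=’ is written after the number. The function name `group_ranks` and the example data are hypothetical.

```python
def group_ranks(scores: dict[str, int]) -> dict[str, str]:
    """Rank students from highest to lowest score, marking ties with '='."""
    ordered = sorted(scores.values(), reverse=True)
    ranks = {}
    for name, score in scores.items():
        position = ordered.index(score) + 1   # position of the first student with this score
        tied = ordered.count(score) > 1
        ranks[name] = f"{position}=" if tied else str(position)
    return ranks

# Hypothetical group of four students and their scores.
example = {"A": 112, "B": 104, "C": 112, "D": 98}
print(group_ranks(example))  # {'A': '1=', 'B': '3', 'C': '1=', 'D': '4'}
```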

Performance on a test like CAT4 can be influenced by a number of factors, and the confidence band is an indication of the range within which a student’s score lies. The narrower the band, the more reliable the score. The band shown is a 90% confidence band, which is the range within which the student’s score would be expected to fall on 9 out of 10 test occasions. The dot represents the student’s SAS and the horizontal line represents the confidence band. The yellow shaded area shows the average score range.

For CAT4, the confidence bands are typically plus or minus five standard score points around the student’s actual SAS. These confidence bands are important because they prevent us from over-interpreting small differences in scores. For example, if a student scored 95 and was retested some months later and scored 98, the second score is well within the confidence band for the first score and so does not represent a significant change. The confidence bands are also important when it comes to identifying significant differences between a student’s scores on the four batteries. However, the width of the band varies depending on the CAT4 level taken, the particular battery and the absolute level of the score. For example, the confidence bands for very high and very low scores tend to be wider on the side nearer the national mean (100).
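The retest example can be checked with a short sketch. It assumes a symmetric band of plus or minus five points, which is only the typical case; as noted above, real bands vary by level, battery and score. The function names `confidence_band` and `within_band` are hypothetical.

```python
def confidence_band(sas: float, half_width: float = 5.0) -> tuple[float, float]:
    """Return (low, high) for a symmetric band around a score."""
    return (sas - half_width, sas + half_width)

def within_band(first: float, second: float, half_width: float = 5.0) -> bool:
    """True if a second score lies inside the band around the first score."""
    low, high = confidence_band(first, half_width)
    return low <= second <= high

print(confidence_band(95))    # (90.0, 100.0)
print(within_band(95, 98))    # True: not a significant change
print(within_band(95, 103))   # False: outside the band
```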

The number of questions attempted can be important: a student may have worked very slowly (but accurately) and not finished the test and this will impact on his or her results.

An example CAT4 Individual report

The report shows the level of scores in each battery. In CAT4, battery is the title given to each of the four pairs of tests which assess different aspects of ability (see pages 5-6). An example is given below.

The profile for Zaynab Ashfaiq shows the number of questions attempted for each battery, her standard age score (SAS), national percentile rank, stanine and group rank for each battery. Zaynab’s SAS are 95, 101, 115 and 116 respectively for each battery, placing her in stanines 4, 5, 7 and 7, and at the 37th, 52nd, 84th and 86th percentiles respectively. Zaynab attempted all of the 48 verbal questions, 24 of the 36 quantitative questions, all of the 48 non-verbal questions and all of the 36 spatial questions.

The row of text beneath the four sets of battery scores gives the student’s mean CAT4 score: in this case, 107. This is derived by summing the student’s scores over all four batteries taken and dividing by the number of batteries taken: (95 + 101 + 115 + 116) / 4 = 106.75, which rounds to 107.
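The same calculation, written as a minimal sketch. The rounding rule (halves round up) is an assumption, since the text does not say how the mean is rounded, and the function name `mean_cat4` is hypothetical.

```python
import math

def mean_cat4(battery_scores: list[int]) -> int:
    """Mean SAS over the batteries taken, rounded to a whole number."""
    mean = sum(battery_scores) / len(battery_scores)
    # Assumption: halves round up; Python's round() would round halves to even.
    return math.floor(mean + 0.5)

print(mean_cat4([95, 101, 115, 116]))  # 107
```

Dividing by the number of batteries actually taken means the function also works when a student has results for fewer than four batteries.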

The report also presents the SAS in a graphical format on a scale ranging from 60 to 140. The student’s actual SAS is indicated by a black dot.

There is a horizontal line either side of the SAS dot, which indicates the 90% confidence band. Any test score is generated from a performance on a particular day. We know that CAT4 is a highly reliable test, but nevertheless we can expect scores to fluctuate or change to some extent due to chance factors. The confidence band indicates the range in which a student’s score would be expected to fall on 9 out of 10 test occasions.

Analysing CAT4 profiles

CAT4 Individual reports also assign to the set of results one of seven broad descriptions of the student’s abilities, as well as populating a narrative which provides:

  • For teachers: a summary of the student’s likely strengths and weaknesses, and implications for teaching and learning.
  • For students: a summary and a set of probing questions and suggestions.
  • For parents: a set of suggestions for what this means for the student.

The Verbal Reasoning and Spatial Ability Batteries form the basis of this analysis and the profiles are expressed as either mild, moderate or extreme bias for verbal or spatial learning, or, where no bias is discernible (that is, when the scores from both batteries are similar), an even profile across the two batteries.

The seven broad descriptions of ability are:

  • Extreme verbal bias
  • Moderate verbal bias
  • Mild verbal bias
  • No bias
  • Mild spatial bias
  • Moderate spatial bias
  • Extreme spatial bias.

The most common profile for students to receive is the ‘no bias’ profile, since verbal and spatial abilities are correlated to a surprisingly high degree. That is to say, students who score well on the Verbal Reasoning Battery are likely to perform well on the Spatial Ability Battery, and students who perform less well on one are likely to perform less well on the other.

Students also tend to move to a less extreme profile over time. Data from students who take CAT4 twice, two years apart, suggest that students are unlikely to retain the same profile over this time period unless it is the ‘no bias’ profile.

This pattern is consistent with other tests of cognitive ability, not just CAT4. It is part of the reason why contrasting results across batteries, and questioning one’s understanding of the student in light of them, is so informative: large differences between batteries are unexpected.

Figure 3 Breakdown of bias profiles

  • Extreme verbal bias [orange] 2%
  • Moderate verbal bias [lighter orange] 4%
  • Mild verbal bias [yellow] 11%
  • No bias [white] 66%
  • Mild spatial bias [green] 11%
  • Moderate spatial bias [blue] 4%
  • Extreme spatial bias [purple] 2%

The implications of the student’s profile for teaching and learning will depend on both the pattern of scores (strengths and weaknesses) and the overall level of the student’s scores (relative to the average or expected score). An estimate of the overall level is captured by the mean CAT4 score. In general, the mean CAT4 score carries:

  • the most information for no bias;
  • less information for profiles with moderate and mild bias;
  • still less information for profiles with extreme bias.

Therefore, when you are asked to consider the overall level of scores, the mean CAT4 score will provide only a rough guide. For profiles with extreme bias in particular, you should consider the level of the scores on the individual battery or batteries most relevant for the profile. This is particularly true in circumstances when the student’s level of English language proficiency might be affecting one battery more than another (see page 79).

In the group reports, each student’s results are plotted as a point on a two-dimensional grid, with spatial SAS and stanine running horizontally left to right, and verbal SAS and stanine running vertically bottom to top. Each profile is displayed as a coloured area of the resulting grid, with a dashed line going through the diagonal representing absolutely no bias between the two abilities.

Each student’s results appear in this grid as a dot, positioned according to the student’s spatial and verbal scores.
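The layout of the grid can be sketched as follows, using the verbal and spatial SAS from the example report (95 and 116). The function name `grid_position` is hypothetical. The sketch only reports which side of the dashed diagonal a dot falls on; it does not assign a profile, because the score differences that separate mild, moderate and extreme bias are not given in this section.

```python
def grid_position(verbal_sas: int, spatial_sas: int) -> tuple[int, int, str]:
    """Return (x, y, side) for a student's dot on the verbal/spatial grid."""
    # Spatial runs left to right (x); verbal runs bottom to top (y).
    x, y = spatial_sas, verbal_sas
    difference = verbal_sas - spatial_sas
    if difference > 0:
        side = "verbal side of the diagonal"
    elif difference < 0:
        side = "spatial side of the diagonal"
    else:
        side = "on the diagonal"
    return x, y, side

print(grid_position(95, 116))  # (116, 95, 'spatial side of the diagonal')
```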
