“It can’t be Greater Depth – the spelling isn’t good enough.”
“This isn’t Greater Depth because it lacks control.”
“It’s a good piece of writing, but the tense is inconsistent, so it’s not Greater Depth.”
These are the conversations we are used to having when it comes to primary writing and whether it reaches the Greater Depth standard (GDS) or not. On the surface, these three statements seem straightforward. The primary writing framework requires students to spell most words correctly, to exercise an assured and conscious control of their writing, and to use verb tenses consistently. If a piece of writing doesn’t meet one of these standards, it cannot be Greater Depth.
One man’s greater depth…
Except, of course, it isn’t that simple. Take the statement ‘exercise an assured and conscious control over levels of formality, particularly through manipulating grammar and vocabulary to achieve this’. Reasonable people could clearly disagree about what that means. Consider the following sentence: ‘The devious spirit towered over the dead corpse.’ One teacher on Twitter said a moderator used this sentence as evidence of a student lacking control, because a corpse is obviously dead. But another moderator might argue that one small tautology should not affect the judgement of a piece of writing that does broadly display control.
What about statements to do with spelling most words correctly and using tense consistently? Surely these are more objective and easier to interpret? Not necessarily. What if a student repeatedly spells a very difficult word incorrectly? Does that disqualify them from getting Greater Depth? Does it mean that a student who spells everything correctly but uses much more basic vocabulary is more deserving of Greater Depth? What if a student writes a story about time travel that requires very complex manipulation of different tenses?
It could be argued that I am being pedantic, splitting hairs and coming up with hard cases that are not typical of most student writing. To see if I am, let’s look at the data. If the primary writing framework really is easy for teachers to understand and interpret consistently, then we would expect to see most teachers who use it agreeing in their judgements. The evidence we have suggests this is not the case.
Agree to disagree
At No More Marking, we carried out a study where several experienced markers independently assessed a set of 349 Year 6 scripts. Seven per cent were graded as Greater Depth by at least one marker, but only one per cent were graded as Greater Depth by both markers.
This is not just a problem with the primary writing framework, either. There is a large body of international research on the consistency of mark schemes, and it shows that mark schemes do not produce precise agreement on open tasks like writing. Hardworking, honest, experienced and well-intentioned markers will not always agree on the mark a piece of writing deserves.
So why do experienced markers disagree in this way? A lot of the time, we think that disagreement is down to more or less lenient interpretations of the mark scheme. For example, we might assume that one marker has a harsher interpretation of the mark scheme, and another a more generous one. We then think that the challenge is to decide which marker is correct, or whether the correct judgement is somewhere in between the two.
Levels of generosity are certainly an important factor. But another important and overlooked factor is inconsistency. Markers disagree with each other, certainly. But very often, they disagree with themselves.
In one study exploring inconsistency, markers were given one batch of essays one year, and another batch the following year. What they didn’t know was that two of the essays in the first batch were also included in the second batch.
Did the markers give those two essays the same mark the second time round as they did the first? Eighty per cent of the time they did not.
Aggregation and exemplification
Markers are inconsistent and they make errors. This is not because they are bad or stupid or lazy: it is because they are human! Human judgement in all kinds of different fields is subject to inconsistency. This is nothing to be ashamed of. It’s better to be honest about our limitations than to pretend they don’t exist.
It’s important to acknowledge the inconsistency of markers, because we need to move away from the idea that somewhere out there is a marker or moderator who ‘owns’ Greater Depth. There isn’t. Some people may think their judgements about Greater Depth are always correct, but in order to believe this, they’d have to pass an empirical test: mark 100 portfolios one day, and then come back a month later and mark them all again, and give them all the same mark both times.
If mark schemes and human judgement are so imprecise, what should we do instead? Two important principles can help improve the situation. The first is aggregation. Individual human judgements are subject to error, but if we aggregate a large number of independent judgements, the errors tend to cancel each other out. The second is exemplification. As we’ve seen, the prose statements you get on a mark scheme are quite vague and often provoke disagreement. By using actual examples of writing that is and is not at the Greater Depth standard, we are more likely to get agreement.
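The aggregation principle can be illustrated with a small simulation. This is a sketch under invented assumptions, not a model of any real marking data: each script is imagined to have a "true" score of 50, and each judgement is that score plus independent random noise with a standard deviation of 5 marks. The function name and all the numbers are hypothetical.

```python
import random
import statistics

def average_marking_error(n_judgements, n_trials=2000, true_score=50.0,
                          noise_sd=5.0, seed=0):
    """Average distance between the pooled mark and the true score.

    Assumption for illustration: every judgement is the true score plus
    independent, unbiased Gaussian noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        # Draw n independent noisy judgements of the same script.
        marks = [rng.gauss(true_score, noise_sd) for _ in range(n_judgements)]
        # Aggregate by taking the mean, then record how far off it is.
        errors.append(abs(statistics.fmean(marks) - true_score))
    return statistics.fmean(errors)

single = average_marking_error(1)    # one marker working alone
pooled = average_marking_error(20)   # mean of 20 independent judgements
print(f"average error, 1 judgement:   {single:.2f} marks")
print(f"average error, 20 judgements: {pooled:.2f} marks")
```

The cancelling only works in this sketch because the simulated errors are independent and unbiased. If every marker shared the same bias, averaging their marks would not remove it.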
At No More Marking, we use a technique called Comparative Judgement to assess writing with much greater consistency than the traditional framework approach. Every single piece of writing is assessed at least 20 times, and the process provides a powerful bank of exemplars at every standard. You can see 349 of our Y6 scripts from 2020 in this interactive notebook.
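To give a sense of how many paired judgements can be turned into a single scale, here is a minimal sketch. It fits a Bradley-Terry model, a standard statistical model for paired comparisons. That choice is an assumption made for illustration: nothing here describes how No More Marking actually calculates its scores, and the function names, simulated scripts and quality values are all hypothetical.

```python
import math
import random

def fit_bradley_terry(n_items, comparisons, n_iter=500):
    """Estimate a quality score per script from (winner, loser) pairs.

    Each script also gets one virtual comparison against an anchor of
    strength 1, counted as half a win, so that scores stay finite even
    for a script that wins or loses every comparison.
    """
    wins = [0.0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(n_iter):
        # Start each denominator with the virtual anchor comparison.
        denom = [1.0 / (s + 1.0) for s in strength]
        for (i, j), n in pair_counts.items():
            shared = n / (strength[i] + strength[j])
            denom[i] += shared
            denom[j] += shared
        strength = [(wins[i] + 0.5) / denom[i] for i in range(n_items)]
    return [math.log(s) for s in strength]

def simulate_judgements(true_quality, rounds=20, seed=0):
    """Pair scripts at random; the better script usually, not always, wins."""
    rng = random.Random(seed)
    items = list(range(len(true_quality)))
    comparisons = []
    for _ in range(rounds):
        rng.shuffle(items)
        # With an even number of scripts, each one is judged once per round.
        for a, b in zip(items[0::2], items[1::2]):
            p_a_wins = 1.0 / (1.0 + math.exp(true_quality[b] - true_quality[a]))
            comparisons.append((a, b) if rng.random() < p_a_wins else (b, a))
    return comparisons

# Ten invented scripts, from weakest (index 0) to strongest (index 9).
true_quality = [-2.0 + 4.0 * i / 9 for i in range(10)]
judgements = simulate_judgements(true_quality)   # 20 judgements per script
scores = fit_bradley_terry(len(true_quality), judgements)
for rank, i in enumerate(sorted(range(10), key=lambda k: -scores[k]), start=1):
    print(f"rank {rank}: script {i} (estimated score {scores[i]:+.2f})")
```

No single judgement in the simulation is reliable, but each script's estimated score draws on 20 of them, which is the aggregation principle at work.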
Daisy Christodoulou is former head of assessment at Ark Schools, and currently director of education at No More Marking, a provider of online comparative judgement. She is the author of Seven Myths about Education, Making Good Progress? The future of Assessment for Learning, and Teachers vs Tech.