My favourite Christmas gift this year was from my father-in-law. Knowing my professional interest in assessment, he bought me the ‘F in Exams’ desk calendar. For those not familiar with the concept, this gives a daily example of an amusing candidate response to a test question. For example, one question asked candidates, ‘Where was the Declaration of Independence signed?’ Answer: ‘At the bottom’.
I shared these with colleagues and we all had a chuckle. But the more of them I read, the less funny they became. After all, the Declaration of Independence was signed at the bottom. Even if they were being facetious, they weren’t wrong. Should they have received a mark? Most of the answers we were laughing at had been prompted by questions that were ambiguously written and, if real, represented young people who had been failed by the test they took, not just young people who had failed a test.
In my previous role as a test developer, I spent seemingly endless hours poring over exam scripts from trials. As a specialist in developing reading comprehension tests, I have found that candidates often don’t respond in the ways we expect them to. They sometimes give far more imaginative, but no less valid, responses than the writer of the test could have anticipated. Here are some of the things I have learned.
- The more eyes the better. What is clear to me may not be clear to you. The more people you can show something to, the more chance you have of universal comprehension. An example might be the question, ‘Why have ellipses been used in this sentence?’ I wanted answers around creating a sense of awe, disbelief and mounting excitement. I got technical answers, such as ‘to create a pause’. I should have asked ‘What is the effect on the reader of the ellipses in this sentence?’ It wasn’t until I read candidate responses that I realised my mistake.
- Knowing what is wrong can be as important as knowing what is right. There will always be spaces between answers which we know are correct, and answers which we know are wrong: this is the ‘zone of uncertainty’. But we should endeavour to reduce this to the smallest possible space. Let’s use an example from a well-known fairy story.
Question: How does Snow White get awakened from her sleep?
Answer: Her True Love’s Kiss.
So are the following answers right, or wrong?
a) She is kissed (no True Love)
b) She gets kissed by the Handsome Prince (is this enough for True Love?)
c) Her True Love wakes her (no kiss)
Are we sure these are incorrect? If we are, then listing some common incorrect answers might help markers not to be tempted by answers that are not quite accurate enough. If we think these are correct, we might adjust the answer given in the mark scheme to ‘A Kiss’ and/or ‘Her True Love / The Prince’.
- Don’t make assumptions about the test taker. I made an assumption just now by assuming familiarity with the fairy story of Snow White, but I shouldn’t have. There’s no reason why everyone in the world should know the story, and even less reason why everyone should know the Disney version. In the Grimm Brothers’ version, the Prince is in the process of having the comatose beauty carried off back to his palace by servants when one stumbles and the resulting movement dislodges the bit of poisoned apple stuck in her throat. What if I only knew that version, and was confronted by the Disney tale as an unseen text in an exam setting? Would I be unnecessarily distracted? And if I am already familiar with the story, will I read less carefully, assuming I already know all the answers? Making assumptions about the range of somebody’s experience can have unintended side effects for the test taker.
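The idea of listing both acceptable and known-incorrect answers can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `MarkSchemeItem`, `normalise` and the `"refer"` outcome are hypothetical, the rejected answer is invented for the example, and the accepted list assumes we have decided that the Snow White variants above deserve the mark. Anything on neither list falls into the ‘zone of uncertainty’ and is referred to a human marker.

```python
from dataclasses import dataclass, field


def normalise(text: str) -> str:
    """Lower-case the text, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


@dataclass
class MarkSchemeItem:
    """One question with its agreed right and wrong answers (hypothetical)."""
    question: str
    accepted: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def mark(self, response: str) -> str:
        answer = normalise(response)
        if answer in {normalise(a) for a in self.accepted}:
            return "correct"
        if answer in {normalise(r) for r in self.rejected}:
            return "incorrect"
        # Neither list covers this response: the zone of uncertainty,
        # so a human marker has to decide.
        return "refer"


snow_white = MarkSchemeItem(
    question="How does Snow White get awakened from her sleep?",
    # Assumes the mark scheme was widened to 'A Kiss' / 'The Prince'.
    accepted={"Her True Love's Kiss", "A kiss", "She is kissed by the Prince"},
    # Invented example of a listed incorrect answer.
    rejected={"She eats an apple"},
)
```

Here `snow_white.mark("A kiss.")` returns `"correct"`, while `snow_white.mark("Her True Love wakes her")` returns `"refer"`. Each referred response that markers then rule on can be added to one list or the other, which shrinks the uncertain zone over successive trials.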
I think about the number of times I say, ‘that’s not what I meant’, when I’ve been misunderstood, misinterpreted, taken out of context… but how much of that is the fault of the receiver, and how much of the blame lies with me? Could I have been clearer? Could I have communicated what I meant more effectively than I did? Perhaps I need to practise what I preach in my wider life as well as in my work!
But let’s end on an example of a candidate who has reflected carefully on a question and used it to acknowledge their own shortcomings.
Question: What is a vacuum?
Answer: Something my mum says I should do more often.
A very valid issue, pointed out with very simple and light examples. Questioning is a very effective way to establish transparency in the teaching and learning process. However, only planned and focused questions can help in formative assessment.
I agree with Kashif J. Questions need to be clear and unambiguous. Clarity is certainly paramount if students are to have a clear-cut comprehension of a set task.
The more we try to give an exam content validity, the less likely it is to have ecological validity.
Let’s take the Declaration of Independence out of the exam hall and situate it in the classroom.
Me: Class, where was the American Declaration of Independence signed? Anyone? No takers. OK, Johnny.
Johnny: Sir. At the bottom, sir.
Me: Very interesting! Class, is Johnny correct? Does anyone have a different answer?
Mariam: In Philadelphia, sir.
Me: OK, now we have two answers. Anyone else? …
The class would smile at Johnny’s answer. Possibly, I would commend him for his divergent thinking skills. Mariam would be commended for her factual knowledge. But, and it’s a very big but, we are only at the bottom of Bloom’s taxonomy.
Consider this scenario instead.
Me: So, what was the objective of today’s lesson… Zac?
Zac: Justification of a revolutionary war, sir.
Me: Right-o! Now, class, for our plenary today you will be working in pairs. You can make a timeline of the main events leading up to the American Revolutionary War. Or you can make a mind map about the Declaration of Independence.
What would an ecologically valid exam look like? Much the same as the second scenario, I believe. How do we get there? Take an open-source artificial intelligence. Here’s one: https://thestack.com/world/2016/01/15/baidu-releases-open-source-ai-code/
And here’s another
Link the AI to a question bank and a content domain (American history).
Exam boards could score exams on a number of dimensions: factual knowledge, ability to make inferences, divergent thinking, perhaps even teamwork!