Assessment under pressure - 14 innovative case studies

14th February 1997

Introduction

What follows is a collection of stories of innovative assessment practice. The stories are based on real case-studies, although the detail may have been changed slightly for ease of explanation. Some of these assessment methods can be introduced at module level, whereas others may require field - or even institution - approval.

This information comes at a time when staff at Brookes are being asked to reduce assessment resources by 10% to 20% over a two-year period. Simultaneously, staff are expected to retain the same standards (as applied to learning outcomes) and, ideally, improve learning through the integration of assessment with the learning process.

Our main concern is that purely mechanistic methods of reducing assessment (e.g. a change from two assignments to one) may result in a loss of standards. It is essential, therefore, that changes in assessment practice do not lose sight of why students are being assessed:

  • to motivate students
  • to create learning activities
  • to give feedback to students, identifying strengths and weaknesses
  • to give feedback to staff on how well the message is getting across
  • to judge performance (grade/degree classification)
  • to provide quality assurance internal to the institution
  • to provide quality assurance external to the institution

Too often, assessment tries to tackle all of these purposes at once. We believe that the first four in this list require lots of assessment (much of which can be peer-assessment) and the last three require occasional - yet rigorous - assessment. These 14 case-studies are examples of how this can be achieved.

Case Studies

1. Marking time has been reduced to five per cent of what it was (sampling assignments for feedback and marking, marking part of an assignment, peer-assessment, self-assessment, model answers, new course requirement).
2. The number of assignments counting towards the degree could be reduced from 116 to six without changing any student's degree classification (pass-fail marking rather than allocating marks).
3. Self- and Peer-marking of Examinations using Model Answers (peer-marking, self-marking, sampling assignments).
4. An education course is becoming very focussed in assessing competencies (assessing learning outcomes, overlapping assessment to more than one course, peer-assessment, self-assessment).
5. Statement banks to reduce marking-time and improve effectiveness.
6. Assignments submitted electronically can enable lecturers to give economical feedback (Email feedback, sampling assignments for feedback).
7. A science department used peer-assessment for weekly problem-sheets and improved performance in the end-of-year examination results (peer-assessment, new course requirement).
8. Objective Tests.
9. Reducing Time and Improving Effectiveness by using Tape-recorders for Feedback on Dissertation Drafts.
10. 'Would you like a system where we set you up in teams of four and give you the average mark for your team of four?' (assigning average exam mark to learning teams).
11. Regular Self-assessment Improves Peer-assessment.
12. Students assessed on how well they assess other students' essays (multiple-choice, marking scheme).
13. Science students improved their learning by internalising the lecturer's marking system and peer-assessing lab reports (peer-assessment, marking-schemes, sampling for feedback).
14. Research shows that students can judge oral-presentation skills as reliably as lecturers (peer-assessment).

1. Marking time has been reduced to five per cent of what it was

(Sampling assignments for feedback and marking, marking part of an assignment, peer-assessment, self-assessment, model answers, new course requirement).

A science department had a slow turnaround on the marking of regular practical reports, and a sense that this work was poorly focused. Students didn't understand why they were putting so much time in. It was unclear what the marking was for.

Staff reviewed the objectives of workshops on their lab-based courses and decided they were mainly interested in four things - analysing and interpreting data, written communication skills, experimental design and understanding key concepts. Students were told: 'We want a portfolio from you at the end of the year that has 20 reports for the 25 practicals, and if you don't submit 20 then you fail the course and you aren't allowed to sit the exam.'

Students were also told that four reports would be pulled out at random. One would be assessed for communication skills, one for data-handling, one for experimental design and one for understanding key concepts. Students didn't know which would be assessed for what, so they had to pay attention to all four things every time. There was only one bit of marking, and it was very focussed marking, so it was extremely quick. It wasn't only that four reports out of 20 were marked; each of those four was marked on only about 20 per cent of its content, so the marking time went down to about five per cent of what it was previously. (Obviously it matters where you sample from if students were improving during the course.)
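The time saving follows from two multiplied fractions: only 4 of the 20 reports are marked, and each is marked against just one of the four criteria. A back-of-envelope check, treating a single-criterion marking as roughly a quarter of a full marking (the text estimates 'about 20 per cent'; the exact fraction is an assumption):

```python
# Illustrative arithmetic only; the fractions are read off the description
# above, not figures reported by the original department.
reports_submitted = 20    # reports per student in the portfolio
reports_sampled = 4       # pulled out at random for marking
criteria_marked = 1       # each sampled report is marked on one criterion...
criteria_total = 4        # ...out of four, i.e. roughly a quarter of a full marking

fraction_of_reports = reports_sampled / reports_submitted   # 0.2
fraction_of_each = criteria_marked / criteria_total         # 0.25
marking_time = fraction_of_reports * fraction_of_each       # 0.05

print(f"Marking time is roughly {marking_time:.0%} of the original")
```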

The reports were date-stamped in the office and then put into the portfolio, so students couldn't put anything in the portfolio that hadn't been submitted on time. And they couldn't change them afterwards. When the deadline was passed, the lecturer would give feedback on that lab, using model reports or peer-marking exercises or self-marking exercises or having sampled some of the submitted reports. They used a variety of cheap methods for students to engage with the quality of their work and others' work and think about standards. But they didn't use lecturer marking.

2. The number of assignments counting towards the degree could be reduced from 116 to six without changing any student's degree classification (pass-fail marking rather than allocating marks).

A study of a statistics degree course collected information on all 116 pieces of assessed work. The researchers constructed a huge matrix of marks for each student. Using statistical analysis, they then identified the individual assessment which differentiated least between students. This column of marks was then removed from the matrix. Students' degree classifications were then recalculated from the remaining 115 marks. There was no change in any student's degree classification.

This process of removing the least differentiating set of marks was repeated. The students' degree classifications remained unaltered until only six pieces of assessment remained. It might therefore be argued that if all (or even only some) of the other assessments had simply been pass/fail - or course requirements, formatively assessed by peers - staff could have saved time with no change in eventual degree outcomes.
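The elimination procedure amounts to a loop: find the assessment that differentiates least between students, remove it, and check that no one's classification changes. A minimal sketch with invented marks, toy classification boundaries, and variance as the measure of differentiation - all assumptions; the original study used 116 real assessments and the department's actual classification rules:

```python
import statistics

# Hypothetical marks matrix: rows are students, columns are assessments.
marks = {
    "ann":   [62, 58, 65, 70, 61],
    "bob":   [48, 52, 45, 50, 49],
    "carol": [71, 75, 68, 72, 73],
}

def classify(avg):
    """Toy degree-classification boundaries."""
    if avg >= 70: return "first"
    if avg >= 60: return "2:1"
    if avg >= 50: return "2:2"
    return "third"

def classifications(marks, keep):
    """Classify every student using only the kept assessment columns."""
    return {s: classify(statistics.mean(row[i] for i in keep))
            for s, row in marks.items()}

n = len(next(iter(marks.values())))
keep = list(range(n))
baseline = classifications(marks, keep)

while len(keep) > 1:
    # Drop the assessment that differentiates least between students
    # (here: the column with the smallest variance).
    least = min(keep, key=lambda i: statistics.pvariance(
        [row[i] for row in marks.values()]))
    trial = [i for i in keep if i != least]
    if classifications(marks, trial) != baseline:
        break  # removing it would change someone's degree, so stop
    keep = trial

print(f"{len(keep)} assessment(s) reproduce the original classifications")
```

With this toy data, two of the five columns turn out to be enough, mirroring the study's 116-to-six reduction on a much smaller scale.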

3. Self- and Peer-marking of Examinations using Model Answers (peer-marking, self-marking, sampling assignments).

Over 100 students in a third-year science class assessed their own performance and a peer's performance in a mid-term examination.

At the first class after the examination, students were given model answers, commentaries and a marking-schedule by the lecturer. Each student was allocated an anonymous examination paper. Using the model answers and marking-schedule, the students marked two papers - their own and the anonymous one - in their own time.

They were required to fill in the space on the marking-sheet, saying in detail where the student had departed from the model answers and awarding a score for each section (on a scale provided). Students returned the papers and marking-sheets the following week and received their own examination script.

They then applied the same procedure to their own paper without knowing what mark their peer had given it. The self- and peer-generated marks were then compared. If the two marks differed by less than 10%, the student was awarded the self-assessment mark. Otherwise the paper was re-marked by a member of staff. To discourage students from colluding to fix marks, other papers were sampled at random and marked by staff. Students liked the system, and staff reported time-savings, even allowing for the extra time spent on preparing model answers and organising the movement of papers.
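The reconciliation rule described above - accept the self-assessed mark when self and peer agree within 10%, otherwise send the paper to staff, with random spot-checks against collusion - can be sketched as a single decision function. The 10-mark tolerance comes from the case study; the sampling rate is an assumption:

```python
import random

TOLERANCE = 10     # marks out of 100; the case study says 'less than 10%'
SAMPLE_RATE = 0.1  # fraction of papers spot-checked by staff (an assumption)

def award(self_mark, peer_mark, rng=random.random):
    """Return (mark, needs_staff_remark) under the rules described above."""
    if abs(self_mark - peer_mark) < TOLERANCE:
        # Agreement: the student's own mark stands, unless the paper is
        # pulled out in the random anti-collusion sample.
        return self_mark, rng() < SAMPLE_RATE
    return None, True  # disagreement: a member of staff re-marks the paper
```

For example, `award(62, 58)` accepts the self-assessed 62 (and occasionally flags it for a spot-check), while `award(62, 45)` always routes the paper to staff.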

4. An education course is becoming very focussed in assessing competencies (assessing learning outcomes, overlapping assessment to more than one course, peer-assessment, self-assessment).

An education course has recently changed from a very expensive assessment system to one of the leanest systems seen anywhere. In the old system, students produced volumes of paper which lecturers ploughed through, but there was almost no assurance of standards. They couldn't tell whether the students were competent or not. You could tell that students were busy but not that they were competent. Tasks were too big and complex, and feedback came too late.

They didn't have any exams - it was all coursework. They have now agreed to have an exam - except that they call exams 'time-constrained tasks'. The exam tasks simulate - or in some cases are identical to - coursework tasks. For example, in the exam students will be given class records of pupils' performance in English in the national curriculum across a year. The data consist of one side of A4 plus a page of comments about pupils, and there will be questions like 'Comment on the adequacy of the record-keeping system of the teacher' and 'Comment on the adequacy of the teacher's plan to cope with individual differences within the class, with reference to the national curriculum, what you know about the teaching of English, and what you know about the record-system.'

Hence, the exam pulls in things that are on the competency list. Students would know to expect a question about record-keeping where they would have to look at some real records. They'd know there would be a question about lesson planning and that they would be given some actual lesson plans. The only way of preparing was to look at record-keeping systems and make sense of them, and look at lesson plans and make sense of them. The ground rule is that students cannot be confronted in the exam with a task that they haven't tackled in a formative way during the year.

During the year teachers are encouraged to use model answers, peer feedback, self-assessment and class discussion. Students are set up in learning teams to help each other prepare for the coursework tasks and give feedback. Either side of each assignment deadline, these learning teams meet with tutors. At other times they meet without tutors.

Actual marking is confined to four exams ('time-constrained tasks') to assess the entire year. They have agreement between the different course leaders about the range of tasks, so there aren't two lecturers covering, for instance, lesson planning. There is a grid of competencies that students are supposed to address, and they can see that they are being covered by the exam. Exam answers are quite short, so the external examiner or whoever can see on a couple of sides whether students can do these things or not.

5. Statement banks to reduce marking-time and improve effectiveness.

It is conventional for lecturers to write comments in the margins of assignments. Speed of marking and restricted space often mean that these comments are clipped and ungrammatical ('Need more explanation!'). Such comments can seem curt and overly critical from the student's perspective.

A number of courses have now converted to statement-banks. The comments available in a statement-bank can be more supportive and detailed than lecturer comments. For instance, one such bank lists 34 comments for a lecturer to choose from, including quick comments ('This is great!!! Do more of this') and detailed comments - 'This introduction/conclusion/ section/phrase feels pasted on and disconnected from the rest of the essay. See if expanding on the ideas in the section before or after works better. (Also, ask if this section really relates to the essay or if it is a personal comment about the idea you have just presented.) If this is unclear, ask me about it.'

Examples of overview comments include: 'I love your writing style and diction (word choice);' 'I think the tone of this essay makes it less effective;' or 'The language of this section/essay is not appropriate for the audience/register/subject (too informal or too formal).'

Statement-banks can be introduced at any point along a continuum from low-tech to hi-tech. At the lower level, students may be given a detailed list of numbered statements so that the lecturer can simply write numbers in the margin. This can be a lot quicker, and feel more positive to students, than the traditional way of providing comments. At the hi-tech level, assignments can be loaded on to a PC, lecturers can punch in numbers, and the computer can replace the numbers with the appropriate comments.

6. Assignments submitted electronically can enable lecturers to give economical feedback (Email feedback, sampling assignments for feedback).

Students can submit assignments electronically and staff can give feedback electronically. Assignments are submitted and logged. The lecturer reads them, or reads a sample of them, and types in comments. But everybody gets all the comments because it is done electronically.

Any kind of system can operate. There is an advantage in students seeing a range of assignments, good and bad, and lecturers can introduce a sampling process to save time. Say that 23 students did this question. The lecturer can say, 'I'm going to read five of them and give a range of comments to all twenty-three.' Provided the course requirement makes them all do it, and there is feedback, the system works.

7. A science department used peer-assessment for weekly problem-sheets and improved performance in the end-of-year examination results (peer-assessment, new course requirement).

The old system involved weekly problem sheets which lecturers marked and then discussed in problem classes. The problem classes grew bigger (about 25 per class) and marking became too time-consuming. Lecturers stopped marking problem-sheets, and exam marks went down (to about 45% on average). They couldn't afford to put the weekly marking back in so they borrowed a system developed in Australia by Dave Boud.

Students were told they had to complete 50 problem sheets in the year or they couldn't sit the exam. There was no quality control on these problem-sheets. They could have had pictures of Mickey Mouse all over them.

On six occasions during the year, problem-sheets were collected and logged by an administrator. As the 170 students entered a large lecture theatre, they put their problem-sheets on the table for the administrator to log. After that date they couldn't be logged - an incentive to keep up-to-date.

The problem-sheets were randomly redistributed. Students would sit down with a pile of problem-sheets and a marking-scheme similar to that given to postgraduate teaching assistants, and students marked the sheets. They didn't mark them like lecturers do - 'tick, cross, tick, cross, six out of ten'. Instead they wrote, 'Brian, you moron, why did you do it like this?' because Brian's name was at the top of the sheet. You knew who you were marking but you didn't know who marked you. At the end of the session they were handed back to the people who submitted them.

None of these marks counted. It was purely formative. And yet the average end-of-year exam mark went up to 85% (a higher mark than when lecturers marked problem-sheets). Lectures were the same, problems were the same, and the examination was the same. Peer-assessment in this case was not only a cheap marking exercise - it was also a learning exercise. The act of peer marking helps people realise what they do wrong in problems and helps them see alternatives. It provides peer pressure to perform well because it is embarrassing for somebody else to see your poor work. And there is instant feedback - very powerful social feedback. Quick, inaccurate feedback from students has far more impact than slow and accurate feedback from lecturers.

The point is not that students learned to mark as well as lecturers. The students' marking was almost certainly unreliable but it worked. It worked because the system maintained the volume of student activity, and maintained the feedback. Less lecturer assessment can damage learning if it reduces student activity and reduces feedback.

8. Objective Tests.

In order to reduce the amount of time spent on assessment, one institution introduced a policy of switching first-year assessment to computer-based examinations using Question Mark software. A small support team helps lecturers write suitable questions and multiple-choice answers. The institution accepts that staff cannot assess everything they want to, but more sophisticated assessment can take place at stage two.

There are at least three levels on which objective tests can be introduced: a simple system where students tick options and staff mark answers; a large number of multiple-choice questions with answers that can be read by an optical mark reader; or more sophisticated technology such as Question Mark, which can incorporate diagrams or photographs (e.g. 'Identify the cranium by clicking on the appropriate part of the body.'), or can require students to indicate the correct word, phrase or number.

Objective tests can be used for either summative or formative assessment, the latter by tinkering with course requirements. An added option is for students to choose when they are ready to take the tests.

9. Reducing Time and Improving Effectiveness by using Tape-recorders for Feedback on Dissertation Drafts.

A department was concerned about staff supervising more and more dissertations. Traditionally, students handed in draft chapters and received written comments from their supervisor. Each student then met with their supervisor to discuss the comments; generally the students did not see the comments prior to the meeting. Students would frequently visit the supervisor at later periods for further discussion.

The new method requires students to submit draft chapters and a blank cassette tape. Supervisors speak their comments on to the tape, writing numbers on the text to help students locate the origin of specific comments. The one ground rule is that supervisors refuse to see students for tutorials unless students have listened to the tape-recorded feedback.

This method has several advantages. Taped comments are much quicker to record than written ones, and often convey far more expression. Also, students often have trouble reading a supervisor's handwriting. Discussion time with students is used to maximum effect as they are better prepared. Students have reacted positively to the change. They say that they receive better, fuller and more personal feedback than from written comments. Some say it helps them to organise their time and a few have tape-recorded the tutorials so that they do not waste time writing piles of suggestions. Most students have easy access to a tape-recorder.

10. 'Would you like a system where we set you up in teams of four and give you the average mark for your team of four?' (assigning average exam mark to learning teams).

An accountancy course had been stripped down to lectures and an exam, and performance was terrible: failure rates were very high, almost nobody got marks over seventy, and about 40% of students got below forty. The majority of students got between 30 and 50, which was very, very low.

The person responsible showed students some data from somewhere else and said, 'Would you like a system where we set you up in teams of four and give you the average mark for your team of four, because I think the same thing will happen as happened in this other context, ie exam results will go through the roof?' And students voted in favour of it.

They introduced it and it worked. They set up students in learning teams of four and told them: 'You will study in teams of four, you will sit the exam and tests as individuals, but you will get the average mark of your team of four.' The exam board (or the local quality assurance) stipulated two conditions: (i) that the students agreed to it and (ii) that no individual was allowed to pass the course if their individual mark was a fail mark, ie no-one could pass the course on the average of the others.
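The exam board's two conditions amount to a simple rule: everyone receives the team average, except that an individual fail mark cannot be lifted by the average. A minimal sketch, with an assumed pass mark of 40 (the article does not state the threshold):

```python
PASS_MARK = 40  # assumed pass threshold; not stated in the case study

def team_marks(individual_marks):
    """Average-mark scheme with the exam board's fail-floor condition:
    everyone gets the team average, but no one whose own mark is a fail
    can pass on the strength of the others."""
    avg = sum(individual_marks) / len(individual_marks)
    return [avg if m >= PASS_MARK else min(avg, m)
            for m in individual_marks]
```

So a team of [70, 60, 50, 44] all receive 56, while in a team of [70, 60, 50, 20] the failing student keeps their own 20 and the rest receive the average of 50.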

Students taught each other furiously to make sure the average was high. The average mark went up to something like 56 and about 25% of students got over seventy. Students were still allowed to resit, and the resit mark was allowed to count towards the group average, so the group tutored students through the resits, and for the first time in living memory nobody failed the course. If somebody failed the exam the rest of the learning team said, 'You bastard, you're going to pass the bloody resit.'

Schemes like this work best when tutors use their time with groups to discuss process - How do you learn effectively in teams of four? - rather than content. What made the difference was changing the process through the social dynamics: economical assessment methods alone did not support learning until that process was in place. The biggest increase in marks came for the better students, which suggests that the act of teaching others had the most impact on performance. Many of these processes bring the bottom up, but this one spread the marks. Nothing else changed except the social processes associated with the course.

You can afford cheap, economical assessment methods provided the processes are right. When lecturers shift to economical resource-based methods with low class contact and cheap assessment systems, success is often determined by the social dynamics - the way people collaborate and talk to each other out of class, and things like social pressure and peer pressure.

11. Regular Self-assessment Improves Peer-assessment.

In one department, students use self-assessment sheets (of various kinds) for virtually everything they do. Everything is self-assessed in addition to being tutor-assessed, so that when students submit work there is a self-review of some kind. Some of these are highly structured and some are very open-ended. Students gradually calibrate their judgments closer and closer to the lecturer's.

Self-assessment is a principle of its own, but the most important thing is that if students have internalised the judgments they can supervise themselves, so you get a higher quality of work. They don't need lecturers to judge their work - they do it for themselves. It's a self-improving process, and giving students the tools to do it is the most important part of the process.

In the second year the department introduces two-stage assignments. The first stage is for a student to swap their assignment draft with another student and comment on each other's work. When they submit the completed assignment students add a sheet with comments such as 'This is the feedback I was given and I think Brian is absolutely right about the statistical analysis and I have redone it using a Kruskal-Wallis, but I think his analysis of the introduction was completely wrong and I've left it as it is.'

Students indicated what they had responded to - and what they hadn't - and how they had improved the work. That was part of the assignment: they were being judged partly on their understanding of what quality consisted of. And of course the quality of the work goes up because they have the opportunity to improve it. This isn't marking - the lecturer still marks the final submission, so it doesn't reduce the assessment load unless the lecturer had previously been involved in giving feedback on drafts. But it does improve the learning when students give each other feedback.

12. Students assessed on how well they assess other students' essays (multiple-choice, marking scheme).

On a Humanities course the coursework assessment was an essay, but as student numbers rose this became increasingly time-consuming to mark. Students were still required to write the essay, but instead of having it marked they were allowed to bring it, together with the marking criteria (issued when the essay was set), into an exam. In the exam they were given five essays of differing quality and asked to rank them using the marking criteria and their own essay as a guide. They were then marked on how close their ranking came to what staff considered to be the correct order.
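The article does not say how 'closeness' to the staff order was scored. One plausible scheme is to penalise the total displacement between the student's ranking and the staff ranking (Spearman's footrule), scaled to a mark out of 100. A hypothetical sketch; the scoring formula is an assumption, not the course's actual method:

```python
def ranking_score(student_order, staff_order, max_mark=100):
    """Score a student's ranking of the sample essays against the staff
    ranking, using the sum of absolute rank differences (Spearman's
    footrule), scaled so a perfect match scores max_mark and a complete
    reversal scores 0."""
    position = {essay: i for i, essay in enumerate(staff_order)}
    distance = sum(abs(i - position[e])
                   for i, e in enumerate(student_order))
    worst = len(staff_order) ** 2 // 2  # maximum possible footrule distance
    return round(max_mark * (1 - distance / worst))
```

Under this scheme an exact match scores 100, a single adjacent swap scores 83, and a fully reversed ranking scores 0.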

13. Science students improved their learning by internalising the lecturer's marking system and peer-assessing lab reports (peer-assessment, marking-schemes, sampling for feedback).

Student numbers went up and it was taking about 20 hours a week to mark the regular lab reports. Also, marks stayed about the same through the year. So they changed the assessment methods.

In week one they gave students two of the previous year's lab reports to mark - one slightly above average, one slightly below average. The students marked them blind. Then the lecturer showed them how he had marked them. He gave them his marking scheme and said, 'Mark them again.' The students re-marked them, internalising the marking scheme.

From then on, at the beginning of every lab, each student marked somebody else's lab report and gave it back immediately. Meanwhile, the lecturer sampled students' work and gave immediate oral feedback - 'I've taken a quick look at ten and these are the things you are doing well and badly'. He also double-marked some to monitor standards. The marks were much better than they had been previously.

Student performance had improved even though marks no longer counted. High quality arose from students internalising the marking standards and providing feedback. It was not that students were allocating marks that counted - which is what self- and peer-assessment is often used for - but that the act of peer marking itself improved learning and performance. Lab reports weren't needed to generate marks; the course had an exam at the end of the year.

14. Research shows that students can judge oral-presentation skills as reliably as lecturers (peer-assessment).

Nancy Falchikov, at Napier, has shown that if you are judging oral communication skills, and have a proper framework which students understand, then students are as good as lecturers. The only justification for involving lecturers is as a modelling exercise or as a check.

To use student judgment, however, you have to take seriously what the literature says about preparing students. This includes the way you describe the criteria, the way you lay out the forms, and the practice that students get in learning how to judge properly. The lecturer must also monitor the process to pick up where things go wrong. Most people - lecturers and students - are inconsistent in their judgments and have great difficulty using the criteria. And they don't have standards in relation to the criteria, so what counts as good varies enormously. When properly prepared, students can judge oral communication skills as well as lecturers.

The following assessment methods are used in the case studies numbered:

        1, 2, 7   Assignments as course requirements (not summatively assessed)
            4   Assessing learning outcomes or competencies
            10   Assigning average exam marks to learning teams
            6   Email feedback
         12, 13   Marking-schemes
          1, 3   Model answers
            12   Multi-choice examinations
            8   Optical Mark Readers & Computer-based assessment
            4   Overlapping assessment to two or more courses
            2   Pass-fail marking rather than allocating marks
1, 2, 4, 7, 11, 13, 14   Peer-assessment (formative) instead of staff feedback
            3   Peer marking/grading
      1, 3, 6, 13   Sampling assignments for giving feedback
            1   Sampling to assess part of an examination/assignment
        1, 4, 11   Self-assessment (formative) instead of staff feedback
            3   Self-marking/grading
            5   Statement bank of feedback comments
            1   Summative marking of only part of an assignment
            9   Tape-recorded feedback
