Teaching tip: online assessment of a fuzzy topic
Sharon Curtis
Department of Computing
School of Technology
"Uh-oh... how are we going to assess students now that this module no longer has an exam?" That was the question Mary Zajicek and I found ourselves asking about a popular second-year undergraduate computing module as the new semester approached. The module already contained a very useful group coursework for the more practical aspects of the material, but this needed to be balanced by a good, broad-ranging assessment that offered sufficient defence against cheating. A class test seemed a suitable form of assessment, and the thought of automated computer marking was extremely tempting, but there was a problem...
Automated marking
Automated computer marking requires that the computer be able to clearly categorise the possible answers a student might give to test questions. For example, multiple-choice questions state explicitly the possible categories for a student's answer, and more generally, many technical topics lend themselves to setting suitable questions: either the numerical answer to a maths question is correct, or it isn't; a computer program that a student has written either works or it doesn't. As Carter et al (2003) point out, these sorts of ‘black and white’ marking schemes used in computerised assessment are a serious limitation, and this proved to be a major issue for our module.
The module's topic is "The Human-Computer Interface", which, unlike most computing topics, is not precise but distinctly fuzzy around the edges. Our students had to understand various concepts and learn when, where and how they could be applied, and it was far from easy to see how to set test questions with crystal-clear right or wrong answers.
"Surely someone must have done online assessment for this topic before?" we thought. But search engines turned up no information whatsoever. We turned to Bull and McKenna's (2004) Blueprint for computer-assisted assessment, an extremely helpful book offering all sorts of useful advice on topics such as question structure, marking schemes and students' reactions, but it didn't solve our problem.
Question structuring
Inspiration finally came from students' past attempts at exam questions: often a lecture-note-perfect description of a concept would be accompanied by an example demonstrating the exact opposite! Testing students' knowledge of the concepts was not enough: we had to test their ability to apply them.
Figure 1: A sample question
Question 5. A web design company is developing a web site for a health foods shop, and has created a prototype site. One of the company's employees explores the prototype site carefully and systematically, looking specifically at responses to any actions that potential customers might perform. For example, in response to a customer clicking on a button to put an item into a shopping basket, there should then appear a clear indication to the customer that an item has been put into the shopping basket, and which item it is. Any responses that are missing, or inadequate, are noted and reported to the design team. Please indicate whether the statements below are True, False or that you Don't Know.
[Statements and scoring scheme for Question 5]
Figure 1 illustrates the question format we used, consisting of a description of a situation, followed by several small multiple-choice questions pertaining to that situation. Questions could include diagrams, and didn't have to be restricted to True / False, eg an alternative format might be Always / Sometimes / Never. To avoid forcing students to guess when they didn't know an answer, "Don't Know" was always an option, scoring 0. A complete test was made up of several of these large questions, one question for each topic.
Typically, a small multiple-choice question would ask whether a particular concept was occurring in the described situation. Great care was taken with the wording of the questions, using internal moderators to check that if each concept was correctly understood, then the answer to each multiple-choice question was crystal clear.
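To make the scoring concrete, here is a minimal Python sketch of marking one large question. The article confirms only that "Don't Know" scores 0 and that a negative marking scheme was used; the +1 / -1 weights, and the names `Answer`, `score_part`, `score_question` and `expected_guess_mark`, are hypothetical. The last function shows why negative marking deters guessing: under these assumed weights, a blind guess on a True / False part has an expected mark of zero.

```python
from enum import Enum
from fractions import Fraction


class Answer(Enum):
    TRUE = "True"
    FALSE = "False"
    DONT_KNOW = "Don't Know"


# Assumed weights: the article states only that "Don't Know" scores 0 and
# that negative marking was used; +1 and -1 are illustrative.
CORRECT_MARK = 1
WRONG_MARK = -1
DONT_KNOW_MARK = 0


def score_part(given: Answer, correct: Answer) -> int:
    """Mark one small multiple-choice part of a large question."""
    if given is Answer.DONT_KNOW:
        return DONT_KNOW_MARK
    return CORRECT_MARK if given is correct else WRONG_MARK


def score_question(given: list[Answer], key: list[Answer]) -> int:
    """Total mark for all the parts of one large question."""
    if len(given) != len(key):
        raise ValueError("number of answers does not match the answer key")
    return sum(score_part(g, k) for g, k in zip(given, key))


def expected_guess_mark(num_options: int) -> Fraction:
    """Expected mark per part for a student who guesses uniformly at random
    among num_options substantive options (exactly one of them correct)."""
    p_right = Fraction(1, num_options)
    return p_right * CORRECT_MARK + (1 - p_right) * WRONG_MARK
```

For a three-option format such as Always / Sometimes / Never, the same assumed weights give a guesser an expected -1/3 per part, so a real scheme might scale the penalty to the number of options.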
We felt that this style of question had several advantages:
- Students could show that they understood the concepts.
- The reading time per multiple-choice question was kept to a minimum, because of the grouped question structure.
- Questions were quicker to answer than traditional exam questions, as students didn't have to spend time wondering how to phrase their answer or writing it down legibly.
As is usual for computerised tests held over multiple sessions in the computer labs, we had to randomise test questions to minimise opportunities for cheating. Each test paper consisted of one randomly selected question on each topic. To keep the papers fair, we wrote a blueprint for each topic to which all questions on that topic had to conform. The blueprints also allowed us to demonstrate question equivalence to our external examiners.
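The randomisation described above can be sketched in a few lines of Python. This is not the authors' software: the `Question` record, the reduction of a "blueprint" to a single rule (every question on a topic has the same number of parts), and the `test_seed` parameter are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    qid: str        # hypothetical question identifier
    topic: str      # topic this question belongs to
    num_parts: int  # number of small multiple-choice parts


def check_blueprint(bank: list[Question], parts_per_topic: dict[str, int]) -> list[str]:
    """List questions that break their topic's (simplified) blueprint.

    Assumption: the blueprint is reduced to one rule, namely that every
    question on a topic has the same number of parts."""
    problems = []
    for q in bank:
        expected = parts_per_topic.get(q.topic)
        if expected is None:
            problems.append(f"{q.qid}: no blueprint for topic {q.topic!r}")
        elif q.num_parts != expected:
            problems.append(f"{q.qid}: {q.num_parts} parts, blueprint expects {expected}")
    return problems


def generate_paper(bank: list[Question], topics: list[str], student_id: str,
                   test_seed: str = "class-test") -> list[Question]:
    """Pick one question per topic, in topic order, for one student.

    Seeding the generator with the student ID means the same paper can be
    regenerated later, for example when marking that student's answers."""
    rng = random.Random(f"{test_seed}:{student_id}")
    paper = []
    for topic in topics:
        candidates = [q for q in bank if q.topic == topic]
        if not candidates:
            raise ValueError(f"no questions in the bank for topic {topic!r}")
        paper.append(rng.choice(candidates))
    return paper
```

Deterministic seeding is one design choice among several; storing each generated paper would work equally well. Either way, the seed should be kept private so that students cannot predict their papers.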
However, we wanted to go further to prevent cheating than just randomising test questions: the screens in the pooled computer rooms are often close enough together that a student could easily read a nearby screen without looking overtly in that direction. In addition, computing students are often very technologically aware, and we didn't want to encourage any creative uses of "screen-sniffing" gadgets.
We decided on a scheme whereby students would read the test questions from individualised printed test papers and answer the questions online via a web page. This web page was, of course, the same for all students, because all the questions on a given topic had the same structure. For example, Figure 2 shows part of the web page corresponding to the question in Figure 1.
Figure 2: Part of the answer web page for Question 5
Students could use the printed test papers for rough work, and also to record a copy of their answers if they felt this offered more reassurance. Test papers would all be returned to the invigilators at the end of each run of the test.
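Because the answer web page was the same for everyone, marking requires a record of which question appeared in each slot of each student's printed paper. The sketch below assumes such a record (`papers`) and per-question answer keys (`keys`), both hypothetical, with answers encoded as "T", "F" or "?" for Don't Know, and the same assumed +1 / -1 / 0 weights.

```python
# Assumed marks per part: the article confirms only that "Don't Know" scores 0
# and that wrong answers were negatively marked.
MARKS = {"correct": 1, "wrong": -1, "dont_know": 0}


def mark_submission(student_id: str,
                    submission: list[list[str]],
                    papers: dict[str, list[str]],
                    keys: dict[str, list[str]]) -> int:
    """Mark one student's web-form submission against their own printed paper.

    submission: one list of answers per slot on the shared form ("T", "F" or
    "?" for Don't Know). papers: student ID -> question ID printed in each
    slot. keys: question ID -> correct answer for each part."""
    question_ids = papers[student_id]
    if len(submission) != len(question_ids):
        raise ValueError("expected one answer list per question slot")
    total = 0
    for qid, answers in zip(question_ids, submission):
        key = keys[qid]
        if len(answers) != len(key):
            raise ValueError(f"wrong number of parts answered for {qid}")
        for given, correct in zip(answers, key):
            if given == "?":
                total += MARKS["dont_know"]
            elif given == correct:
                total += MARKS["correct"]
            else:
                total += MARKS["wrong"]
    return total
```

Identical form entries from two students can earn different marks, because each submission is checked against the questions on that student's own paper.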
Implementation
Having decided how we wanted to devise and run the class tests, and being happy and confident with the new structure we'd arrived at, we were determined not to let the technological tail wag the pedagogical dog, but instead to make the technology do what we wanted it to do.
The current version of WebCT installed at Brookes doesn't specifically allow for this grouped structure of multiple-choice questions. Although we were informed that it is theoretically possible to arrange for descriptions to be visible on screen while WebCT presents the multiple-choice questions, this would have resulted in a terrible interface to present to students. When the very topic of your module is interfaces, and how to avoid terrible ones, this would have been a fatal blow to credibility! In any case, problems would still remain: students had to submit answers online without the questions being displayed on screen, and the submitted answers had to be matched up with the printed test papers, which WebCT cannot assist with. We therefore decided not to use WebCT.
Instead (and this is where being a computing lecturer comes in handy), we implemented software to check the bank of questions for inconsistencies, generate and print the test papers, generate the web page for submitting answers, collect the answers, do the marking, and produce statistics of the results ready for pasting into a spreadsheet. In addition, the software could also produce individual student reports, so when we had a pilot run of the class test for all students a few weeks in advance of the real test, we were able to give students individual feedback.
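The final step, producing statistics ready for pasting into a spreadsheet, could be as simple as emitting comma-separated values. This sketch assumes the totals have already been computed per student; the function name `results_csv`, the column names and the choice of summary statistics (mean and sample standard deviation) are illustrative assumptions, not details of the original software.

```python
import csv
import io
import statistics


def results_csv(totals: dict[str, int], max_mark: int) -> str:
    """Return CSV text with one row per student, then summary statistics.

    totals: student ID -> total mark; max_mark: highest possible mark."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["student_id", "mark", "percent"])
    for student_id in sorted(totals):
        mark = totals[student_id]
        writer.writerow([student_id, mark, round(100 * mark / max_mark, 1)])
    marks = list(totals.values())
    writer.writerow([])  # blank separator row
    if marks:
        writer.writerow(["mean", round(statistics.fmean(marks), 2)])
    if len(marks) > 1:  # sample standard deviation needs at least two marks
        writer.writerow(["stdev", round(statistics.stdev(marks), 2)])
    return buf.getvalue()
```

Plain CSV keeps the output independent of any particular spreadsheet package; note that under negative marking a percentage can be below zero.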
Implementing the software ourselves gave us much more control, letting us do precisely what we wanted, and the additional effort was a good investment for the future. For more details, see Curtis and Zajicek (2005).
Effects on students
Feedback from students was generally positive: they appreciated having a pilot run, and they generally felt a lot more relaxed than in a traditional exam setting. As Bull and McKenna (2004) had warned, the negative marking scheme was not popular: students understood that it would be unfair for guessers to get high percentages, but still didn't like the idea of negative marks. A few students complained that some questions were unclear, but closer inspection revealed that it was really the answers, not the questions, that they found unclear.
We were particularly satisfied with the accessibility of the test. The pilot test provided a perfect opportunity to make sure that each student would have the class test provided in a format accessible to that student. It was easy to supply different versions of the web page or the test papers (eg printed on coloured paper) for the few students needing it. The ease with which students could answer questions even meant that one student who usually uses an amanuensis was happy to find no need for one in the test.
Overall, the class test results had a very similar distribution to exam results from previous runs of the module, and we were satisfied that we had achieved our goal of quality assessment for the module.
References
- Bull, J and McKenna, C (2004) ‘Blueprint for computer-assisted assessment’, RoutledgeFalmer.
- Carter, JE, English, J, Ala-Mutka, K, Dick, M, Fone, W, Fuller, UD and Sheard, J (2003) ‘How Shall We Assess This?’, ACM SIGCSE Bulletin vol 35 no 4, pp 107-123.
- Curtis, SA and Zajicek, M (2005) ‘Structuring an on-line assessment of students' learning’, Proceedings IADIS International Conference e-Society, Malta, June 2005.
This page maintained by Elizabeth Lovegrove and © Oxford Brookes University