The double-edged sword of Test & Punish

Making tests that push our students enough but not too far can feel like one of the biggest challenges for teachers.  Tests that are too easy may lead to a false assessment of proficiency. Those that are too hard can be discouraging to students, and challenging for teachers to grade.

What we want to know through assessing our students is what they are able to produce with the language they’ve learned, but multiple-choice (MC) and fill-in-the-blank (FIB) activities only tell us about students’ memorization and recognition skills, not their productive skills.

In one of our low-level classes, our students took a multiple-choice grammar exam. The weakest student got a 90%, the majority received an A, and there was no bell curve. When we assessed the same grammar through productive activities, the weakest student received a 46%, only a few got an A, and the average of the bell curve was 78%. Compared with our own classroom observations, the productive version seemed more indicative of each student’s true proficiency level.

Testing can be a double-edged sword. Discrete FIB and MC activities are easier to grade, but can also come across as punishing. If students are only asked to recognize correct answers, we are not able to reward them for using the language. On a FIB activity, students can lose all points for not memorizing how to spell an irregular past tense verb, even if they make a valid attempt at using it correctly in context.

At the other extreme, if a test or assignment is too open-ended, the grammar can fall apart. When students are not used to producing language meaningfully, open-ended tasks can become punishing because the leap is too big. And as assessors, we end up spending far too long correcting sentences that are awkward or even uncorrectable.

So the question remains: how can we craft productive exams so they are more successful and less punishing?   And if students are writing more, how can we avoid the extra time it takes to grade them?

Productive exams are less punishing by nature. When a student chooses the wrong answer on an MC test, credit is all or nothing. But when students write sentences, there are opportunities to give partial credit. Each sentence can be worth two points: one for the grammar you’re testing, and one for the rest of the sentence. This is less punitive because teachers can reward their students if other parts of the sentence show proficiency with previously learned structures.

However, in order to make productive exams more successful for students, one has to look at how far they are being pushed in the classroom, and on the test.   The old adage of “test the way you teach” rings true when looking at literature on transfer theory.   If students are not asked to produce language in class before a productive exam, they can experience negative transfer.

Diane Larsen-Freeman explains: “An example of unsuccessful transfer would be where a student shows certain grammar skills on a standardized multiple-choice language test given at school but does not apply them when communicating” (2013). In other words, students are unable to produce the grammar, whether in the real world or on a productive exam. Instead, she notes, students need opportunities to transform their language resources to new environments. It seems to hold true that if transformation of learning is practiced before the exam, students will be much more likely to succeed on productive assessment.

This means that test preparation and test design are equally important. When we test, we want to push enough, but not so far that it causes negative transfer and leads to hours of grading time. One way to accomplish this is by scaffolding activities using lexical chunks, sentence frames, and builder boxes. Below are examples of how each can be applied to exams and in-class activities at both the lower and upper levels.

1. Lexical Chunks

Lexical chunks are phrases that combine vocabulary and grammar. For example, the chunk “needs a little” embeds the grammar of the 3rd person –s and the quantifier a little. Asking students to create a sentence with this phrase tests whether they know how and when to use a non-count noun, but also whether they can produce something meaningful with it.

On the left is the full activity taken from a recent grammar exam our low-intermediate students took, where students used lexical chunks embedding count and non-count rules.  The activity on the right illustrates the same lexicogrammatical approach applied to a higher level.  Our advanced students had studied rules about adverb phrases and noun clauses.  They used specific chunks of language that scaffolded both structures within a personalized context.   The students showed that they were able to transfer the use of adverb phrases and noun clauses to a new, personalized context, and most importantly, were able to produce a paragraph that was more grammatically accurate, which meant far less time grading.

[Image, left: Lexical Chunks – Low Level]

[Image, right: Lexical Chunks – High Level]

2. Sentence Frames

Sentence frames are another useful way to test productive grammar. The trick is to figure out how to ensure that students are pushed to generate meaning with the frames.

At lower levels, students can complete sentences using basic time-order conjunctions like when, while, and after with the past tense, as shown below. This can help assess students’ use of the past tense and proper punctuation, and gives teachers a chance to see whether students understand time relationships well enough to produce them accurately.

Higher-level students can work on stems that scaffold more advanced structures. Our students recently took an exam requiring them to complete stems with that-clauses as subjects, which were taught in conjunction with verbs followed by noun clauses, such as explains why, suggests that, and means that.

[Image: Sentence Frames – Low Level]

[Image: Sentence Frames – High Level]

Below are some of the sentences our students produced. Because the grammar was scaffolded, we were able to focus on assessing meaning.

  • When I became a mom, my life changed.
  • Before we took the test, I looked at my book.
  • That Houston is cheap explains why people move here.
  • That she wouldn’t look at me in the eye suggests that she was upset.

3. Sentence Builder Boxes

Sentence builder boxes scaffold for grammar and allow for choices. They embed grammatical rules, and if well designed, require students to apply them in a meaningful way.   Builder boxes can have any number of rows and columns, but it is generally best to have no more than two rows. The practical advantage of sentence builder boxes is that they reduce grading time. They promote production, but scaffold the grammar enough so that there is less negative transfer.

A low-level builder box might ask students to produce sentences with basic modals, such as can, may, and could. In the task on the left, low-level students used the builder box to write their own sentences with modal verbs. This helped us see whether students were able to produce the modals in context, which was a more informative indicator of proficiency than their answers on a multiple-choice exam.

Like lexical chunks and sentence frames, builder boxes can be adapted to higher-level grammatical structures as well.   On the right, students are using a builder box that tests their understanding of active and passive voice. Adding “because” will give their sentences a bit more context so as to avoid incomplete ideas.

[Image, left: Sentence Builder Box – Low Level]

[Image, right: Sentence Builder Box – High Level]

In order for these three productive activities to work on a test, it is vital that students practice with them in class beforehand.   Fortunately, the advantage of using phrases is that vocabulary can be easily substituted.   Chunks and stems can continue to frame the grammar, but slotting in different vocabulary tweaks it enough so that the test will not be a carbon copy of what was done in class.  Likewise, successful transfer can occur when the chunks and stems are put into a new context.

Finding that fine line, where we are pushing our students enough but not too far, can feel like one of the biggest challenges for teachers. The same is true for the way we test our students. Our experiences have shown us that testing can feel less punitive when activities offer students opportunities to practice grammar meaningfully, but in a way that scaffolds the grammar and improves accuracy. Lexical chunks, sentence frames, and builder boxes are three strategies that provide students with a kind of “controlled flexibility.” They’ve also allowed us to test meaningful production without a significant increase in our grading time. As a result, testing has felt less punishing to our students, and more enjoyable for us as teachers.


Larsen-Freeman, D. (2013). Transfer of Learning Transformed. Language Learning (63), University of Michigan.
