Setting Motions

Stephen M. Llano

The current movement among CA teams is to set motions that are not only interesting to debate but that attempt to educate as well. The general acceptance of videos, context slides, and information slides at nearly every tournament has liberated CA teams in their motion writing. Now that it has become a norm to provide additional information to debaters, the process of motion setting has become more imaginative, more creative, and clearly broader in scope. What was once seen as normal – looking at the daily news to set the afternoon’s debate motions – now is considered lazy practice. CAs regularly set motions that focus on larger political theory and philosophy, and debaters are expected to use current events to fill in the gaps.

This is to be celebrated. It wasn’t that long ago that CAs considered it appropriate to set motions based on fortune cookies. But as competition at Worlds has intensified, the motions have followed suit. Once one could be witty, clever, and familiar with the past two days of headlines and win a lot of debates. Today one has to be much more familiar with larger trends in global affairs and the theories behind them to be successful in BP debate.

This change has some dangers. The most crucial is the risk that in our excitement to set deep, novel, and complex motions for debate we forget that debate in all aspects should be accessible to the reasonable audience. This link is what keeps BP relevant, valuable, and competitively fair. Debating should always maintain a familial relationship with public sphere discourse in some way in order to remain recognizable. Consider martial arts – a highly technical practice that appears mysterious from the outside. But placed within a real-world context, martial arts is more than just making the moves for the approval of the master. It can serve as exercise, improving the health of the person, or it can serve as self-defense in dire situations. There are, I would bet, martial arts competitions held more frequently than debating competitions. And each of them preserves this balance between a fair and engaging competition that rewards the making of good moves and a connection to the outside world.

I suggest a check on motion crafting that extends from the judging standard in British Parliamentary debating – the reasonable person. Although the reasonable person standard has been discussed frequently within this journal and other sources, it has primarily been considered a theory of judging.1 I believe that the reasonable person standard should be used not just for judging debates, but for judging motion quality as well. This extends the reasonable person standard to the ability of debaters to create arguments. I argue that the concept of the Universal Audience, created by Chaim Perelman and Lucie Olbrechts-Tyteca in their work The New Rhetoric, is the mechanism by which CA teams can check to ensure motions are set within the scope of the audience of debaters at the competition, avoiding the risk of setting a motion that, although deep and interesting, might be inaccessible to those speaking simply because it isn’t accessible to reasonable people.

Chaim Perelman and Lucie Olbrechts-Tyteca write about argument inductively, finding the places and the means from which people generate argumentation in their daily life. Their theory is meant to help ground, expand, and improve what we might call “debate” – debates that happen in daily life as a matter of course. Within competitions we attempt to mimic this practice and create an art out of it suitable for competitive judgment. This art is often viewed as only a competition, meant to identify who is really good at it. At the same time, this competition is engaged in teaching a rhetorical relationship to the world, toward argumentation, discussion, disagreement, and toward how to engage other people about their ideas. This activity – which I call “debating” – is usually a mix of both of these ideals. Sometimes, “debating” is used to critique “debates” – what counts as good argument in the world is not viewed as such by debaters. What contemporary motion writing gets right is the idea that we should broaden our comfort zone about what we choose to debate about in order to ensure we are attentive to the entire world of potential controversy. What it gets wrong is severing this connection to public discourse nearly entirely, replacing it with its own form of the civic voice, or what seems “cool to debate.” This results in two forms of motion that debating should do without.

Where Do Motions Come From?

When we consider the needs of a debating competition, motion setting is always at the top of the list. This is the opposite of reality where the decision of what to debate is what motivates the acquisition of a space, a time to meet, and an order of speakers as well as time limits or whatever other restrictions are necessary. In the world of debating, these concerns are dealt with first, and the CA team begins the motion conversation after they have been asked to serve.

I describe this process as anti-mimetic, meaning that it follows a pattern opposite its “natural” counterpart. Debating and debate have little in common beyond name, and what they do have in common could be described with the same ancient Greek word Aristotle used to describe the relationship between dialectic and rhetoric – antistrophos, or in the words of Jeffrey Walker, its “distant sister.” He explains that the best way to view it is that, “the relation is one of systematic difference as well as similarity.”2 Debate as a natural, public sphere phenomenon is related to but twisted away from debating, which is crisis and disagreement imposed from outside onto a group of people who have arrived precisely because they all agree that vehemently disagreeing on a few different topics for the weekend would be a great thing to take part in. They are sisters like Anna and Elsa from the film Frozen. The familial relationship is always present, but very distant in the way the two women engage the world.

The generation of motions leans toward the Elsa side – the creation of a world of controversy out of what is immediately present. Anna, in contrast, engages the world with what she finds in order to construct her engagement. Debating tournaments are like ice castles in the sense that they spring out of nothing, are really fantastic, and are unsustainable – their amazingness is possible due to their fragility. The competitors have not assembled to solve anything. They are not like debate attendees who are looking for a way to overcome an impasse. They are looking for rather exciting impasses to become involved in arguing about. The news is just one source, and not the best one, for the generation of debatable topics that would fit the situation. CA teams are under a lot of pressure to meet this need, and a good solution might be to put distance between the motions and the “real world.” An Anna-style of motion setting would be to use what’s available to solve the problem. Both have mixed results, and the film of course proves to us by the end that we are best with both approaches – a little magic and a little pragmatism.

Part of the problem with reaching this blend is well described by Chaim Perelman and Lucie Olbrechts-Tyteca in their discussion of the function of elite audiences. Perelman and Olbrechts-Tyteca define the elite audience as an audience that believes the way it behaves should be a normative prescription upon all audiences. Its members mistake their own way of thinking and believing for the norm toward which all audiences aspire. “The elite audience is regarded as a model to which men should conform in order to be worthy of the name: in other words, the elite audience sets the norm for everybody. In this case, the elite is the vanguard all will follow and conform to. Its opinion is the only one that matters, for, in final analysis, it is the determining one.”3 Setting up a debate for the elite quickly becomes setting up a norm by which audience quality is judged.

CA teams can easily fall into believing that a motion is good because it is something the debaters “should know about.” Often this claim is ungrounded – rarely do CA teams point to a collection of literature that would be accessible and within the purview of those debating. Something that someone is writing or reading about for an advanced degree is often used as a motion with the defense that this controversy is current in the field, forgetting that most debaters do not have the ability to become familiar with that field. A paragraph on an information slide is insufficient to make debaters familiar with the controversy. Instead, speakers turn toward it as the grounding for proof instead of the grounding of the root of the controversy. The difference is between a good debate and one that the adjudicators wish they didn’t have to decide.

An example of this sort of motion was set at the Vienna IV two years ago. Before the debate, a rather long YouTube video was played that detailed how the U.K. bombed German cities after hostilities had ended in World War II. The motion was This House Believes That school children in the UK should be taught that their country engaged in war crimes. Although this is the start of a very stimulating discussion and debate, or possibly larger research project, it lacks important contextual elements that a debate should have – namely, it needs agreement on the controversy. Facts about the historical incident are not enough – to debate the motion at a depth that would be satisfactory one needs further insight. Why is this issue controversial? Who are the people involved in the discussion? The CA team believed that since people should know about this issue, it made for a good debate. What was missing was the in-depth reading, or access to debate arguments made in the world, that would indicate a number of starting points that inductively stem from the controversy. Instead, the information video and text are used as fact that becomes support for a deductive argument about rights, state obligation, or the value and scope of education that is only tangentially related to the issue.

Another example of this sort of motion was set at Yale involving the practice of “bug chasing” where people participate in orgies with HIV positive individuals. Although the issue is worthy of reading about, controversial, and very novel, the lack of access to much of the larger controversy around it harms the debaters’ ability to create arguments oriented toward a reasonable person. The surprise and shock of learning about such a practice would overwhelm the reasonable audience at first, as it would the debaters. Without access to the arguments that the practitioners might make in media to defend their choice to be “bug chasers,” the debate will suffer from this lack of perspective. Again, debaters will be required to access arguments familiar to the context of the community of debaters not the groups involved in the controversy. The starting points for argument construction should be accessible.

Another concern with vanguard motion setting is the concern that because the CA team likes the motion and finds it really interesting, it passes the test for being a good motion for the competition. These motions are identifiable due to the lack of grounding in anything other than the opportunity for debaters to employ highly technical moves to access the tropes familiar to all those who debate. The motion, This House would randomly assign official first names at birth, suffers from a lack of a public sphere discussion entirely. The reasonable person, imagined as a member of the universal audience, would not recognize this topic as debatable, but rather as ridiculous. It would be hard for the reasonable person to see this as possibly controversial. The lack of conversation in the public sphere through accessible media makes this topic hard to see as appropriately controversial, although it is clearly something that would be controversial if suggested.

These two main ideas – that reasonable audiences are the target of debaters’ speeches, and that motions should be fair and accessible – are not new; in fact, they are the norm that we aspire to in designing our competitions. What should be clear from these two examples is that a better system of checking the quality of a motion is needed. Debatability and controversy are not enough if they are not provided within a larger context of accessibility to the debaters.

Grounding Motion Setting in the Universal Audience

The concern I have for the rift developing between debating and debate is rooted in a concern that our rhetoric is becoming overspecialized. “Argumentation aimed exclusively at a particular audience has the drawback that the speaker, by the very fact of adapting to the views of his listeners, might rely on arguments that are foreign or even directly opposed to what is acceptable to persons other than those he is directly addressing.”4 People usually overcome this concern by attempting to offer arguments that they feel any reasonable person would find persuasive. Sometimes this takes the form of addressing a timeless audience of listeners, but we should realize that this audience is an imagined one, crafted from knowledge we have via experience about how people act and react to particular persuasive claims. Chaim Perelman and Lucie Olbrechts-Tyteca identify the operation of this concern rhetorically as the Universal Audience. This is not an ontological universality – on the contrary, the Universal Audience is constructed based on concerns of context and culture. “Everyone constitutes the universal audience from what he knows of his fellow men, in such a way as to transcend the few oppositions he is aware of.”5 The universal audience is made from the material concerns that come about from connection to society, culture, and institutions. One imagines the objections that situated people would make to one’s argument, and attempts to account for them.

The Universal Audience is the check that the rhetor uses to ensure that they are not overspecializing their argumentation. Adaptation to the audience is a good thing up until the point where the arguments work to exclude particular groups of people whom the speaker may want to persuade, or more likely, groups that the speaker would like to identify with in order to make her argumentation more compelling to the immediate audience. This is the case in debating where the speaker attempts to link her argumentation at all times to the thinking of the reasonable person. “There can only be adherence to this idea of excluding individuals from the human community if the number and intellectual value of those banned are not so high as to make such a procedure ridiculous.”6 That is, one cannot dismiss a large segment of the debaters as being ignorant because they could not debate a particular motion properly. When an argument is unconvincing, the cause is unlikely to be that the majority of the audience is incapable of thinking. It is more reasonable to assume that the argument does not resonate with their experiences and thoughts. The same goes with motions – sometimes motions fail to produce good debates because they are not properly adapted for those who would debate them.

The use of the universal audience in motion setting would be for the CA team to think about the reasonable person standard away from judges and within the context of argument creation. The central method of using the universal audience as a guideline is to make sure that there is enough context accessible to debaters to ensure that they can construct arguments for a reasonable person.

Reasonable Motion Setting: A Method

This process consists of three parts. First, any motion must be grounded in public deliberation. This means that there must be a test to see if reasonable, interested people could get access to a variety of sources of public debate on the topic. This is vital to access the rhetoric surrounding the controversy, which helps debaters ground their arguments within the realm of the reasonable person standard. This access should not be purely academic – the majority of reasonable people in the world do not have access to scholarly sources. Care must be taken that there is not a lean toward such sources, considering most contemporary CAs hold advanced degrees or are studying for them. This test is most similar to the “Five Arguments” test that many CA teams employ to determine if side bias is present in a motion. This additional test of access is the same, but grounds the test outside of the competition, connecting it to the presence of such lines of argument in the public sphere.

Secondly, the team should ask if the discourse is recent enough to warrant setting the motion. CAs should check to see if the controversy is bubbling up in one form or another in ways that the reasonable person would notice. A motion could have a lot of things written about it, but if they are not circulating in current media, the reasonable person might not have an opportunity to access that controversy. There is solid and healthy conversation to be had by the CA team on this issue, as recency can have many meanings. Some topics, although not directly under robust discussion by public intellectuals or other media sources, are still things that can be assumed to be present, as they form the background of myriad arguments within states today.

One final check is related to pandering to the audience. Certainly, one should not set motions because one feels they are simple enough for debaters any more than one should set motions as a normative judgment on the quality of the debaters. There is no shortage of controversial, important, and vital issues for us to learn about and discuss. Motions should contain this spirit of the “push” toward broadening one’s familiarity with the world, no question. But using this check of the Universal Audience, one might construct the debaters as the opposite of the elite audience. This could lead to the setting of some motions that are pedantic. How can this be avoided?

Perelman and Olbrechts-Tyteca realized this might happen with their theory, since the universal audience is an imaginary judge over one’s argumentation. To check against making the mistake of low-balling the average, reasonable person, one uses the undefined universal audience as a check. It is “invoked to pass judgment on what is the concept of the universal audience appropriate to such a concrete audience, to examine, simultaneously, the manner in which it was composed, which are the individuals who comprise it, according to the adopted criterion, and whether this criterion is legitimate.”7 Said another way, there are moments when a concern for accessibility might trump the presence of the actual audience, rendering them irrelevant – the arguments would appeal to a universal audience that might trump actual audience concerns or abilities. This is the moment where the CAs do a reality check, and make sure they are not overreaching in the direction of these concerns, and whether or not the debaters present can debate the motion at a quality level that preserves connection to the world while also delivering an engaging and fair competitive moment.

Let’s test the motion, This House believes that the countries of the world should create and participate in a global carbon cap and trade system. The first thing the CA proposing this motion should do is some research – not about cap and trade and the arguments for or against it, but research to see where this issue is coming up in the debate world – media, public intellectuals, or other sources. This motion, like many, is unclear on this question. CAs can defend it being present due to the increasing public discourse on global climate change shifting from a stasis of conjecture to one of quality – “it’s happening, so what should the response be?” This would be something the CA team should discuss to see if the public deliberation is suggesting this as a part of the controversy.

The recency question is also one that would need significant discussion, but if the CAs see the motion as a part of the larger discussion on global climate change, the answer is clear that this motion should be set. Passing this part of the consideration is often subjective, but checked by the CAs reminding one another that the reasonable person is debating as well as judging – would the reasonable person find this issue controversial in a temporal sense?

Finally, the question of the undefined universal audience and that of pandering. In this case, this motion suggests a concern for meeting debaters exactly where they are. It is a debate about climate change, but also pushes them to investigate cap and trade – something that is not appearing in the surface news sources that debaters might frequent – or it rewards those who have delved a bit deeper into the debate and not into the techniques of debating. A CA team concerned about the presence of cap and trade in the motion might choose to reword it to be about climate change – a clear trumping of the universal audience with the one that is present, and a move that could be considered pandering – keeping out the more complex argumentative possibilities over the fear that the debaters “won’t get it.”


Motion setting is the unenviable task of satisfying both one’s ethical relationship to debating along with the obligation to provide the raw materials for an excellent competition. CA teams have taken on a mantle of prescribing not only motions that are good to debate about, but many motions that imply what issues debaters should be familiar with. Unfortunately, this normative push in motion setting turns debating inward, using itself as the metric of whether a motion is good for debating or not. This further isolates the competitive act of debating from real-world argumentation situations that I term debate. The debate/debating link should be preserved not only to tie value to debating, but to increase the quality of competition as well. The Perelman and Olbrechts-Tyteca notion of the universal audience is the check that, if used by CA teams in motion setting, can bring more balance and less shallow debating based on information slides. The universal audience checks the motion to ensure that the reasonable person would consider this motion to be worth debating by asking if it is circulating in the collective discussion recently. It also checks CA teams from low-balling their audience at a tournament, and gives warrants to the normative push for inclusion of more complex or specialized terms in motions. Debating’s value, as in martial arts, is in the application of complex moves both in the tournament and in the world. Without attention to preserving that connection, debating will become an irrelevant society of inward-turned thinkers, performing what they think the vanguard will want to hear, ignoring the vast array of controversies present in the world at any given time.

  1. As a starting point, see Bibby, Block, and Llano, eds., Adjudication: Essays on the Philosophy, Practice, and Pedagogy of Judging British Parliamentary Debate (New York: IDEA Press, 2013).
  2. Jeffrey Walker, Rhetoric and Poetics in Antiquity (Oxford: Oxford University Press, 2000), 171.
  3. Chaim Perelman and Lucie Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, trans. John Wilkinson and Purcell Weaver (London and Notre Dame: University of Notre Dame Press, 1969), 34.
  4. Chaim Perelman and Lucie Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, trans. John Wilkinson and Purcell Weaver (London and Notre Dame: University of Notre Dame Press, 1969), 31.
  5. Chaim Perelman and Lucie Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, trans. John Wilkinson and Purcell Weaver (London and Notre Dame: University of Notre Dame Press, 1969), 33.
  6. Chaim Perelman and Lucie Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, trans. John Wilkinson and Purcell Weaver (London and Notre Dame: University of Notre Dame Press, 1969), 33.
  7. Chaim Perelman and Lucie Olbrechts-Tyteca, The New Rhetoric: A Treatise on Argumentation, trans. John Wilkinson and Purcell Weaver (London and Notre Dame: University of Notre Dame Press, 1969), 35.

Comparing Experienced Judges and Lay Judges

Eric Barnes


Though the wording of this platitude varies slightly when repeated at various judge briefings, it is commonly accepted that the goal of judges in British Parliamentary debate is to emulate the typical, educated, intelligent person. The primary question we are looking at in this study is whether actual BP judges are really doing this. We examine this by comparing the decisions made by normal judging panels at a tournament with decisions made by a panel of educated and intelligent people who have no familiarity with competitive debating. In investigating this question, we come across some other insights about judging as well.

Data Gathering

The HWS Round Robin (“HWS RR” hereafter) is an elite debating competition that invites 16 of the best debate teams and about 16 highly regarded debate judges from around the world each year. To be more precise, of the 16 judges in 2014, when this research was conducted: 13 had broken as a judge at Worlds (the other 3 had never judged at Worlds, but had accomplishments that would no doubt warrant them being invited as subsidized independent adjudicators); 4 had been Worlds grand finalists or had won the ESL championship; 5 had judged in Worlds semis or finals; 1 was top speaker at Worlds; 1 was a Worlds DCA; and 2 were Worlds CAs. Of course, this leaves out countless judging credentials outside of the WUDC. Suffice it to say that this is an exceptionally strong set of judges. The judging pool was 25% female. A total of 6 nationalities were represented.

Over the course of 5 rounds, each team debates every other team exactly once. Judges are allocated such that no judge ever sees the same team more than twice and two judges are never on the same panel more than once.

In 2014, we ran a research study on judging by adding a panel of “lay judges” to each of the preliminary debate rounds. We recruited 40 people who had no prior experience with competitive public speaking. These lay judges were recruited from faculty, staff and academically high-performing students at HWS. All lay judges were given a very brief (about 30-minute) orientation to judging BP debate, which was as neutral as possible regarding what constituted good debating. (See Appendix A for a summary of what was said at this orientation.) The primary purpose of the orientation was to tell them what we were asking them to do and to encourage them to set aside any preconceptions about competitive debating.

The lay judges were assigned to rooms in panels of 3, with 1 person randomly designated as the chair. These people watched their assigned debates silently, as typical audience members would. After the debate was over, they were moved to another room and given 15 minutes to come to a decision about the debate, consulting with no one else. But, before discussing the debate among the panel, they were instructed to write down their initial call on a slip of paper, which we then collected. After the lay judges came to a decision (by consensus or vote), they filled out a ballot indicating team ranks and individual speaker points. In a few cases, there was more than one set of lay judges in the room, and in these cases, they deliberated entirely independently.

The pro judges stayed in the room after the debate and came to a decision, just as a panel ordinarily would. The only difference was that pro judges were also instructed to write down their initial call on paper that was collected.

Almost all of the 20 preliminary debates were video recorded and almost all of the judge deliberations were audio recorded.1 This paper will not discuss any of the information from these recordings, though we hope to engage in some careful qualitative analysis of those recordings in a future publication.

All the quantitative data was entered into a spreadsheet and analyzed using the methods described below. This included:

  • Pro judge panel ballots (including speaker points)
  • Lay judge panel ballots (including speaker points)
  • Individual pro judge initial calls
  • Individual lay judge initial calls


A central element of our analysis concerns comparing team rankings provided by individual judges and panels of judges. To do this, we developed a method of measuring the degree of difference between two complete rankings (i.e., ordinal rankings of all four teams). The difference between two complete rankings can be measured on a scale from 0 (representing an identical ranking) to 6 (representing a maximally divergent ranking). A complete ranking can be translated into a set of 6 bilateral rankings, comparing each possible pairing of teams out of the four teams in the room. Each bilateral ranking was scored as a 0 if the two complete rankings agreed on which of those two teams should be ranked higher, and was scored as a 1 if they disagreed. These six scores were then summed to provide the final divergence between the two complete rankings on the 0-6 scale.2 So, the least divergent rankings (other than full agreement) would be a situation where the rankings are the same, except for two adjacently ranked teams being switched. See the examples below:
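The divergence measure just described can be sketched in a few lines of Python. This is only an illustration of the counting rule in the text, not the spreadsheet method used in the study; the function name `divergence` and the team labels are assumptions made for the example, and each ranking is written as a tuple of teams listed from first place to fourth.

```python
from itertools import combinations


def divergence(ranking_a, ranking_b):
    """Count the team pairs that two complete rankings order differently.

    Each ranking is a sequence of the same team labels, best team first.
    With four teams there are six pairs, so the result lies between 0 and 6.
    """
    # Position of each team in each ranking (0 = first place).
    pos_a = {team: place for place, team in enumerate(ranking_a)}
    pos_b = {team: place for place, team in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        # Score 1 when the rankings disagree about which of x and y is higher.
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


# Hypothetical labels for the four teams in a room.
base = ("OG", "OO", "CG", "CO")
print(divergence(base, base))                        # 0: identical rankings
print(divergence(base, ("OO", "OG", "CG", "CO")))    # 1: two adjacent teams switched
print(divergence(base, tuple(reversed(base))))       # 6: fully reversed ranking
```

Counting disagreeing pairs in this way means that switching two adjacently ranked teams always costs exactly one point, which matches the "least divergent" case described above.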

[Figure not reproduced: example rankings and their divergence scores]

We also wanted to measure how similar the initial calls from an entire panel were. To do this, we simply created three pairs of complete rankings from the three judges, calculated the divergence for each of these pairs, and then summed these. This gives a scale from 0 (no disagreement) to 12 (maximum disagreement).3 To make this easier to grasp, consider the table below, where the “call difference” is the degree to which the three judges’ calls differed.

[Table not reproduced: example panel calls and their call difference]
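The panel-level "call difference" can be sketched the same way: sum the divergence over the three possible pairs of judges. The names `divergence` and `call_difference` and the team labels are hypothetical, chosen for this example only.

```python
from itertools import combinations


def divergence(ranking_a, ranking_b):
    """Number of team pairs ordered differently by two rankings (0 to 6)."""
    pos_a = {team: place for place, team in enumerate(ranking_a)}
    pos_b = {team: place for place, team in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def call_difference(calls):
    """Sum of pairwise divergences among the initial calls of one panel.

    For a panel of three judges there are three judge pairs, and the total
    lies between 0 (all calls identical) and 12.
    """
    return sum(divergence(a, b) for a, b in combinations(calls, 2))


base = ("OG", "OO", "CG", "CO")
swapped = ("OO", "OG", "CG", "CO")
reverse = tuple(reversed(base))

print(call_difference([base, base, base]))      # 0: unanimous panel
print(call_difference([base, base, swapped]))   # 2: one judge switches two adjacent teams
print(call_difference([base, base, reverse]))   # 12: one judge reverses the other two
```

The maximum is 12 rather than 18 because, on any given pair of teams, three judges can split at most two to one, so at most two of the three judge pairs can disagree about it.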

We used averages of these measures to answer the following questions:

  • Did pro or lay panels show greater differences in their initial calls?
  • Did pro or lay judges tend to alter their rankings more to arrive at a final call?
  • How different were pro and lay panel rankings from each other?

To test for statistical significance of these differences (between A & B), we used a t-test for sample means, controlling for unequal variances:

[Equation not reproduced: t-test formula]
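One standard form of a t-test for sample means with unequal variances is Welch's t-test, and the sketch below assumes that form; whether it matches the exact formula used in this study cannot be confirmed from the text. The function name `welch_t` and the sample values are illustrative only and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Uses the sample variances (n - 1 denominator) and does not assume that
    the two groups have equal variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up numbers purely to show the call; NOT results from the tournament.
group_a = [2, 4, 6]
group_b = [1, 2, 3]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 3))
```

The resulting t statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value, a step left out here to keep the sketch free of third-party libraries.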

We tested the following hypothesis to determine the likelihood that the differences were random:

[Equation not reproduced: hypothesis statement]

As a point of comparison, we sometimes include what a set of random rankings would look like. To generate this random data for initial call differences between a panel, we numbered all 24 possible rankings for a BP debate, then we used a random number generator in Excel to create three independent random numbers from 1 to 24. We then calculated the call difference of those three rankings and recorded it in a spreadsheet. We did this 100 times and used that sample as our random data for call differences. To get a “random” data distribution for the divergence between just two rankings, we calculated the divergence of the ranking (1,2,3,4) against each of the 24 possible rankings and used that as our “random” distribution. Although not generated randomly, any arbitrarily large set of paired rankings (each randomly selected) would converge on this distribution, so it should more than suffice as a stand-in.
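The two baselines described here can be re-expressed as a short Python sketch. The study used Excel, so this is a stand-in for the procedure rather than the original tool; all function names, the seed, and the use of team labels 1 to 4 (listed best to worst) are assumptions made for the example.

```python
import random
from collections import Counter
from itertools import combinations, permutations


def divergence(ranking_a, ranking_b):
    """Number of team pairs ordered differently by two rankings (0 to 6)."""
    pos_a = {team: place for place, team in enumerate(ranking_a)}
    pos_b = {team: place for place, team in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def call_difference(calls):
    """Sum of pairwise divergences among a panel's calls (0 to 12 for three)."""
    return sum(divergence(a, b) for a, b in combinations(calls, 2))


# All 24 possible complete rankings of four teams.
ALL_RANKINGS = list(permutations((1, 2, 3, 4)))


def random_call_differences(trials=100, seed=0):
    """Call differences for panels of three independently random rankings."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return [
        call_difference([rng.choice(ALL_RANKINGS) for _ in range(3)])
        for _ in range(trials)
    ]


def pairwise_baseline():
    """Divergence of (1, 2, 3, 4) against each of the 24 possible rankings."""
    reference = (1, 2, 3, 4)
    return Counter(divergence(reference, other) for other in ALL_RANKINGS)


baseline = pairwise_baseline()
print(sorted(baseline.items()))
print(sum(score * count for score, count in baseline.items()) / 24)  # 3.0
```

Enumerating all 24 rankings gives an exact baseline whose mean divergence is 3, the midpoint of the 0-6 scale, while the 100-trial panel sample will vary with the random draws.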

Findings & Discussion

Based on our analysis, there are five areas that we want to discuss: 1) the correlation between lay and pro judges regarding team point decisions; 2) the relative similarity between the initial calls of the two kinds of judges; 3) the movement between initial calls and final decisions for the two kinds of judges; 4) situations in which we placed two lay panels in the same room; 5) judge bias toward particular positions in the debate.

Similarity of Final Decisions

The data clearly show a correlation between the winners chosen by the lay judges and those chosen by the pro judges. It would have been both horribly depressing and a damning indictment of our activity if this had not been the case.

At the same time, we want to note that the break would have looked very different if the lay judges had been deciding the winners. The top breaking team would not have changed and the second team would have squeaked in as 4th seed (on a tie-breaker), but the other two teams who broke to finals would have been 8th and 9th on the tab. What stands out to us in the comparison of the results from lay vs. pro judges is that there were 3 teams whose total team points from the two groups differed by 5 or 6 over just five rounds. An additional 4 teams had results differing by 3 or 4 team points. Putting this another way, there were 2 teams that the pro judges liked much more than the lay judges (5 points), and there were 3 teams that the lay judges liked much more than the pro judges (4-6 points). The average difference for a team at the end of five rounds was 2.625, which is substantial, since the average point total is 7.5. The chart below shows the different results from the two sets of judges. Team names were alphabetized to show the order in which they would have finished by the lay judges’ rankings (i.e., “Team A” would have broken first, “Team B” second, “Team C” third, etc.).

Screenshot 2014-12-23 17.36.30

Although there is a correlation between the two sets of results, there are some significant aberrations. In fact, the differences between the two sets of results are greater than the chart above suggests, since the data presented in the chart above only considers the teams’ final score, not the accumulated variance in the decisions from each round.4 So, the chart below may better represent the amount of disagreement between lay and pro judges. What is striking here is that even in cases where there appeared to be strong agreement on the results (e.g., teams A, E, G and L), the reasons for that result were very different, varying by 4-6 points in these four cases. Though we did not represent it in the chart below, the expected accumulated difference between any set of team rankings and a random set of rankings is 6.25 for each team over five rounds. So, the two sets of judges are coordinating better than random, but that’s not a high bar.

Screenshot 2014-12-23 17.36.35
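
The distinction between the final difference and the accumulated difference (footnote 4), and the random baseline of 6.25, can be checked with a short Python sketch. The point values below are made up; only the round-by-round gaps (0, 0, 0, +1, -2) come from the example in footnote 4. The baseline assumes that a team's two results in a round are independent and uniformly random over 0 to 3 team points.

```python
from itertools import product


def final_and_accumulated(lay_points, pro_points):
    """Return (final difference, accumulated difference) for one team."""
    diffs = [lay - pro for lay, pro in zip(lay_points, pro_points)]
    return sum(diffs), sum(abs(d) for d in diffs)


# Hypothetical team points over five rounds, chosen to match footnote 4's gaps.
lay = [2, 1, 3, 3, 0]
pro = [2, 1, 3, 2, 2]
final_diff, accumulated_diff = final_and_accumulated(lay, pro)

# Baseline, ASSUMING two independent, uniformly random results per round.
POINTS = (0, 1, 2, 3)
expected_gap_per_round = sum(abs(a - b) for a, b in product(POINTS, repeat=2)) / 16
expected_gap_over_five_rounds = 5 * expected_gap_per_round

print(final_diff, accumulated_diff, expected_gap_over_five_rounds)
```

The signed gaps partly cancel in the final difference (net 1 point, here in the pro judges' favor), while the accumulated difference counts every round's disagreement (3 points). Under the stated assumption the expected gap is 1.25 points per round, or 6.25 over five rounds.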

Clearly, the pro and lay panels saw some debates very differently, and the quantitative data that we have will not answer the question of why this is the case. Our intention is to move forward with this research by engaging in a qualitative analysis of the audiotapes that we have of the deliberations of pro and lay panels, particularly in the rounds where they disagreed markedly.

While the accumulated differences shown above make it seem as though the differences between the decisions of the lay and pro panels were quite substantial, things look somewhat different when we view the data in a different way. We calculated the divergence between the rankings of the pro and lay panels in each round and found the distribution of these divergences. For comparison, we added what a distribution of divergences from random rankings would look like, and we also added the distribution of divergences between individual pro judges on the same panel at this tournament.

Screenshot 2014-12-23 17.36.40

There is only the smallest possible divergence or no divergence at all between the lay and pro panels in 44% of all cases. In another 28% of cases, there was a divergence of 2, which we still consider a fairly similar ranking. Although a divergence of 3 or 4 is definitely substantial, it is important to note that there were no cases where the calls of the two panels diverged as much as 5 or 6. The lay panels diverged from the pro panels slightly less than the individual pro judges on the same panel diverged from each other.5 This suggests that there is not such a big difference between how pro and lay judges see the debates.

Similarity of Initial Calls

Regarding the differences in the initial calls of the judging panels, the data reflected what we expected to see, but not to the degree that we expected. Pro judge panels tended to be more consistent (i.e., less divergent) than lay judge panels. However, there was a greater average difference than we had expected between the initial calls of the pro judge panels. In other words, we expected the pro judges to agree even more before the panel discussion began.

Screenshot 2014-12-23 17.36.46

In the 20 prelim rooms, there was never a case where the pros completely agreed, though perhaps that isn’t quite as remarkable once you consider that the odds of this agreement randomly happening are 1 in 576. In 47% of the rounds the pro panel’s call difference was minimal, meaning that it was either 2 (the smallest possible difference, outside of complete agreement) or 4 (the next smallest). We see these differences as relatively minor, and indicative of a panel being largely on the same page at the end of the debate. These are situations that would likely set the stage for a fairly easy deliberation. We consider call differences of 6 to be moderately divergent. Although panels with a call difference of 6 will find it somewhat more difficult to reach consensus, there will be some clear commonalities in the three judges’ rankings that can help to find a path to consensus. About 24% of pro panels fell into this range. We consider panels with a call difference of 8 (12% of pro panels) to be significantly divergent. These panels will likely struggle to find commonalities in their rankings, though some will likely exist. We consider call differences of 10 or 12 to be extreme, since these calls indicate virtually no agreement. In 18% of pro panels, there were such extreme differences. We feel sorry for the people engaged in these deliberations. Of course, we acknowledge that in some cases, these extreme call differences can dissipate quickly once the panel resolves one or two central questions about the debate. But many times this is not what happens.

In contrast, the lay judges had minimal or no call difference (0-4) in 28% of their panels, a much smaller share than among the pro panels. About 32% of lay panels had a moderately divergent call difference of 6. About 12% of lay panels had a significantly divergent call difference of 8. The remaining 28% of panels had extremely divergent call differences.

We note that even with panels of uniformly excellent judges, about 30% of panels will disagree to a significant or extreme degree in their initial impression about who won a debate. This fact strongly suggests that even the most confident judge among us should cultivate a sense of humility regarding their call in a debate. This is even clearer for those people considering criticizing a decision without participating in the deliberation process.

The average call difference for lay panels was 6.5, with a standard deviation of 3.38. This compares to an average of 5.9 for pro panels, with a standard deviation of 2.87. An average random set of three rankings had a call difference of 8.9, with a standard deviation of 2.67. We had expected that pro judges would have a certain uniformity of expectations and criteria and that this would result in more uniformity in their initial call. While our findings were not strictly inconsistent with this, as mentioned above, we had expected to find a larger gap between the pro and lay panels in this respect. The gap we found was not even statistically significant.6

The size of these call differences suggests that all judges should remember that the panel deliberation is an essential element in coming to a good decision and that judges (chairs in particular) should not see their job in the deliberation as ensuring that the other judges are willing to go along with their initial call.

Movement from Initial Calls to Final Decisions

We use the term “movement” to refer to how much a judge’s initial call diverges from their panel’s final call. When there is an initial call difference among the judges on a panel, there will necessarily be some movement by some of the judges. But panels will not always come to a final decision that minimizes how much the judges move. For better or worse, in practice, panels sometimes engage in deliberations that cause everyone on the panel to change their mind about a ranking that they had all agreed on. So, the existence of an initial call difference sets a minimum amount of movement that needs to happen to reach consensus, but judge movement can significantly exceed this. In theory, a panel could start with complete agreement (i.e., no call difference) and end with a call that is completely different from what everyone initially thought. So, movement measures something new.
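
As a minimal Python sketch of this measure, movement can be computed as the divergence (in the sense of footnote 2) between each judge's initial call and the panel's final call. The two panels below are hypothetical and only illustrate the point that movement is not fully determined by the initial call difference.

```python
from itertools import combinations

TEAMS = ("OG", "OO", "CG", "CO")


def divergence(rank_a, rank_b):
    """Count the team pairs (out of 6) that two rankings order differently."""
    pos_a = {team: i for i, team in enumerate(rank_a)}
    pos_b = {team: i for i, team in enumerate(rank_b)}
    return sum(
        (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
        for x, y in combinations(TEAMS, 2)
    )


def movements(initial_calls, final_call):
    """Movement of each judge: divergence of their initial call from the final call."""
    return [divergence(call, final_call) for call in initial_calls]


# Hypothetical panel 1: three different initial calls, final call matches judge 1.
split_panel = [
    ("OG", "OO", "CG", "CO"),
    ("OO", "OG", "CG", "CO"),
    ("OG", "OO", "CO", "CG"),
]
split_moves = movements(split_panel, ("OG", "OO", "CG", "CO"))

# Hypothetical panel 2: unanimous initial calls, yet the final call swaps the top two.
unanimous_panel = [("OG", "OO", "CG", "CO")] * 3
unanimous_moves = movements(unanimous_panel, ("OO", "OG", "CG", "CO"))

print(split_moves, unanimous_moves)
```

In the first panel only the two judges who differed from the final call move (1 degree each). In the second, every judge moves by 1 even though the panel started with no call difference at all.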

Given that pro judge panels had a lower call difference on average, one might reasonably expect that the pro judges would tend to move less than the lay judges. However, the data showed that the lay judges moved an average of 1.3 between their initial call and their final judgment, with a standard deviation of 1.41. The pro judges moved an average of 1.8 between their initial call and their final judgment, with a standard deviation of 1.43. This difference of 0.5 is both statistically significant and potentially revealing.7 One might try to explain the fact that lay judges moved less than pro judges by focusing on the 2 lay panels that agreed immediately, but this only accounts for 6 of the 27 instances of 0 movement. Moreover, these 2 panels with 0 call difference equally affected the call difference average, and so cannot really explain the fact that lay judges moved less even though they started by disagreeing more.

Screenshot 2014-12-23 17.36.52

The chart shows that lay judges most frequently do not move at all from their initial call, and their tendency to move tapers fairly steadily as the divergence increases. In contrast, the pro judges were about equally likely to move 0, 1, 2 or 3 degrees to the final panel decision, but the likelihood that they would move more than 3 drops precipitously. It is unclear if this precipitous drop is just a statistical aberration based on our small sample size or if there is a real reason why pro judges are dramatically less likely to move beyond 3 degrees of divergence.

One possible explanation of why the lay judges had a smaller average movement despite starting further apart is that lay judges were more conciliatory and attempted to minimize the degree to which the panel members needed to move by being willing to compromise (i.e., split the difference). This is only one possible hypothesis and we make no judgment about whether a conciliatory attitude is beneficial to judging or not. It is possible that discussions among the pro judges revealed deeper insights into the debate that caused many people on the panel to reevaluate their initial calls. It is also possible that pro judges attempted to do this, but actually just ended up distracting themselves from their more accurate first impressions. Perhaps our future qualitative analysis of the deliberation recordings will shed some light on this.

Do distinct lay panels come to similar conclusions?

As we said above, at the start of this research we anticipated finding that pro judges were more internally consistent (less divergent) in their rankings, both as individuals and as panels. The analysis of call differences suggests that as individuals, pro judges are more consistent with each other than lay judges are. However, we have no direct evidence about the extent to which different pro panels would be consistent. Our data did provide us with some modest evidence about consistency between lay panels because we had enough volunteer lay judges during some rounds to put two lay panels in the same room. We were able to do this five times and the results seem worth reporting.

Screenshot 2014-12-23 17.36.57

There was a high degree of consistency between the two lay panels in these five rooms. In two rooms, they were in perfect agreement. In two others, they had the smallest degree of divergence, and in the final room, they diverged by 2 but were still largely in agreement. So, the average divergence between 2 lay panels was 0.8. In contrast, the average divergence between these panels and the pro panels that were in their respective rooms was 1.4. The sample size is too small to determine statistical significance, but it seemed to us that it was worth remarking on.

Judging bias towards debating positions

We looked at the data on how well the various team and speaker positions did according to the points that the two sets of judges awarded them. The clear trends in the data are:

  • Closing opposition teams were likely to do better with both pro and lay judges
  • Opening government teams were likely to do worse with both pro and lay judges
  • These biases were more pronounced with the lay judges, especially the preference for closing opposition teams.8

To provide a frame of reference, we compare our data from the 2014 HWS RR with data from the past seven years of the HWS RR, and also with the data from the 2014 WUDC in Chennai. We compared these by adding up all the points won by teams in each position during the preliminary rounds of these tournaments and then calculating the percentage of the total points that this represented.

Screenshot 2014-12-23 17.37.01
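
The share-of-points calculation described above is simple to reproduce. The following Python sketch uses made-up results from two hypothetical rooms, not tournament data, and the function name `position_shares` is illustrative.

```python
from collections import Counter


def position_shares(results):
    """Percentage of all team points won from each position.

    results: iterable of (position, team_points) pairs from preliminary rounds.
    """
    totals = Counter()
    for position, points in results:
        totals[position] += points
    grand_total = sum(totals.values())
    return {position: 100 * points / grand_total for position, points in totals.items()}


# Made-up results from two rooms, purely for illustration.
sample_results = [
    ("OG", 0), ("OO", 2), ("CG", 1), ("CO", 3),
    ("OG", 1), ("OO", 3), ("CG", 0), ("CO", 2),
]
shares = position_shares(sample_results)
for position, share in shares.items():
    print(f"{position}: {share:.1f}%")
```

With no positional bias, each of the four positions would be expected to take about 25% of the points, so deviations from 25% are what the comparison across tournaments is tracking.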

The results were both interesting and remarkably boring. They are boring because all of these sets of judges award points in basically the same zigzag pattern. But they are interesting partly because there is this consistency, and particularly because the lay judges not only replicated this pattern, but did so in an exaggerated manner. This strongly suggests that the bias in favor of opposition teams (and against the opening government team) is not a function of some set of habits or expectations developed within our debating community, but rather is an outgrowth of something about how the nature of those positions relates to an audience.

As a final note, we hope that this short publication will spark discussion about these issues and will also prompt people to suggest new ways for us to analyze the data that we have at our disposal.

Limitations & Directions for Future Research

There were several limitations on our research.

  • Obviously, with only 20 preliminary rounds, the data set we are working with is fairly small.
  • Because the HWS RR has such an unusually high caliber of debaters and judges, one might question the extent to which we can generalize to more typical debates.
  • Because the HWS RR uses team codes, the debaters’ schools were anonymous to the lay judges (who were also unaware of particular debaters’ reputations), but most of the pro judges were likely aware of who all (or almost all) of the debaters were.
  • A small amount of our data needed to be discarded because forms were incomplete or not filled out correctly (e.g., a judge would fill out the initial call sheet without giving each of the four teams a unique rank from 1-4).
  • There were no rooms with more than one pro judging panel, so we are unable to determine the consistency between pro panel decisions after deliberation.

As mentioned above, we plan to pursue further qualitative research based on the audio recordings made at the 2014 HWS RR. This will hopefully provide a significantly more textured and nuanced view of what was happening within the deliberations of panels with the two kinds of judges.

Conducting this research again at the HWS RR or at other tournaments could increase the sample size. Additionally, it would be fascinating to gather more data on the consistency of pro panels. One possibility would be to hold a (presumably small) tournament where each room had two pro panels. Teams would simply accumulate points from both panels. This would be very simple to do in a round robin format, but would also be possible in more traditional formats, though it would need to be hand tabulated (or software would need to be developed). Such a tournament could provide a wealth of useful data about how consistent judging panels are.

Appendix A: Instructions to Lay Judges

The handout below was given to all volunteer lay judges along with an explanation of each point on the handout. Volunteers had an opportunity to ask questions as well. All volunteers were screened to ensure that they had had no previous exposure to any form of competitive debating.

HWS Debate Research

Before you start:

– Please set aside everything you think you know about what competitive debate should be like.

– We are interested in your perspective as an intelligent and thoughtful listener.

– It is not easy, but please do all you can to set aside your own personal biases and beliefs.

– Try to forget whether you actually agree with one side or the other.

– Try to forget any particular pet theories that you tend to favor.

– Try to adopt what you take to be the bland beliefs of a typical, intelligent, educated person.

– If an ordinary, intelligent, educated person would accept or reject a claim, you should too, regardless of whether other debaters refute it.

– Ask yourself: Who would have persuaded me most if I really were an unbiased person?

This is a contest of who is best at rational persuasion, not a contest of who presents the most eloquent speech. Obviously, good speaking style helps one persuade an audience, but we are asking you to judge what would actually persuade a rational, intelligent and educated audience. This is a holistic judgment that is not exclusively about style or content. The question is “Who was most persuasive?” and we offer no formula for coming to that decision.

Things you must know:

– There are 4 teams competing in each debate.

– The 2 teams on the left are supporting the plan or proposition stated by the first speaker.

– The 2 teams on the right are opposing this plan or proposition.

– But, judges do not declare either “side” of the debate (i.e., either “bench”) the winning side.

– Rank them “Best”, “Second”, “Third” and “Fourth” based on how persuasive they were.

– Which team, considered as a whole, was most likely to ACTUALLY persuade an unbiased, intelligent and well-educated audience?

Before your panel begins its discussion, please take just one or two minutes to write down the ranking of the teams that you (on your own) think is most appropriate. But, after this, please be willing to revise this ranking if the discussion actually makes you see things differently.

– Judging a debate is a COOPERATIVE exercise. DO NOT VIEW THIS AS A COMPETITION to convince the others that your initial impression is correct. The goal is to work together to find the best answer to the question of which team was more persuasive of an intelligent, educated and unbiased audience.

– After coming to a decision on the team rankings, we ask that your panel assign points to each individual debater on a scale of 50 (poor) – 100 (excellent). These points should reflect the speaker’s overall contribution to persuading an intelligent, educated and unbiased audience that their side is correct.

– So, this includes quality of argumentation and quality of style.

– The average points at this tournament are typically about 79.

Things you should know:

– One person in each panel of 3 judges has been assigned to be the “chair”, which means only that they keep an eye on the time and try to ensure that the deliberation moves along so that your panel is ready to render a decision at the end of 15 minutes about how all 4 teams ranked.

– The 2 teams on the same side need to (largely) agree with each other.

– Disagreeing with a team on the same side is called “knifing”.

– This is to be considered a negative exactly to the degree that it undermines the overall persuasiveness of their side’s position. (So, a small disagreement about an unimportant element can be mostly ignored.)

– The debate is about the main proposition articulated by the first speaker, which may be somewhat more specific than the general ‘motion’ (i.e., topic) announced before the debate. Focus on the proposition, not the motion.

– You are permitted to take notes, but you are not required to do so.

Things you might want to know:

This is a guide to some unfamiliar terminology that might be used in the debate. Below are the names of the various teams (in the outside columns) and names of the individual speaking positions (in the inside columns):

Screenshot 2014-12-23 17.37.08

– During the middle 5 minutes of a speaker’s 7 minute speech, debaters on the other side can stand up for a point of information (POI). The speaker can either accept or turn down these POIs, but typically they are expected to accept 2 during each speech. The perception is that failing to do this demonstrates a lack of confidence.

  1. There were some technical difficulties that prevented recording in some of the debates.
  2. In other words, for any two ordinal rankings of four teams in a room (e.g., CG/OG/CO/OO and OG/CG/OO/CO), we asked the following six questions: Did they agree on whether OG placed above OO?; Did they agree on whether OG placed above CG?; Did they agree on whether OG placed above CO?; Did they agree on whether OO placed above CG?; Did they agree on whether OO placed above CO?; Did they agree on whether CG placed above CO? Using the example just given, the answers would be: yes, no, yes, yes, no, yes.

    Answers of “yes” were represented with a 0, while answers of “no” were represented by a 1. So, in the same example, the answers were represented as (0,1,0,0,1,0). The sum of these represents the divergence between two rankings. So, in this example, these rankings diverge by 2 degrees out of a possible 6.

  3. Only even numbers are possible on this scale, but we chose not to simplify it to a 6-point scale in order to make it more obvious when we were talking about comparing panel rankings, as opposed to bilateral comparisons between rankings.
  4. In each round, the difference between the team points given to a particular team by the pro judges and the lay judges will be somewhere between +3 and -3. The “accumulated variance” for a team is the sum of the absolute values of all individual differences in the five prelim rounds. The “final difference” is the sum of these values (not the absolute values). So, for example, imagine that a team got the same points from the pro and lay judges in the first three rounds, then in round four got ranked 1 point higher by the lay judges than by the pro judges, and then in round five got ranked 2 points lower by the lay judges than by the pros. That team would have a final difference of 1 (in the pro judges’ favor), but an accumulated difference of 3.
  5. Comparing the lay and pro panel divergence to the divergence between individual pro judge rankings and their pro panel rankings would not be useful, because those are not causally independent rankings. Below, we do discuss the distinct issue of how much judge rankings move from their initial call to the final panel ranking.
  6. The t-statistic for the test on this data turned out to be 0.6124. Given this statistic, we fail to reject the null hypothesis that the average differences in initial rankings for lay judges and debate judges are equal.
  7. The t-value = 3.33, significant at a 99% confidence level (i.e., a significance level of 0.01), found by t-test for sample means controlling for unequal variances. Given this statistic, we do reject the null hypothesis that the average movements for lay judges and debate judges are equal.
  8. We are not using the word “bias” in a pejorative sense. We mean it merely in the statistical sense.

Building the Narrative

Andrew Gaulke

With excerpts from an interview with Tim Sonnreich

We all like to think that we’re immune from being manipulated. We all think we’re too smart for advertisers, or we are too smart for people who play reverse psychology on us, or whatever it might be. Actually those things exist for a reason, and we are, to a greater or lesser extent, influenced by them.

Debaters and adjudicators like to think that the formal logic of the argumentation is what will win debates for them. However, there remains an element of persuasiveness that is just as powerful, yet much harder to grasp. Every debater knows the experience of adjudicators who seem to totally misrepresent the arguments they made. Every adjudicator knows the experience of finding one team more persuasive without quite knowing what argument made them win. This article explores one element of that additional realm of persuasiveness, the construction of a narrative out of argumentation.

Much of what this article explores is intuitive, and much of its advice is already in practice in high level debates. My aim here is to provide some theoretical ideas as to why we debate the way we do, as well as providing some clarity to help new debaters understand case construction and persuasiveness on that level.

I want to begin by nailing down exactly what it is I’m talking about in this article. A speech in a debate can be understood two ways. The first is a philosophical understanding of how the speech functions. In this type of analysis you can extract a set of logical ideas from the speech, and those ideas tell you what the speech was trying to say. You can change the order of the parts, you can change the specific wording in a lot of places, but the logic will remain the same. A philosophical understanding of the speech that focuses on formal logic wouldn’t change its assessment of how that speech worked based on structural or cosmetic changes that do not substantially change the logic that is presented.

That is not the complete story of how human beings respond to a speech, however. I intend to introduce a second, literary understanding of how a speech operates. The structure of our argumentation is important, and should be analyzable. The words and rhetoric we use are important, and should be analyzable. A literary understanding of how a speech operates incorporates those intangible elements of persuasion. It understands the total effect of a speech on those who hear it instead of breaking the speech down into its individual logical components. It understands how each element contributes to the whole.

It takes a lot more than just good arguments. You can see that with a lot of ESL teams, where they basically make the same arguments that everybody else makes, but sometimes it’s a language issue, or sometimes it’s just about the way they package and present their arguments, that let them down. People take a lot of subtle cues from the way people present, in a manner sense, that hold them down. Or just the way they position themselves in a kind of rhetorical sense that can leave them out. It’s not so much that they don’t have the words for it, but it’s that they miss the chance to build momentum and persuasion in what they’re saying and just kind of jump to ploughing through the arguments.

This is why the idea of a narrative is so important. A narrative, at its simplest definition, is the way we understand how two pieces of information relate to each other. When we read a novel we understand that the events of the story are connected to each other, not only on a causal level, but also thematically and conceptually. The events can build up into a broader understanding of what a particular novel is trying to communicate.

The same thing is happening when we construct a speech in debating. When an adjudicator hears a speech they are trying to construct a coherent meaning out of it. They want to know what the speech is about. They are trying to understand what core idea is at the heart of a particular speech because that is how we unconsciously code information. Understanding this process provides a powerful way for debaters to craft their speech for the most persuasive effect.

One of the best articulations of this comes from Tim Sonnreich’s “First Principles” approach to debating. This is the idea that the most persuasive cases are constructed based on particular political ideologies that can form a core principle in the case. In other words, the persuasiveness of a case comes when the individual arguments add up to something more. They are connected by the ideology behind them, even when that ideology is unspoken.

For example, a common set of arguments in debating relates to how far the government should be allowed to control the choices of its citizens. When debaters argue that a particular policy unjustifiably infringes on people’s lives, they do not deploy that one argument in isolation against one policy idea in isolation. They are instead trying to construct a particular understanding about how the world works and show that, based on that understanding, we ought to set broad criteria on government intervention.

That broad understanding of the world is the narrative that unifies the meaning of the separate pieces of logic. A typical case against government intervention might run as follows: first, there is an inherent value in individual freedom; second, governments are bad at making decisions for individuals; third, cultural and economic problems are solved best in unregulated environments. All of that material builds into a picture of what the world looks like. It is connected by that image of the world (in this example, a libertarian image of the world) even when those connections are not explicitly drawn by the debaters. Often those connections are not explicitly drawn, as the three themes of this broad superstructure are usually labeled based on the individual character of the topic at hand.

And when I talk about the narrative, it’s about being able to give a compelling story about what the world looks like in your mind that’s different than the world we live in today. And whether that’s a small change or a big change, and what the reason is for that change, and how that fits with the way we currently think now, and all of that helps. Because ultimately in a debate – you know we talk about whether we proved such and such a thing in the debate. You never actually prove anything, or very very rarely do you prove anything in a debate. What you do is provide enough justification for the proposition you’re making that an average reasonable person has a willingness to believe that it’s probably right, without having gone to do any more thinking about it.

This idea is important because all the logical argumentation of a speech is understood based on how it relates to the core idea that the adjudicator understands the speech to be about.

One of the simplest ways to understand this is in how an opening government team constructs the problem they are trying to solve. When a PM’s introduction focuses on a particular problem, an expectation is created in the mind of the adjudicator that the material that follows is intended to address that problem. When debaters flag their main themes it similarly creates expectations as to how the case being presented will understand the world. This structural signaling is something all debaters intuitively know to be important, and the reason it has grown to be so common in debating is that it provides the broad idea of the case so that adjudicators can link each individual element of logic back to that understanding.

And the reality is, particularly in a high level debate, you actually prove a lot less than you think you do, because you end up covering a lot of ground. There are a lot of arguments, and a lot of complexity to them, and just being able to explain your opponent’s argument before then going on to deconstruct it takes a lot of time. So what you’re really doing with the narrative is, you’re giving the adjudicator the broad brushstrokes of what the final goal looks like because that helps them join the dots for you. They can see where the starting point is because you described that at the start, and they can see the end point, and then your arguments helped them to see that there’s a path between them and if you do enough to show where those paths are then you win.

What often separates top level debaters is the clarity with which they communicate this hidden core idea of their case. They will use their introductions and conclusions carefully in order to make their position clear.

Understanding that this core narrative exists behind the logical argumentation is especially helpful in organizing rebuttal to a case, and in particular the strategic selection of rebuttal. Effective rebuttal is not only trying to attack a particular logical point, but also trying to break up the coherence of the world view the opposition is attempting to create, while at the same time reinforcing the debater’s own broader narrative of the world. It is impossible to effectively rebut an argument without understanding the role that argument plays in the debate. When, at a high level, arguments are well constructed and will be persuasive to a certain extent no matter how they are rebutted, it is important to see the way that the argument connects to the broader persuasive narrative the team is building. That is how you can know the extent to which you need to attack the argument in order to overcome it.
You don’t really have to worry about what the logical fallacy is in the argument because again well-constructed arguments don’t have logical fallacies, but no matter how well constructed the argument is, there are philosophical alternatives to that argument because it is a debate.
This is why First Principles analysis can be particularly persuasive. When a case is constructed based on a First Principles position that case will have an internal consistency and clarity that allows for a coherent narrative. Additionally, the case will be based on often familiar narratives that the adjudicator already knows and understands. An adjudicator is likely to already understand the idea of a libertarian world view, and as a result that idea will be particularly clear to them in its construction. The type of world, and how that world functions, will be that much easier to imagine.

When a team tries to describe a libertarian world they are able to do it more efficiently because the thread of meaning that connects the individual pieces of information is easier to imagine. All cases have gaps as a simple necessity of limited speaking time. Those gaps are filled in the adjudicator’s mind based on the narrative the speech constructs about the world: the meaning that connects the information. An argument is harder to attack when the adjudicator is doing the work for you by filling in those gaps themselves. A strong, coherent narrative encourages them to do that.

Debaters need to understand that their use of language, introductions, conclusions, and the choice of focus in their speech contributes to how the adjudicator will understand all of their material. When done well, a strong internal narrative can be an excellent tool for debaters. When done poorly, it can be exploited by opposing teams whose narrative of the world is stronger.

So when I talk about narrative in a first principles sense what I’m really saying is, you can give people the image that you are trying to create, even before you get to describing the arguments for why that’s a good image, so that you can then rely on and lean on that throughout the debate, to join the dots, make more cohesive what is really a bunch of singular arguments which you have chosen because given the amount of time and space you have, they’re the best choices you’ve got to explain them, whereas in a different format you might choose a totally different way of explaining something.

When both sides of the debate are able to effectively communicate their narrative about how the world works then the clash in the debate becomes, at least in part, about the strategic choice of narrative from each side.

Both teams are trying to position themselves strategically in relation to the overall clash of the debate. I have so far talked about building a narrative within individual speeches, but it is important to remember that the debate itself will also have an overarching narrative. When an individual makes a speech they are creating the narrative of their own case, while at the same time contributing to the narrative of the debate as a whole.

For example, an individual speech may have the core narrative “individual liberty maximizes happiness and utility”, and their opposition responds with “this problem is too complex to be solved individually”. What that contributes to is a story of the debate that revolves around a small government versus big government clash. That is the central issue the adjudicator wants resolved because it seems like the most important issue in the debate. Even if a logical analysis of the debate shows that issue to be only one equal part of the debate, the sense of its importance places it in the forefront of the mind of the adjudicator. That adjudicator is likely to preference the resolution of that issue over the resolution of other issues in the debate.
I think certainly the advantage of being the government team is you set the structure for the debate in a logical architecture kind of sense. The most common thing for every team to do is for their rebuttal points to just graft on to the method structure of the government team. Their first point was role of government so our first point of rebuttal is. Then their second point was how this affects women so that will be our second point of rebuttal and whatever. So you do get a huge advantage because most teams are too lazy or too time poor to change the frame, and so they’ve all just mapped themselves onto you where that seems like a reasonable option.
There are a number of important implications of this. Firstly, the relevance of a team to the debate is directly tied to how they are perceived in relation to that core narrative of the debate. This is because when we try to remember and understand information we are trying to give that information a coherent meaning in relation to other pieces of information we have received. When an adjudicator has already established the relationship between other pieces of information in the debate, any new piece of information is automatically judged in relation to that older narrative. For an opening half team this can help them dominate the closing half by making sure that the narrative of the debate focuses on their own material. If the core thread of the debate is shown to be about their arguments, then closing half teams seem less relevant to the way the debate progresses. By creating a powerful and coherent narrative of the debate, the opening half has engineered an expectation from the adjudicator that that narrative will be addressed and resolved in each new speech, making it difficult for closing half teams to move away from that material.
Secondly, closing half teams need to be able to understand that that expectation is on them. If they do not feel they are able to extend adequately within that narrative of the debate they will need to be able to move the debate onto a different narrative. Introductions and points of information are particularly useful in doing this, as they serve as structural signaling points. If a team wishes to move the debate away from a clash on big government versus small government and move into a debate about, for example, consent, they need to understand how that will be understood in relation to the first clash in the debate. They need to spend more time justifying those new arguments in the debate than is strictly logical. In their rebuttal they need to show why the arguments from the opening half should be judged in relation to the issue of consent, rather than simply attacking their truth.

In narratology this is the difference between primary and supplementary events. A primary event is something that is considered core to the narrative, whereas a supplementary event is something that is tangential to the narrative. A supplementary event can have a relevance to the narrative, but it is not one of the key ideas the narrative turns on. The way we remember the events of a narrative, and assign importance to them, deprioritizes supplementary events in favor of primary events. Closing half teams in particular need to find ways of subtly placing their opening’s material into the position of supplementary events. People naturally do not like unresolved stories, and portraying the opening half clash as irresolvable is often a good strategy for moving the debate into another narrative that the adjudicator will inherently favor. Another tactic can be to play up a deficiency in the opening, making the adjudicator’s instinct for narrative resolution desire the filling of that gap. The point of this is that closing teams need to start manipulating the expectations of their adjudicator into wanting the particular contribution to the narrative of the debate that the closing team wants.

The third implication of a narratological reading of debates is that when the narrative coming from one team does not fit well with the narrative of the debate as a whole then that weakness should be exploited. If the core issue of the debate is clearly still about big versus small government while one team is talking about consent, their opposition can paint them as being irrelevant to the debate. They can use their language and how they focus their rebuttal to try and portray that team as being entirely focused on the issue of consent, even if they did address other issues in the debate. Similarly, if a team can successfully move the debate onto consent, they can reinforce the impression that their opening had a narrative entirely about small government. By doing that, even if opening had material on consent, it will feel like a supplementary event from that team, and so the importance of that material is attached to closing.

Finally, when a speaker is constructing the narrative of their own case they should be doing so knowing that their opposition will be attempting to control the narrative of the debate, and should attempt to predict what sort of clash is likely to become central to the debate. When the debate is intuitively about a big versus small government clash a team will find it easier to make that material central to the debate than arguments about consent. More pertinently, when constructing a nuanced narrative of their case they should do so in a way that engages with what is likely to be the core clash of the debate.

An un-nuanced case based on small government might be “Freedom is good”, but that only barely glances at the clash of the debate. An opposition is likely to be able to overcome that with a case constructed around the idea “freedom is good, but less important than social harm”. That more nuanced narrative doesn’t simply state a position, it also places that position in relation to the narrative of the debate. If the adjudicator leaves the room thinking that those are the ideas that sit at the heart of the two teams it is immediately obvious which one they would preference. Even before they analyze the logic of the arguments they are put in a position where they would need to be persuaded out of giving the win to the opposition. A better narrative for the small government side might be “This particular freedom is so important it can never be compromised on”, or “freedom creates the conditions for solving social harms, not government intervention”. These cases engage with the clash of the debate from the very first speech, and don’t give the opposition the advantage.

The process of understanding this in prep time is often very difficult, and is likely to be unique for each speaker. One way that can be effective is through simply acknowledging that this is a tool that is available to use for persuasion, and pushing yourself in prep time to have a larger coherence to the case that can be a strong narrative. Simply aiming for that conceptual clarity of narrative when writing and delivering arguments can give debaters the instincts to push for that narrative in their speech. A second technique is to imagine how you want the debate to end, instead of simply how you want it to begin. In three on three styles of debating, it can be helpful to imagine how you think the third speaker is going to approach the debate when they have to incorporate both sides of the case into a holistic speech. Doing this in prep time allows you to create material that deals with particular arguments in a strategic manner, instead of simply as units of logic.
Effective case construction is about understanding the status quo as a world view and understanding your vision of the world you want to live in, and then creating arguments that connect those two and give people a reasonable belief that we can and should transition in that direction, and I don’t think logic alone gets you there.
The key difference between the simplistic and the nuanced versions of those narratives is that one incorporates an actual strategy to win the debate. It requires the team to understand what is at the heart of the debate and what is at the heart of their opposition’s position. From there the team can think of a way to tailor their case so that they have an advantage when choosing and explaining their individual arguments.

Being able to position a case strategically requires understanding that a case builds into something greater than its logical components. Finding and exploiting that level of the debate is a crucial step in using prep time and speaking time most effectively for winning debates.


Narratological references

Abbott, H. P. (2008). Cambridge Introduction to Narrative. Cambridge: Cambridge University Press.
Prince, G. D. (2004). Revisiting Narrativity. In M. Bal, Narrative Theory. Routledge.
White, H. (1989). The Content of the Form. Johns Hopkins University Press.
Gaulke, A. D. (2014). The Use of Narrative in Intervarsity Debate.

Debating references

Sonnreich, T. (n.d.). First Principles. Retrieved from MAD Youtube:
Sonnreich, T. (n.d.). First Principles of International Development. Retrieved from MAD Youtube:
Sonnreich, T. (2014, October 13). (A. Gaulke, Interviewer)

The ‘Fairness Principle’ in Debating

Gemma Buckley and Josh Taylor

Debating is an inherently arbitrary, and therefore unfair, activity. However, much of the arbitrariness and unfairness is a result of how we currently choose to implement the rules and procedures around the activity. This article will propose a radical shift in the way we go about adjudicating debates, in hopes of achieving fairer outcomes.

Part One: Fairness matters

It might seem redundant to begin this article by establishing that fairness is of the utmost importance in awarding the result of a debate; hopefully all readers agree intuitively that the goal of debating should be to find a fair result. We can probably all acknowledge that individuals spend countless hours, huge sums of money and much emotional turmoil to hone their debating skills in pursuit of success. This reason alone is enough to suggest that we ought to ensure the right, or the most deserving, teams are rewarded.

However, it is also in the interests of the debate community as a whole to ensure fairness in the adjudication of debates. Debating as a sport gains its uniqueness and credibility by rising above many of the worst parts of political discourse – a platform which is most definitely unfair. Half the reason some of the brightest university students from around the globe dedicate their lives, often at the cost of other pursuits, to competitive debating is that it rewards merit and logic and creates constructive discussion. As such, there is no doubt that a frustration debaters face – something which no doubt drives people away – is the feeling that the results of debates are unfair. The idea that considerations beyond the skill and talent of a team will ultimately play a part in the result of a debate is enough to discourage effort and reduce the overall quality of debating. Why would any reasonable person expend resources on trying to make themselves the best debater they can be, when there is a good chance that their individual merit will not be a decisive factor in the result of any given round? The answer is of course that they would not, and we believe this is something regrettable for all who share a passion for debate.

As such, we would argue that establishing a level of fairness ought to be the number one priority for the debating community. For the purposes of this article we will advocate a view that fairness is achieved when the merit of each team is the sole consideration of the judge in coming to a decision, meaning that factors outside of the control of the team ought not be determinative.


Part Two: The way we assess debates right now is unfair

There are obviously countless ways in which the assessment of debates right now is unfair. Many former articles in this publication and others have dealt extensively with some of these issues, including gender, nationality and language biases. As such, this article will take issue with a very specific manifestation of unfairness, which can be described as unfairness in the substratum, or parameters, of debates. A judge ought to be able to believe that all four teams have an equal opportunity to get every result. However, this basis often does not hold. This article focuses on situations where teams find themselves in substantially unequal positions due to a range of factors, which in application gives teams unequal opportunities to succeed.

Firstly, unfairness occurs when the very foundation of the debate is unequal. For example, when the topic itself sets up a much more difficult task for one team or side than the other, teams have unequal prospects of success. Secondly, even if the motion has the potential for fairness, the actions teams take in setting up the parameters of the debate may create unfairness within it. For example, when a team squirrels or challenges a definition, teams within the debate may find themselves, through no fault of their own, in untenable positions. Thirdly, when the intuitions and biases of the judge establish an unequal playing field, for example when a judge is simply more receptive to arguments from a particular philosophical framework, teams may find themselves fighting from unequal ground. In this section we will deal with each of these situations in turn and make an argument that the status quo often prevents the most meritorious team from achieving success.

When debates are unevenly weighted.

It is (hopefully) a rare situation, but there are times when the very ground on which the debate is built is uneven by nature of the motion. Some ideas are simply good or bad. No matter what you believe about the capacity of debate to create logical arguments for anything, you must acknowledge that there are some cases where that is not true, or at the very least it is differentially difficult on one side or team. It is certainly very uncommon for something genuinely indefensible to be set as a topic (although not unheard of), but as with anything, there is a spectrum of unfairness. Topics that are counterintuitive or incredibly complicated may force a certain team or bench to do more work than another in order to convince an average judge. We would argue that even that is unfair.

The WUDC in Chennai provides a clear example of this. Six of the top nine ranked teams on the tab (teams ranked 2nd, 3rd, 6th, 7th, 8th and 9th to be specific) were all knocked out in the Octo-final round by lower seeded teams while debating from the government side of the controversial motion “THBT Japan should shame its soldiers who participated in WWII, including those who did not commit war crimes themselves”. To be clear, it is not our argument that there is nothing to be said on the government side of the debate, nor is it our belief that higher seeded teams are always the most meritorious in a round. However, we would suggest that the overwhelming preference for opposition teams getting through, despite their seed, indicates that the topic was far more difficult for those in government. Some teams did manage to get through from the government side. However, that does not invalidate the claim that the task of the opposition was easier in this round. We believe that an unbalanced topic is one of the external factors which prevents merit from being rewarded.

When teams muddy the parameters of the debate.

Even when the topic is balanced and has sufficient depth to allow all teams an equal shot at winning, the actions of teams within the round can create a more difficult task for some. We are all familiar with the concept of squirreling, and it is true that we discourage it under the status quo. However, we also ask opposition teams to default to accepting the set up of the Opening Government team, so long as it is “debatable”. As it is now, Closing Government can only use a new definition if Opening Opposition challenges. In the instance where no challenge is offered by the Opening Opposition, the Closing Government is left with two choices – either defend the silly thing that their opening set up (something which the adjudication core did not intend) or be penalised for knifing. Through no fault of their own, this team is now in a more difficult position than other teams in the round, and has limited capacity to equalise the score. Similarly in this case, presuming the Closing Government choose to knife, the job of the Closing Opposition arguably becomes harder, with them having to oppose two entirely different cases.

Again to offer an (albeit extreme) illustrative example from the authors’ own experience – a few years back in an out round at the Red Sea Open, the Opening Government misinterpreted or squirreled a motion and ended up running the opposition line rather than that prescribed for the government. The Opening Opposition, recognising that there was still ‘a debate’ to be had, chose not to formally challenge the definition and instead offered some casual mockery. The Closing Government chose to run the topic as it was, therefore knifing the OG completely and also running the same case as the Opening Opposition. Finally the Closing Opposition were left to argue against two entirely contradictory cases, whilst also being incapable of defending their own for fear of agreeing with half of the Government bench. Obviously this kind of debate does not take place every day, but it does illustrate that sometimes the actions of a certain team create unfathomably difficult burdens for others. Much less egregious versions of this happen every day, when teams choose to block out other teams or simply make strategic errors that affect the capacity of the other team on their bench to win. We contend that the actions of teams can create unfairness within the debate.

When the judge has prejudice.

Finally, we would suggest that the intuition and bias of the judge can artificially make one team’s job more difficult. Some judges have a preference for certain philosophical frameworks or simply believe particular arguments to be true. For example, if a judge vehemently believes in a utilitarian framework, there is no doubt that any side of the debate that is mounting a rights based argument will have a more difficult time convincing that judge. This can often occur in debates that others may consider fair. An Adj. Core may value arguments on a side of a motion, but by virtue of landing a judge who rejects the same arguments, teams find themselves in substantially unfair positions. Regardless of which of these circumstances unfolds, it is clear that there is a pervasive unfairness within competitive debate.


Part Three: There are unacceptable deficiencies in our current approaches to solving unfairness

Obviously we are not the first people to notice or identify these problems, nor will we be the last. Many potential solutions have been discussed and implemented. However, it is our belief that they have been hitherto ineffective at meaningfully resolving the issues of unfairness explained above.

With regards to unbalanced topics, there have been moves to create more rigorous assessments of the success of topics, with tournaments offering motion balance analysis as common practice. However, of course there are limitations to this approach. In the first place, it does nothing to resolve the unfairness that teams at that tournament faced, although it may prevent unbalanced topics from being set again in the future. Secondly, it is unlikely to even do that, given there is no systematic approach to ensuring balance employed by most adjudication cores. The current approach seems to be to presume that adjudication cores know best and assert that all topics that are set can be won by any team. The fact is that this is wrong. This is in no way intended to be accusatory – the authors of this article are guilty of setting unbalanced motions too. However, it is important to recognise that setting unbalanced motions is always a possibility, given the way in which topics are set. Adjudication cores are predominantly made up of excellent and accomplished debaters, meaning their interpretation of what can be reasonably expected of teams is skewed. They also spend a long time talking about and thinking about their topics, leading to an echo chamber effect and also probably the belief that good cases are possible on either side – it is easy to forget that what is possible in days and weeks is near to impossible in 15 minutes. Forcing teams to suck it up and just try their best to overcome the inbuilt bias in a topic is unacceptable and frustrating.

To the credit of adjudication cores and the debating community, there have been greater attempts to resolve the issues of unfairness within debates, for example with moves towards mandating engagement through the POI rule. However, there is still a significant problem in the way we deal with squirrels and knifing. We ask the opposition team to simply run with whatever they are dealt, and penalise a closing team who choose not to defend something preposterous set up by their opening. Why should we choose to punish these teams and make their jobs more difficult through no fault of their own? We believe unfairness pervades these arenas of adjudication.

Finally, right now we attempt to deal with judge bias by asserting that judges ought not have bias. Of course we offer training, judge tests and feedback systems to try to weed out those with bias and demote them, but that does not really deal with the core problem. Bias will always exist, even amongst the best and most enlightened judges, and no methods that we have right now are sufficient to overcome that fact. Some of the current strategies can obviously be expanded upon: stricter selection of adjudication cores, more rigorous assessment of topic balance, mandatory POIs and clearer rules about squirreling etc. may all help to limit the degree of unfairness within debate, but they will not eliminate it.


Part Four: The application of the ‘degree of difficulty’ metric will help to correct these problems

When all else fails, and by virtue of either the topic, actions of the teams or the inbuilt biases of the judge, the capacity of one team or bench to win on merit is reduced, the question remains: what do we do? This article proposes a controversial solution: that judges explicitly account for the degree of difficulty faced by each team in coming to a result. While this intuitively seems quite extreme, it is actually an extension of an already accepted principle, both in the adjudication of debates and in other sports. In the debating status quo, we tell judges to ‘be lenient’ to Opening Oppositions who are faced with squirrels, given their relative disadvantage of having to make up a case on the spot. We also ask judges to weigh the contributions of the closing teams, accounting for their relative advantages of additional preparation time and seeing the issues of the debate play out in front of them. However, we are unwilling to codify the right of a judge to explicitly take into account, and reference in their judgements, the degree of difficulty faced by each team. To the extent we do, we limit it to very specific circumstances. We believe the same principle that underpins these already accepted norms ought to be formalised and extended to all situations – essentially that we reimagine one of the core roles of the judge.

What is the ‘degree of difficulty’ principle, and how would it be used?

At this point you may be asking: what does any of this actually mean for a judge in a round like one we have described? No longer is the winner just the team that most convinced you; it is the team who convinced you most given their likelihood of convincing you at all. This doctrine is a fundamental rethinking of the scope of the judge’s role, and would require judges to ask themselves a new set of questions when coming to their decision. We would require judges, as a last step in coming to a decision, to adjust results based on the ‘degree of difficulty’. Judges would need to ask themselves: do I believe, given the way the debate unfolded, that this team had an equal opportunity for success? If the answer is no, we would require judges to adjust their result to account for this. This assessment will need to account for the topic itself and any perceived balance or depth issues; things that took place within a round that artificially changed the difficulty of one team’s task; and the judge’s own preconceived bias.

With regards to accounting for the topic itself and the way the debate plays out, judges would be required to think about any issues regarding the topic and reward teams who made the best arguments they could, even if those things were ultimately defeated within the debate. This will serve to address the situations outlined above.

Assessing judge bias may seem impossible to do, and to some extent that is true. However, we believe that encouraging a judge to think about their own opinions on the issue and be cognizant of those when coming to a decision is the best approach. If a judge is encouraged to acknowledge (rather than hide) their bias, then they can attempt to correct for it. For example, a judge who is very predisposed to believe that liberty is good and government intervention is bad may penalise a team that argued things they agreed with in a basic, superficial way as compared to a team that argued things they disagree with in a more logical way – this would be independent, to some extent, of who ultimately convinced them at the end of the day. Such a judge might still believe that a world with small government is good, but reward a team on government for making the best of (what they perceive to be) a bad situation. This is more likely to ensure that the most meritorious teams succeed.

Principally justifying the ‘degree of difficulty’ standard.

The underlying principle of this policy is to account for substantial inequality that occurs in a debate – even though all four teams, on paper, have the same chances of success. In defending this principle, we would draw an analogy to the underlying principle of Affirmative Action policies or redistributive taxation – the idea that merit can only truly be understood and rewarded when all begin from the same point. Those who support AA policies would make the argument that right now ‘merit’ based entrance requirements, for example to universities, do not really reflect the true merit of the individuals, given that one group began from a disadvantaged position. It is our opinion that the way the role of the adjudicator is currently constructed limits their capacity to account for the relative difficulty in the starting positions of each team within a debate, and therefore prevents fair results from being achieved.

Difficulty is a fundamental consideration in fairness. Consider a second analogy – a diving contest. Each dive that a contestant undertakes is given a difficulty rating, and the ultimate score that is awarded accounts for the degree of success at that dive, meaning that someone who poorly executes a very difficult dive may get a similar score to someone who executes a very easy dive quite well. The only real difference between this situation and debating is that divers get to choose whether to attempt a more difficult manoeuvre, whereas the relative difficulty a team faces in a debate is out of their hands. If anything, that suggests the introduction of a degree of difficulty criterion is even more important in debate than anywhere else in order to maintain the integrity of the competition.


Part Five: There are legitimate criticisms of this approach, but on balance, it will ensure greater fairness in results

The most fundamental criticism of this way of thinking is the idea that it jeopardises the key aim of debate, which we all know is the elusive ‘persuasion’. If the point of the debate is to convince the judge that they should agree with your side and want to do or not do the thing you say, surely failure to do that means you have lost. To some extent that is true. However, we would suggest that winning a race when you had a 50-metre head start is not truly winning at all. More than that, we would of course suggest caution with the use of this principle – the degree of difficulty should only be one of many considerations a panel takes into account, and of course they should weigh it against who they ultimately thought was the most persuasive in the round. However, the alternative is that they are precluded from accounting for it at all. For us this is a far greater concern.

Another legitimate grievance with this approach is that it is highly subjective and essentially means that the beliefs of the individual judge are determinative in the result of the debate. We acknowledge that, but would suggest a couple of important caveats. Firstly, British Parliamentary has the benefit of consensus adjudication. What that means is that issues of fairness and assessments of the degree of difficulty would be discussed and fought over by the panel, in the same way as all other aspects of the debate. If one person on the panel thought a topic was heavily weighted towards the Government, one thought it leaned towards the Opposition and the other thought it was balanced, it would be unlikely that any degree of difficulty consideration would come into play. However, in the circumstance where the entire panel agrees that a topic is near impossible from one side, should they not have the right to account for that in their decision?

Secondly, all adjudication is subjective and that is a reality we ought to embrace rather than ignore. The particular opinions of the judge about the arguments, style and rules are decisive in results right now. What this approach does is to use that subjectivity in a more constructive way, by asking that judges embrace their preconceived bias and make an attempt to correct for it. We would argue that the consistency is found, counterintuitively, in the realisation that there is no consistency.

For those who fear overcorrection (for example, that it will reverse the bias and make it basically impossible to win from the intuitive side of a motion), we would suggest again that it ought only to be one factor judges consider. We would also question whether a judge is likely to go so far as to say that they cannot be convinced by the side they are naturally more sympathetic to. But again, we must compare with the alternative, which is a world where we make no effort to correct for the difficulty some teams face in winning from a particular side or position in front of a certain judge.

Finally, people may sensibly argue that all of these considerations (regarding motion balance, the application of the knifing rule, etc.) should remain in the hands of the adjudication core. After all, these people are appointed due to their achievements, experience and respect, and we ought to trust them, more than any random judge, to fairly resolve these issues. Of course we will not question that adjudication cores generally do a fantastic job, and are certainly made up of incredibly qualified people. However, we would point out that there are some things that are structurally problematic about the way adjudication cores function, as discussed earlier. Beyond that, we would say that an adjudication core can never really account for the individual circumstances of any debate, nor the particular biases of any given judge. As such, we believe that the greatest amount of flexibility ought to be given to judges to deliver a fair decision in the context of the round they were entrusted to adjudicate.

Part Six/Conclusion: Setting an agenda for adjudication

We accept that many of the suggestions in this article are both radical and controversial. Rather than discounting them out of hand, we hope that readers will consider some of the problems we have outlined – problems we believe are widely recognised – and whether the ‘degree of difficulty’ principle is an acceptable and effective solution to them. We contend that current strategies, at best, limit the regularity of unfair outcomes, rather than solving them in their entirety. By no means do we argue that this paradigm shift is without fault or risk, but we believe it to be a necessary step in ensuring the integrity of competitive debate. Considering the starting position of teams within a debate, and their capacity to achieve a successful result, is something we ought to allow judges to do in pursuit of a more enjoyable and rewarding experience for everyone.