Results Analysis

At the end of an event, I can’t help but wonder, “what just happened?”  Lots of games were played, lots of scores were tabulated, the final results by player were posted, and some awards were given out.  But as the Tournament Organizer I’d like to know what happened in a broader sense than just player by player.  Were the scenarios fair, or was one round lop-sided?  Were the bonus objectives too hard or too easy?  Would the results have been significantly different if I’d done the scoring differently?  There are a lot of variables that make it hard to conclusively answer ‘what if’ questions.  This is also a pretty small sample to draw conclusions from, only 100 games.  But you have to start somewhere.  So this should be considered more like food for thought, an encouragement for other TOs to perform and post a similar analysis, or maybe just my own notes/thinking out loud.

 

Bonus Objectives

I’ll start with something kinda unique to this event, secret bonus objectives.  At the beginning of the game, each player selects a 2 point bonus objective from a list, keeps it secret from their opponent, and can only select each one once.  The goal is to add some asymmetry to the games, provide an additional strategic element, and give players something to keep trying to accomplish even when the game is lost.  So the immediate question to answer is, “how often do players accomplish this objective and lose the game?”

First, the basic numbers.  Players accomplished their bonus objective in 46% of cases.  That’s almost half, which feels about right; you don’t want them too easy or too hard.  But you don’t want that 46% to always be the player who won the game, since then it’s no different than not having the bonus and just making the game scenario worth two more points.  The rate of accomplishing the objective, broken down by game result, looks like this:

Win 69%
Draw 30%
Loss 27%

So winning and accomplishing the objective correlate pretty highly, but the two don’t always go together, and a little over a quarter of the time the loser still manages to get the bonus.  In most cases, only one of the two players accomplishes the objective, but there were games where neither accomplished it, and at least one non-draw where both players did.
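
For anyone who wants to run the same check on their own event data, here is a minimal sketch of how that breakdown can be computed.  The player_results list and its field names are hypothetical, not how my actual score sheet is laid out.

```python
from collections import Counter

def bonus_breakdown(player_results):
    """player_results: one entry per player per game, e.g.
    {"result": "win" | "draw" | "loss", "bonus": True | False}."""
    accomplished = [r for r in player_results if r["bonus"]]
    overall_rate = len(accomplished) / len(player_results)

    # For each result type, what fraction of those players got their bonus?
    totals = Counter(r["result"] for r in player_results)
    with_bonus = Counter(r["result"] for r in accomplished)
    by_result = {res: with_bonus[res] / totals[res] for res in totals}
    return overall_rate, by_result
```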

In terms of the specific objectives, the distribution of how often each was selected is pretty even.  Most were selected about 12% of the time, with one selected 19% and one selected 8%.  The distribution of how often they were accomplished is also pretty even, with most falling between 40% and 60%.  There was one notable outlier, though: the objective ‘Claim the Field’ was only accomplished once, giving it a 9% success rate.  It’s not clear why that one was such an outlier.  45% of the players who selected it won their games, which is the average win percentage, so it doesn’t seem like it was a throwaway choice by players who didn’t think they had a good chance in their upcoming games.  It’s possible that the objective is simply too difficult and should be changed or replaced.

All in all, the bonus objectives seem to have done their job, with only one potential issue.

 

Scenario Results

I would expect a balanced scenario to result in a distribution of 80-90% wins/losses and 10-20% draws.  For this event I used a Major/Minor scoring method, which I will discuss in more detail later, so I would expect the distribution to be 10-20% draws, 40-50% minor wins/losses, and 40-50% major wins/losses.  The actual distribution across the entire event ended up being 10% draws, 30% minor wins/losses, and 60% major wins/losses.  Those results are skewed towards major victories a little more than I’d like, which makes me wonder if there’s a particular scenario throwing off the curve.  The distribution by scenario looked like this:

Scenario Draw Minor Major
Invade  0  3  7
Occupy  2  3  6
Push  2  4  4
Ransack  0  4  6
Control  1  3  6
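
Here is a minimal sketch of how a tally like this can be reproduced from the game records; the games list and its field names are hypothetical.

```python
from collections import Counter

def margin_distribution(games):
    """games: one entry per game, e.g.
    {"scenario": "Invade", "margin": "draw" | "minor" | "major"}."""
    by_scenario = {}
    for g in games:
        by_scenario.setdefault(g["scenario"], Counter())[g["margin"]] += 1

    # Event-wide percentages, e.g. {"draw": 0.10, "minor": 0.30, "major": 0.60}
    overall = Counter(g["margin"] for g in games)
    total = sum(overall.values())
    return by_scenario, {m: overall[m] / total for m in overall}
```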

I used swiss pairing to determine pairings for the rounds after round 1.  So, in theory, starting in the 3rd round the players should have been facing opponents of relatively equal skill.  I feel like that has to be considered along with these results, as you would expect the first couple of rounds to be skewed towards bigger wins.  The first round is definitely skewed towards major wins, but it’s hard to say if that’s the fault of the scenario; half of those matches were also challenges.  Push, in round 3, is the only scenario that resulted in the expected distribution.  Control, in round 5, fits the average distribution for the entire event, which suggests that the pairings in the last round had settled on what was ‘normal’ in terms of skill levels between opponents for the event.

In 3 of the 5 rounds, there was 1 more major win than expected, and in the first round there were 2.   In rounds 1 and 4 there were no draws.  That suggests that Ransack and Invade might need some adjustments, but I would want to see how playing them in different rounds impacts the results first.
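
For readers unfamiliar with the swiss pairing mentioned above, the idea is roughly this: sort players by their current totals and pair neighbours while avoiding rematches.  This is only a simplified sketch, not the actual pairing software or tie-break rules used at the event.

```python
def swiss_pair(players, previous_pairings):
    """players: objects with .name and .total_points;
    previous_pairings: set of frozenset({name_a, name_b}) already played."""
    ranked = sorted(players, key=lambda p: p.total_points, reverse=True)
    pairings, used = [], set()
    for i, a in enumerate(ranked):
        if a.name in used:
            continue
        for b in ranked[i + 1:]:
            if b.name not in used and frozenset((a.name, b.name)) not in previous_pairings:
                pairings.append((a, b))
                used.update((a.name, b.name))
                break
    # Note: a greedy pass like this can occasionally leave a pair unmatched;
    # real pairing software backtracks to resolve that.
    return pairings
```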

 

Attrition Results

Another way to look at the results is via the attrition adjustments.  You can find the attrition chart on the rules page, but the short version is that there are only 3 points worth of attrition, and you needed a 1000+ point difference in routed points to get all 3.
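
As a sketch, the attrition adjustment is just a lookup on the routed-points difference.  Only the top band (1000+ for all 3 points) is stated here; the lower cut-offs below are placeholders, not the actual chart from the rules page.

```python
def attrition_points(routed_difference):
    """Points the player who routed more gains (their opponent loses the
    same amount).  Thresholds below 1000 are placeholders."""
    diff = abs(routed_difference)
    if diff >= 1000:
        return 3
    if diff >= 600:   # placeholder threshold
        return 2
    if diff >= 300:   # placeholder threshold
        return 1
    return 0
```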

Since there are 4 possibilities (0, +1/-1, +2/-2, +3/-3), it’s reasonable to expect a distribution of 25% each.  The actual distribution ended up being 0 – 22%, +1/-1 – 18%, +2/-2 – 24%, and +3/-3 – 34%.  So again we skew a little towards the higher end overall.  Broken down by scenario it looks like this:

Scenario 0 +1/-1 +2/-2 +3/-3
Invade  2  0  3  5
Occupy  1  3  2  4
Push  4  2  3  1
Ransack  2  1  2  5
Control  2  3  2  3

By round, the attrition results show some similarities to the scenario results.  Invade, in round 1, and Ransack, in round 4, both skew towards the high end again.  Control, in round 5, is closest to an even distribution.  Push, in round 3, has the largest number of ‘0’ attrition results, and also had the most even W/L/D distribution.

 

Total Scores

Another way to look at the results is via the combined scenario, attrition, and bonus scores.  If the scenario and attrition results seem to skew high in a couple of rounds, what does that make the distribution of total scores look like?

Every possible score, from 2 to 20, occurred at least once during the tournament, with each occurring an average of 5% of the time.  The only scores occurring more than 10% of the time are 2, at 15%, and 20, at 13%.  Since each game produces two scores, that means the winner received a 20 in 26% of games, and the loser received a 2 in 30% of games.  This isn’t too surprising given that we know the other results skew high, but it does mean more than a quarter of the games were one-sided.  If we cluster the total scores into two groups, 12-16 and 17-20, it actually looks less skewed.  In 56% of games one player received 17-20 points, and in 44% of games one player received 12-16.  By round it looks like this:

Scenario 12-16 17-20
Invade  3  7
Occupy  5  7
Push  5  4
Ransack  5  5
Control  4  5

The numbers by round are actually pretty even for the last 3 rounds.  Ransack looks less skewed than in the previous exercises, but Invade is still very skewed towards the high end.  You might also notice that Occupy has more than 10 results where a player earned 12 or more points.  That is due to two of the players who drew in that round also getting their bonus objectives.  The first two rounds are the most skewed, but those are also the rounds before swiss pairing is supposed to have started pairing opponents of roughly equivalent skill against one another, so I’m not sure if there’s anything to be done about it.  I suppose I could do first round pairings based on the previous or current year’s regional standings, and that might even it out some.
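
Here is a minimal sketch of that clustering, counting each player’s total (scenario + attrition + bonus) in each game; the player_scores input and its layout are hypothetical.

```python
from collections import Counter

def high_score_bands(player_scores):
    """player_scores: flat list of each player's total score in each game (2-20)."""
    bands = Counter()
    for s in player_scores:
        if s >= 17:
            bands["17-20"] += 1
        elif s >= 12:
            bands["12-16"] += 1
    return bands
```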

 

Major/Minor Scoring

Instead of doing a straight 15/10/5 for W/D/L, I elected to use a Major/Minor scoring system.  Minor victories were worth 12/8, and major victories were worth the usual 15/5.  Each scenario had a major victory condition, which was basically ‘win the scenario by X scenario points’; you can review them on the rules page.  The big question is ‘what is the actual impact of this added complexity?’  In theory, it should reward players who play the scenarios better and provide more accurate pairings by creating a more varied score distribution.  But what difference does it make in the final results?

In order to find an answer to this question, we can go back and see what the final scores would have been if the minor wins/losses were worth 15/5 like the major wins/losses instead of 12/8.  The problem with this method is that a different scoring system would likely have resulted in some number of different pairings, which might have resulted in different scores.  So it’s not perfect, but I think it’s pretty good.
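
Here is a minimal sketch of that what-if recalculation, assuming each player’s games are recorded with a result string, a signed attrition adjustment, and a bonus score; the layout and field names are hypothetical.

```python
# Point values if minor results were scored like major ones (15/5 instead of 12/8).
ALTERNATE_SCORING = {"major win": 15, "minor win": 15, "draw": 10,
                     "minor loss": 5, "major loss": 5}

def rescored_battle_points(player_games):
    """player_games: one player's games, e.g.
    {"result": "minor win", "attrition": 2, "bonus": 2} (attrition is signed)."""
    return sum(ALTERNATE_SCORING[g["result"]] + g["attrition"] + g["bonus"]
               for g in player_games)
```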

80% of players would have had a different battle score if I’d used 15/5, but 70% of them would differ by only 3 points.  15% of players would end up with the same score due to a minor win and a minor loss offsetting each other.  In terms of awards, different players would have won both Best General and the Counter Charger award.  Final placements would have changed as follows:

Name Original Place Adjusted Place
 Alex Chaves 1  1
 George O’Connell 2  2
 Steven Bassler 3  5
 Ray Shields 4  3
 Joey Greek 5  4
 Bill Goodrick 6  6
 Sean McCormally 7  9
 Chris Fisher 8  7
 Caelen McMillan 9  8
 Bart Koehler 10  12
 Sean Moore 11  13
 Ray Weiandt 12  15
 Mike Austin 13  11
 Scott Flowers 14  10
 James Crawford 15  14
 Tony Nelson 16  17
 Thomas Strother 17  16
 Ken Stubbs 18  18
 Austin Hunt 19  19
 Richard Knott 20  20

The top 2 and bottom 3 don’t change, and neither does 6th place.  Otherwise, 70% of players land in a different place, but only 35% move more than 1 place, and only 2 players move more than 2 places.  Given that final placements are what we use for regional rankings, and that 2 of the 4 awards that use battle points or overall points would have gone to different people, I think it’s fair to say that using a major/minor style scoring system over straight W/L/D has a significant impact on the end results of the tournament, and could result in different people qualifying for the regional Masters team if all of a region’s tournaments adopted something similar.

Edit:

I got an interesting follow-up question regarding whether the 1st round challenges were more or less balanced than the random pairings.  The short answer is that the challenges were actually slightly less balanced than the random pairings.  It also raised another interesting question: how predictive are the results of each round of the final placements, and do the final placement deltas between opponents go down continuously over the course of the event?


If you ignore the draws, you get something like this:
In the first two rounds, all but 1 game predicts the final placements of the two opponents. In the 3rd and 4th rounds, 3 games don’t predict the final placements. The final round, of course, predicts the final placements perfectly. That means the results of the rounds with supposedly more balanced pairings are less reliable at predicting final placements, except for the last round.

The final placement deltas look like this:
Round 1 – 9.7
Round 2 – 3.9
Round 3 – 2.8
Round 4 – 4.0
Round 5 – 4.4

So that means, on average, the winner of any one round will finish 5 places away from their opponent, and 84% of the time they will place higher. The biggest delta is in the first round and the smallest delta is actually in the 3rd and not the final round. That agrees with the other results that suggest round 1 was the most unbalanced and round 3 was the most balanced.
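
Here is a minimal sketch of both calculations, assuming a list of non-draw game records and a map from player name to final placement; all field names are hypothetical.

```python
def round_placement_stats(games, final_place):
    """games: non-draw games, e.g. {"round": 1, "winner": "A", "loser": "B"};
    final_place: player name -> final placement (1 = first)."""
    by_round = {}
    for g in games:
        w, l = final_place[g["winner"]], final_place[g["loser"]]
        stats = by_round.setdefault(g["round"], {"games": 0, "predicted": 0, "delta": 0})
        stats["games"] += 1
        stats["predicted"] += w < l        # winner finished above their opponent
        stats["delta"] += abs(w - l)
    return {rnd: {"predictive_rate": s["predicted"] / s["games"],
                  "avg_delta": s["delta"] / s["games"]}
            for rnd, s in by_round.items()}
```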

It would be interesting to see whether the deltas would continue their upward trend in a 6th round, or turn back down.