How to Design a Game With a Purpose

This post is the next in the series from my literature study of the Human Computation and Games With a Purpose field. Here is the list of previous posts in the series:

Human Computation and Games with a Purpose

In this post, we’ll discuss general guidelines for building a GWAP for a given problem. As the authors point out in the paper, these guidelines are in no way complete, but they are a good starting point for tackling at least some computational problems with Human Computation through these interfaces.

Recap: What is a Game with a Purpose (GWAP)?

You should read the previous post, where I described the related terms. For the sake of completeness, I’ll give a quick definition of a GWAP. It is a game (hopefully enjoyable) designed to generate useful data for machines as a side effect of playing it. GWAPs are designed to solve tasks that are (quite) difficult for computers but really easy for humans. This computation done by humans is also called Human-based Computation or Human Computation.

All of the factors listed below give a sense of the scale of problems we can solve if we are able to build effective and enjoyable GWAPs.

An increasing proportion of the world’s population has access to the internet;
Certain tasks are impossible for computers, but easy for humans;
People spend lots of time playing games on their devices;
Widespread use of smartphones has increased the surface area even further.

Featured Literature

Luis von Ahn and Laura Dabbish. Designing Games With A Purpose. Communications of the ACM, August 2008. [pdf]

Matter

Previous Efforts at Employing Human Computation

It seems the use of Human Computation to get something done is not new. Let’s first discuss previous efforts in this direction and how they fared.

Distributed Collaboration by Individuals

When tasks are difficult, time-consuming, and nearly impossible for a single person (or a small group) to solve, collaboration is one way around the problem. Some quite successful examples:

Open-Source Software Development

Yeah, this is a big one. We are all quite dependent on open-source software, even those of us who don’t exactly work in tech. The motivation for individuals to take part in this distributed collaboration can be attributed mainly to altruism. These days personal brand and monetary gains are also motivating factors.
Wikipedia (and other sister projects)

Another big one. A project where individual contributors are creating an encyclopedia of the world: a lot of articles on a variety of things in a variety of languages. What’s the motivation here? Again, altruism. I am putting all other reasons, like preserving culture, researching something, knowledge archival, etc., under altruism.
Stack Exchange Network

Whoa, that’s another big one. The website where the community has created a large repository of issues faced by people and possible solutions for those issues. Motivations here are two-fold: the asker gets her questions answered, and the answerer is driven by altruism while also building her personal brand through the profile score. SE has also deployed a lot of gamification features which encourage you to contribute and do more Human Computation.
Amazon Mechanical Turk

This is a big one for rich researchers. They can put their tasks out there to be annotated or solved by a host of workers registered on the platform. Workers earn money for each task solved. The motivation here is clearly pecuniary gain.

Open Mind initiative

A collaborative framework to build intelligent software by using human skills to train computers. Volunteers participate by providing answers to questions computers can’t answer (e.g., what is in this image?). From the looks of the website, the project seems defunct now.

Drawbacks when compared to the GWAP approach:

Unpaid volunteers donating their time;
No guarantee that the information given by volunteers is correct.

Making Work Fun

This work focuses on a point maintained by Human-Computer Interaction (HCI) researchers: the importance of enjoyment and fun in user interfaces. This also includes the gamification of learning activities for children. Some research efforts in this direction:

StyleCam: Game-like interaction with the software. [paper]
psDooM: Turning the user interface into a game. A first-person-shooter style interface for system-administrator-related tasks. [paper]

The research shows that this concept works efficiently when there’s a tight interplay between the game interaction and the task to be finished.

GWAP Design Considerations

Enjoyable Game Play

GWAPs do not rely on altruism, personal branding or financial incentives. They instead rely on the human desire to be entertained. People play not because they are personally interested in solving an instance of a computational problem, but because they wish to be entertained. Keeping aside the philosophy of a game being fun or enjoyable, we should be able to measure if the game is successful.

Useful and Correct Computation

The primary purpose of a GWAP is to get reliable data for the computational problem it’s based on. A GWAP should encourage players to correctly perform the necessary steps to solve the computational problem. It should also provide a probabilistic guarantee that the game’s output is correct, even if players do not want it to be correct.

GWAP Design Templates

Output Agreement Games

Initial Setup
- Two strangers randomly chosen from all potential players;
- In each round, both are given the same input and must produce outputs based on the input.
Rules
- game instructions indicate that players should try to produce the same output as their partners;
- players cannot see one another’s outputs;
- players can’t communicate with each other.
Winning Condition
- Both players have to produce the same output;
- They need not produce it at the same time, but must produce it at some point while the input is displayed on screen.

Here’s the visual display of the template reproduced from the paper linked at the top.

[Figure: output-agreement game template]

How/Why it works?

Since players can’t communicate and know nothing about each other, the easiest and most intuitive way for both to produce the same output is to enter something about the only thing they have in common, that is, the input.

How/Why is it enjoyable?

Trying to agree on the same output with a partner is an enjoyable experience;
The game doesn’t ask the players to enter the correct output for a given input; players are encouraged to think like each other, which fosters a feeling of connection with the partner during a game session.

How/Why is the computation correct?

When players provide the same output, it partially verifies that the output is correct, as it comes from two largely independent sources.
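To make the winning condition concrete, here is a minimal sketch of how one round of an output-agreement game could be checked. The function names (`normalize`, `first_agreement`) and the image labels are made up for illustration; they do not come from the paper.

```python
from typing import Iterable, Optional


def normalize(output: str) -> str:
    """Lower-case and trim an output so trivially different spellings match."""
    return output.strip().lower()


def first_agreement(outputs_a: Iterable[str], outputs_b: Iterable[str]) -> Optional[str]:
    """Return the first output of player A that player B also entered, if any.

    Timing inside the round does not matter: the players only need to
    produce the same output at some point while the input is shown.
    """
    seen_b = {normalize(o) for o in outputs_b}
    for output in outputs_a:
        if normalize(output) in seen_b:
            return normalize(output)
    return None  # no agreement in this round


# Hypothetical round: both players look at the same image.
label = first_agreement(["Dog", "grass", "ball"], ["puppy", "ball ", "park"])
print(label)  # ball
```

The agreed output is what the game would keep as a candidate label for the input; a round without agreement produces nothing.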

Inversion Problem Games

Initial Setup
- Two strangers randomly chosen from all potential players;
- In each round, one player is describer and the other player is guesser.
Rules
- Describer gets an input and based on that describer produces outputs that are sent to the guesser;
- The output from the describer should help the guesser produce the original input.
Winning Condition
- The guesser produces the input that was originally given to the describer.

Here’s the visual display of the template reproduced from the paper linked at the top.

[Figure: inversion-problem game template]

How/Why it works?

Partners are successful only when the describer provides enough outputs for the guesser to guess the original input.

How/Why is it enjoyable?

Having one player guess the input while the other describes it is an enjoyable experience (something similar to the popular children’s game “20 questions”).
Adding transparency to make it more enjoyable;
- Displaying partner guesses to the describers and allowing them to indicate whether each guess is hot or cold;
- Increases social connection between the players;
- Doesn’t compromise the output correctness.
Alternation to handle the asymmetric nature of the game;
- Each player in the pair performs a different task;
- One role might be more enjoyable (faster-paced or involves more interaction);
- To maintain the balance, switch player roles after each round: guesser becomes the describer and describer becomes the guesser.

How/Why is the computation correct?

Game structure encourages players to enter correct information;
If outputs are incorrect or incomplete, the guesser will fail to make the right guess.
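The describer/guesser loop can be sketched as follows. This is only an illustration under my own assumptions: `play_inversion_round` and `toy_guesser` are hypothetical names, and the lookup-table guesser stands in for a real human player.

```python
from typing import Callable, List, Optional

Guesser = Callable[[List[str]], str]


def play_inversion_round(secret: str, clues: List[str], guesser: Guesser) -> Optional[List[str]]:
    """Reveal the describer's clues one at a time; return the clues used if the guesser wins.

    A successful round suggests that the revealed clues describe `secret`
    well enough for another person to recover it.
    """
    revealed: List[str] = []
    for clue in clues:
        revealed.append(clue)
        if guesser(revealed) == secret:
            return list(revealed)
    return None  # the guesser never recovered the input


def toy_guesser(revealed: List[str]) -> str:
    """Stand-in for a human guesser, driven by a tiny hard-coded rule."""
    if "milk" in revealed and "animal" in revealed:
        return "cow"
    return "dog"


print(play_inversion_round("cow", ["animal", "milk", "farm"], toy_guesser))
# ['animal', 'milk']
```

If the clues are wrong or incomplete, the function returns `None`, which mirrors the point above: bad descriptions simply do not win rounds.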

Input Agreement Games

Initial Setup
- Two strangers randomly chosen from all potential players;
- In each round, both players are given inputs that are known by the game (but not by the players) to be the same or different.
Rules
- Players are instructed to produce outputs describing their input, so their partners are able to assess whether their inputs are same or different;
- Players see only each other’s outputs.
Winning Condition
- Both players correctly determine whether they have been given the same or different inputs.

Here’s the visual display of the template reproduced from the paper linked at the top.

[Figure: input-agreement game template]

How/Why it works?

Because players want to achieve winning condition, they each want their partner to be able to determine if their inputs are the same;
This means that it’s in their own best interest to enter accurate outputs that appropriately describe their individual inputs.

How/Why is the computation correct?

Scoring strongly penalizes incorrect guesses to discourage players from randomly guessing whether inputs are the same.
One way of implementing this while keeping the scoring system positive: award increasing points (combos) for streaks of correct answers and nothing for incorrect ones.
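A streak-based scoring rule of this kind could look like the sketch below. The `streak_scores` name, the `base_points` value and the linear multiplier are my assumptions for illustration, not values from the paper.

```python
def streak_scores(results, base_points=10):
    """Score a sequence of rounds with a streak multiplier.

    `results` is a list of booleans: True when both players correctly
    judged "same" or "different". A wrong answer earns nothing and resets
    the streak, so random guessing throws away the accumulated bonus
    without any score ever going negative.
    """
    streak = 0
    scores = []
    for correct in results:
        if correct:
            streak += 1
            scores.append(base_points * streak)
        else:
            streak = 0
            scores.append(0)
    return scores


print(streak_scores([True, True, True, False, True]))  # [10, 20, 30, 0, 10]
```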

GWAP Design: Increasing Player Enjoyment

The literature on motivation in psychology and organizational behavior shows that goals that are both well-specified and challenging lead to higher levels of effort and task performance than goals that are too easy or vague. Research on game-design principles likewise tells us to introduce challenge into games, and that challenge translates into the game features below.

Timed Response

Complete a set number of problem instances within an assigned time limit.
The time limit establishes an explicit goal that is not trivial for players to achieve if the game is calibrated properly (number of tasks and time limit).
Time limit and time remaining should be displayed throughout the game.

Score Keeping

Use of points increases motivation and provides a clear connection among effort in the game, performance (achieving the winning condition), and outcomes.

Player Skill Levels

Players are given skill levels or ranks.
Following each game session, players are shown their current skill level and the number of points needed to reach the next level.
From anecdotal experience, skill-level information strongly influenced player motivation and behavior;
Also from anecdotal experience, many players continue playing just to reach a new rank.
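As a small illustration of the "current level plus points to the next level" display, here is a sketch with made-up thresholds and rank names (`THRESHOLDS`, `RANKS` and `rank_summary` are all hypothetical).

```python
import bisect

# Hypothetical point thresholds and rank names, for illustration only.
THRESHOLDS = [0, 1000, 5000, 20000]
RANKS = ["Newbie", "Apprentice", "Expert", "Guru"]


def rank_summary(points):
    """Return (current rank, points still needed for the next rank or None)."""
    # Find the highest threshold that `points` has reached; clamp at rank 0.
    index = max(0, bisect.bisect_right(THRESHOLDS, points) - 1)
    if index + 1 < len(THRESHOLDS):
        return RANKS[index], THRESHOLDS[index + 1] - points
    return RANKS[index], None  # already at the top rank


print(rank_summary(1200))   # ('Apprentice', 3800)
print(rank_summary(25000))  # ('Guru', None)
```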

High score lists or Leaderboards

Players with the highest number of points over a certain period of time;
Hourly high-score list;
Daily high-score list;
All-time high-score list;
These multi-level goals, varying in difficulty, provide strong, positive motivation for extended game play.
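The hourly, daily and all-time lists differ only in the time window they look at, so a single function can serve all of them. A minimal sketch, assuming score events are stored as `(timestamp, player, points)` tuples (that storage format and the `high_scores` name are my assumptions):

```python
from collections import defaultdict


def high_scores(events, now, window_seconds, top_n=3):
    """Return the top players by points earned within the last `window_seconds`.

    `events` is an iterable of (timestamp, player, points) tuples with
    timestamps in seconds. Ties are broken alphabetically by player name.
    """
    totals = defaultdict(int)
    for timestamp, player, points in events:
        if now - window_seconds <= timestamp <= now:
            totals[player] += points
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical score events.
events = [(0, "ana", 50), (3500, "bo", 30), (3590, "ana", 10), (3599, "cy", 40)]
print(high_scores(events, now=3600, window_seconds=3600))  # [('ana', 60), ('cy', 40), ('bo', 30)]
print(high_scores(events, now=3600, window_seconds=60))    # [('cy', 40), ('ana', 10)]
```

A shorter window gives newer players a realistic chance of appearing on a list, which is the point of having several goals of varying difficulty.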

Randomness

Random input selection brings:
- Varying difficulty, keeping the game interesting and engaging for expert and novice players alike.
- Uncertainty about whether all inputs will be completed within the time limit
Random partner assignment
- Ensures the uniqueness of each game session
- From anecdotal experience, during each game session, players develop a sense of their partners’ skill and way of playing (sense of connection), which is great motivation for repeated play.

GWAP Design: Correctness Mechanisms

Ensure output correctness (#correctness)
Counter player collusion (#collusion)

Random Matching

Random matching of players helps maintain correctness in two ways:

Two players who know each other can’t agree ahead of time on a cheating strategy, as the probability of them being matched with each other is very low. (#collusion)
The probability of two or more cheaters using the same strategy being paired together is also low. (#collusion)

Player Testing

Randomly present players with inputs for which all possible correct outputs are already known (test inputs); if the output produced by a particular player does not match the known values, then something is fishy. (#correctness)
With enough of these test inputs presented to the players, we can guarantee with high probability that the output is correct. To illustrate:
- Assume half of the inputs given to a player are test inputs;
- The probability that a new output by the player is correct, given that the player is correct on all the test inputs, is at least 50%;
- This probability can be increased through repetition.
A similar technique is also used by Stack Exchange in their review queues. Read: What are review tests (audits) and how do they work?
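Player testing can be sketched as a filter on a player's session: outputs on new inputs are kept only if every test input was answered correctly. The data layout (dicts keyed by input id) and both function names are my assumptions for illustration.

```python
def passes_player_testing(answers, known_outputs):
    """Return True if every answer on a test input is a known correct output.

    `answers` maps input id -> the player's output.
    `known_outputs` maps test input id -> set of all correct outputs.
    Note: a player who saw no test inputs passes trivially; a real game
    would want to require a minimum number of test inputs.
    """
    return all(
        answers[input_id] in correct
        for input_id, correct in known_outputs.items()
        if input_id in answers
    )


def trusted_new_outputs(answers, known_outputs):
    """Keep a player's outputs on new inputs only if all test inputs passed."""
    if not passes_player_testing(answers, known_outputs):
        return {}
    return {i: out for i, out in answers.items() if i not in known_outputs}


# Hypothetical test inputs with their full sets of correct outputs.
known = {"img1": {"dog", "puppy"}, "img2": {"car"}}
print(trusted_new_outputs({"img1": "dog", "img2": "car", "img3": "tree"}, known))  # {'img3': 'tree'}
print(trusted_new_outputs({"img1": "x", "img2": "car", "img3": "x"}, known))       # {}
```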

Repetition

We can ensure the correctness of the output with high probability if we consider the output to be correct only when a certain number of players have said so. (#correctness)
Example: Consider an output-agreement game;
- for a given input, the game considers an output to be correct after n pairs have entered it
- The game knows that each of these n pairs will enter a correct output with a probability of at least 50% (from player testing)
- The output is correct with probability of at least \((1-\frac{1}{2^n})\)
Stack Overflow uses this as well. For each review, it asks some number n of reviewers to review and then takes the next action.
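The bound is easy to play with numerically. The sketch below assumes, as the bound itself does, that each pair is wrong with probability at most 1/2 and that pairs err independently; the function names are mine.

```python
def agreement_confidence(n):
    """Lower bound 1 - 1/2**n on correctness after n pairs entered the output."""
    return 1 - 0.5 ** n


def pairs_needed(target):
    """Smallest n whose lower bound reaches `target` (with 0 < target < 1)."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    while agreement_confidence(n) < target:
        n += 1
    return n


print(agreement_confidence(3))  # 0.875
print(pairs_needed(0.99))       # 7
```

So under these assumptions, seven agreeing pairs are enough to push the bound above 99%.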

Taboo Outputs

For those inputs which can have multiple outputs, we have to ensure that the output space is covered sufficiently;
Use of taboo words or off-limits outputs provides some guarantee that a larger proportion of all possible outputs will be entered by all players.
Players are not allowed to enter any output listed among the taboo words;
The choice of taboo outputs presented has to account for potential output-priming effects (in which the particular taboo outputs shown to the players influence the guesses they enter).
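Here is one way taboo outputs could be wired in, as a sketch. Promoting an output to the taboo list once it has been agreed on `min_agreements` times is my own assumption for illustration, as are the function names.

```python
from collections import Counter


def taboo_outputs(agreed_history, min_agreements=2):
    """Outputs agreed on at least `min_agreements` times for this input."""
    counts = Counter(agreed_history)
    return {label for label, count in counts.items() if count >= min_agreements}


def is_allowed(output, taboo):
    """Reject outputs that are on the taboo list (case-insensitive)."""
    return output.strip().lower() not in taboo


# Hypothetical outputs agreed on in earlier rounds for one image.
history = ["dog", "dog", "grass", "dog", "grass", "ball"]
taboo = taboo_outputs(history)
print(sorted(taboo))              # ['dog', 'grass']
print(is_allowed("Dog ", taboo))  # False
print(is_allowed("ball", taboo))  # True
```

Once the obvious outputs are off limits, later players are pushed toward less common ones, which is how coverage of the output space grows.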

GWAP Design: One Player

Paired game play makes GWAPs social in nature;
Players are able to validate each other’s computation;
However, dyadic gameplay raises logistical issues:
- What if an odd number of people are currently online?
- What if a partner is facing network issues or their game is not working properly?

Solution: Pre-recorded Games

When two people are playing, the game should simply record every action they make, along with the relative timing of each action.
In a single-player game, pair the single player with a pre-recorded set of actions.
This technique is easy to implement for Input and Output Agreement games;
For inversion-problem games customised techniques are required because one of the players (the guesser) must dynamically respond to the other player’s (the describer) actions.
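Recording and replaying a session for the agreement-style games can be sketched like this. The `RecordedAction` type and both helper functions are hypothetical; the only idea taken from the text is storing each action with its timing relative to the start of the round.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecordedAction:
    offset: float  # seconds since the round started
    output: str


def record(round_start: float, timestamped_outputs) -> List[RecordedAction]:
    """Convert absolute (timestamp, output) pairs into relative offsets."""
    return [RecordedAction(t - round_start, out) for t, out in timestamped_outputs]


def replayed_outputs(recording: List[RecordedAction], elapsed: float) -> List[str]:
    """Outputs the recorded partner has 'entered' after `elapsed` seconds."""
    return [action.output for action in recording if action.offset <= elapsed]


# Hypothetical live session that started at t=100.0 seconds.
recording = record(100.0, [(102.5, "dog"), (107.0, "ball")])
print(replayed_outputs(recording, 5.0))   # ['dog']
print(replayed_outputs(recording, 10.0))  # ['dog', 'ball']
```

A single player can then be matched against `replayed_outputs` as if the recorded partner were typing in real time.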

GWAP Design: 2+ Players GWAPs

Multiplayer versions are competitive.
Output-Agreement games: Modifying the winning condition such that the first two players who agree on the output are the winners of the round (and granted a higher number of points than the non-winners).
Inversion-Problem games: Substituting an individual guesser with an arbitrary number of players in the role of guesser, all racing to be first to correctly guess the input (winning condition).
Drawback: more players work on the same computation, which wastes computation cycles;
- However, games can be designed to also apply the repetition technique (discussed under correctness mechanisms) within the same round, which reduces the waste.
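For the multiplayer output-agreement variant, the modified winning condition can be sketched as "find the first two different players who enter the same output". The function name and the sample submissions are made up for illustration.

```python
def first_agreeing_pair(submissions):
    """Return (earlier player, later player, output) for the first agreement.

    `submissions` is a list of (player, output) tuples in arrival order.
    A player repeating their own output does not count as an agreement.
    Returns None if no two different players agree.
    """
    first_seen = {}  # normalized output -> first player who entered it
    for player, output in submissions:
        key = output.strip().lower()
        if key in first_seen and first_seen[key] != player:
            return first_seen[key], player, key
        first_seen.setdefault(key, player)
    return None


# Hypothetical round with three players.
round_log = [("ana", "dog"), ("bo", "grass"), ("cy", "Dog"), ("bo", "dog")]
print(first_agreeing_pair(round_log))  # ('ana', 'cy', 'dog')
```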

GWAP Design: Evaluation

Given two different GWAPs that solve the same problem, which one is better? Every GWAP associated with a computational problem can be thought of as an algorithm: give an input and get an output. We can’t evaluate a GWAP with a big-O style metric, as it’s not clear what an atomic step in a GWAP is. On top of that, an output or running-time metric alone is not enough: we also need to quantify the enjoyability factor.

Game Efficiency: Throughput

Average number of problem instances solved, or input-output mappings performed, per human-hour;
A reasonably lengthy time period should be considered for taking the average, to account for learning curves and variations in player skill (people get faster at game play over time);
Higher throughput should be preferred.

Quantifying Enjoyability: Average Lifetime Play (ALP)

Since it is difficult to quantify the enjoyability of any game, we’ll use a proxy;
The overall amount of time the game is played by each player, averaged across all people who have played it.

One Metric: Expected Contribution

Expected contribution indicates the average number of problem instances a single human player can be expected to solve by playing a particular game;
Expected Contribution = Throughput * ALP
It gives a more direct assessment of how much people play the game and, in turn, how useful the game is for computational purposes, than self-report questionnaire measures.
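A quick worked example of the three metrics, with entirely made-up numbers (none of them come from the paper) and hypothetical function names:

```python
def throughput(instances_solved, human_hours):
    """Average number of problem instances solved per human-hour."""
    return instances_solved / human_hours


def average_lifetime_play(hours_per_player):
    """Average total hours played, across everyone who has played the game."""
    return sum(hours_per_player) / len(hours_per_player)


def expected_contribution(throughput_per_hour, alp_hours):
    """Expected number of instances one player solves over their lifetime play."""
    return throughput_per_hour * alp_hours


# Hypothetical game: 12,000 instances solved over 50 human-hours of play,
# and four players with the lifetime play times below (in hours).
t = throughput(12000, 50)                           # 240.0 instances per human-hour
alp = average_lifetime_play([0.5, 2.0, 1.5, 4.0])   # 2.0 hours
print(expected_contribution(t, alp))                # 480.0
```

Note that throughput and ALP must use the same time unit (hours here) for the product to make sense.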

Conclusion

The paper offers a first general method for integrating a computational problem with a game.
The authors recognize that this might not be an exhaustive list of templates.
In the current templates, players are rewarded for thinking like other players, which uses similarity as a way to ensure output correctness.
These approaches may not be proper/optimal/useful for tasks that require creativity, diverse viewpoints and perspectives.
There might be other problems which fall outside the GWAP space.
The current templates solve problems that can be divided into subtasks, which makes the games appealing because of their bite-sized nature.

Some questions I got after reading this paper:

What kinds of problems need to be (or can be) solved this way? This may help us determine whether there is even a need to look for other templates. Maybe these templates are enough?
How do we break problems into small chunks that can be turned into games? This is a crucial step while setting up the narrative of the games.
Can we get more ideas by studying the games played by children? I think we all play some modified versions of those games with adult elements added.
What do current game designers think about this concept? Since they deal with so many players and so many game elements, they would have some new perspective to bring to GWAPs. Sadly I don’t know any game developers.