Introduction to Minimax Algorithm with a Java Implementation – 用Java实现的最小值算法介绍

最后修改: 2017年 7月 11日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

In this article, we’re going to discuss Minimax algorithm and its applications in AI. As it’s a game theory algorithm, we’ll implement a simple game using it.

在这篇文章中,我们将讨论Minimax算法以及它在人工智能中的应用。由于它是一种博弈论算法,我们将用它实现一个简单的游戏。

We’ll also discuss the advantages of using the algorithm and see how it can be improved.

我们还将讨论使用该算法的优势,并看看如何改进它。

2. Introduction

2.简介

Minimax is a decision-making algorithm, typically used in a turn-based, two player games. The goal of the algorithm is to find the optimal next move.

Minimax是一种决策算法,通常用于基于回合制的双人游戏。该算法的目标是找到最佳的下一步棋。

In the algorithm, one player is called the maximizer, and the other player is a minimizer. If we assign an evaluation score to the game board, one player tries to choose a game state with the maximum score, while the other chooses a state with the minimum score.

在该算法中,一个玩家被称为最大化者,而另一个玩家是最小化者。如果我们给游戏板分配一个评价分数,一个玩家就会试图选择一个具有最大分数的游戏状态,而另一个玩家则选择一个具有最小分数的状态。

In other words, the maximizer works to get the highest score, while the minimizer tries get the lowest score by trying to counter moves.

换句话说,近似者致力于获得最高分,而最小化者则试图通过反击行动获得最低分

It is based on the zero-sum game concept. In a zero-sum game, the total utility score is divided among the players. An increase in one player’s score results into the decrease in another player’s score. So, the total score is always zero. For one player to win, the other one has to lose. Examples of such games are chess, poker, checkers, tic-tac-toe.

它是基于零和游戏的概念。在零和博弈中,总的效用得分在玩家之间进行分配。一个玩家分数的增加会导致另一个玩家的分数减少。因此,总分总是为零。一个玩家要赢,另一个就得输。这种游戏的例子有国际象棋、扑克、跳棋、井字棋。

An interesting fact- in 1997, IBM’s chess-playing computer Deep Blue (built with Minimax) defeated Garry Kasparov (the world champion in chess).

一个有趣的事实是,1997年,IBM的国际象棋计算机深蓝(用Minimax构建)击败了加里-卡斯帕罗夫(国际象棋世界冠军)。

3. Minimax Algorithm

3.最小化算法

Our goal is to find the best move for the player. To do so, we can just choose the node with best evaluation score. To make the process smarter, we can also look ahead and evaluate potential opponent’s moves.

我们的目标是为棋手找到最佳棋步。要做到这一点,我们可以直接选择评价分数最高的节点。为了使这一过程更加智能,我们还可以向前看,评估潜在对手的棋。

For each move, we can look ahead as many moves as our computing power allows. The algorithm assumes that the opponent is playing optimally.

对于每一步棋,我们可以在计算能力允许的范围内提前看多少步。该算法假定对手是以最佳方式下棋。

Technically, we start with the root node and choose the best possible node. We evaluate nodes based on their evaluation scores. In our case, evaluation function can assign scores to only result nodes (leaves). Therefore, we recursively reach leaves with scores and back propagate the scores.

从技术上讲,我们从根节点开始,选择可能的最佳节点。我们根据节点的评价分数来评价节点。在我们的案例中,评价函数只能给结果节点(叶子)分配分数。因此,我们递归地到达有分数的叶子,并将分数反向传播。

Consider the below game tree:

考虑一下下面的游戏树。

minimax

Maximizer starts with the root node and chooses the move with the maximum score. Unfortunately, only leaves have evaluation scores with them, and hence the algorithm has to reach leaf nodes recursively. In the given game tree, currently it’s the minimizer’s turn to choose a move from the leaf nodes, so the nodes with minimum scores (here, node 3 and 4) will get selected. It keeps picking the best nodes similarly, till it reaches the root node.

Maximizer 从根节点开始,选择具有最大分数的移动。不幸的是,只有叶子才有评价分数,因此,算法必须递归地到达叶子节点。在给定的博弈树中,目前轮到最小化器从叶子节点选择一步棋,因此分数最低的节点(这里是节点3和4)将被选中。它一直以类似的方式选择最佳节点,直到到达根节点。

Now, let’s formally define steps of the algorithm:

现在,让我们正式定义该算法的步骤。

  1. Construct the complete game tree
  2. Evaluate scores for leaves using the evaluation function
  3. Back-up scores from leaves to root, considering the player type:
    • For max player, select the child with the maximum score
    • For min player, select the child with the minimum score
  4. At the root node, choose the node with max value and perform the corresponding move

4. Implementation

4.实施

Now, let’s implement a game.

现在,让我们来实现一个游戏。

In the game, we have a heap with n number of bones. Both players have to pick up 1,2 or 3 bones in their turn. A player who can’t take any bones loses the game. Each player plays optimally. Given the value of n, let’s write an AI.

在游戏中,我们有一个n个骨头的堆。双方都必须在自己的回合中拿起1、2或3根骨头。不能拿任何骨头的玩家就会输掉游戏。每个玩家都以最佳方式进行游戏。鉴于n的值,我们来写一个人工智能。

To define the rules of the game, we will implement GameOfBones class:

为了定义游戏规则,我们将实现GameOfBones类。

class GameOfBones {
    static List<Integer> getPossibleStates(int noOfBonesInHeap) {
        return IntStream.rangeClosed(1, 3).boxed()
          .map(i -> noOfBonesInHeap - i)
          .filter(newHeapCount -> newHeapCount >= 0)
          .collect(Collectors.toList());
    }
}

Furthermore, we also need the implementation for Node and Tree classes as well:

此外,我们还需要实现Node Tree 类。

public class Node {
    int noOfBones;
    boolean isMaxPlayer;
    int score;
    List<Node> children;
    // setters and getters
}
public class Tree {
    Node root;
    // setters and getters
}

Now we’ll implement the algorithm. It requires a game tree to look ahead and find the best move. Let’s implement that:

现在我们要实现这个算法。它需要一个博弈树来向前看并找到最佳棋步。让我们来实现它。

public class MiniMax {
    Tree tree;

    public void constructTree(int noOfBones) {
        tree = new Tree();
        Node root = new Node(noOfBones, true);
        tree.setRoot(root);
        constructTree(root);
    }

    private void constructTree(Node parentNode) {
        List<Integer> listofPossibleHeaps 
          = GameOfBones.getPossibleStates(parentNode.getNoOfBones());
        boolean isChildMaxPlayer = !parentNode.isMaxPlayer();
        listofPossibleHeaps.forEach(n -> {
            Node newNode = new Node(n, isChildMaxPlayer);
            parentNode.addChild(newNode);
            if (newNode.getNoOfBones() > 0) {
                constructTree(newNode);
            }
        });
    }
}

Now, we’ll implement the checkWin method which will simulate a play out, by selecting optimal moves for both players. It sets the score to:

现在,我们将实现checkWin方法,该方法将模拟一场比赛,为双方选择最佳棋步。它将分数设置为。

  • +1, if maximizer wins
  • -1, if minimizer wins

The checkWin will return true if the first player (in our case – maximizer) wins:

如果第一个玩家(在我们的例子中是最大化者)获胜,checkWin将返回true。

public boolean checkWin() {
    Node root = tree.getRoot();
    checkWin(root);
    return root.getScore() == 1;
}

private void checkWin(Node node) {
    List<Node> children = node.getChildren();
    boolean isMaxPlayer = node.isMaxPlayer();
    children.forEach(child -> {
        if (child.getNoOfBones() == 0) {
            child.setScore(isMaxPlayer ? 1 : -1);
        } else {
            checkWin(child);
        }
    });
    Node bestChild = findBestChild(isMaxPlayer, children);
    node.setScore(bestChild.getScore());
}

Here, the findBestChild method finds the node with the maximum score if a player is a maximizer. Otherwise, it returns the child with the minimum score:

在这里,findBestChild方法找到了具有最大分数的节点,如果玩家是一个最大化者。否则,它将返回具有最小分数的子节点。

private Node findBestChild(boolean isMaxPlayer, List<Node> children) {
    Comparator<Node> byScoreComparator = Comparator.comparing(Node::getScore);
    return children.stream()
      .max(isMaxPlayer ? byScoreComparator : byScoreComparator.reversed())
      .orElseThrow(NoSuchElementException::new);
}

Finally, let’s implement a test case with some values of n (the number of bones in a heap):

最后,让我们用一些n(堆中的骨头数量)的值实现一个测试案例。

@Test
public void givenMiniMax_whenCheckWin_thenComputeOptimal() {
    miniMax.constructTree(6);
    boolean result = miniMax.checkWin();
 
    assertTrue(result);
 
    miniMax.constructTree(8);
    result = miniMax.checkWin();
 
    assertFalse(result);
}

5. Improvement

5.改进

For most of the problems, it is not feasible to construct an entire game tree. In practice, we can develop a partial tree (construct the tree till a predefined number of levels only).

对于大多数问题,构建整个游戏树是不可行的。在实践中,我们可以开发一个部分树(只构建到预定数量的级别的树)

Then, we will have to implement an evaluation function, which should be able to decide how good the current state is, for the player.

然后,我们将不得不实现一个评价函数,它应该能够决定当前状态对玩家来说有多好。

Even if we don’t build complete game trees, it can be time-consuming to compute moves for games with high branching factor.

即使我们不建立完整的游戏树,对于分支系数高的游戏,计算动作也会很费时。

Fortunately, there is an option to find the optimal move, without exploring every node of the game tree. We can skip some branches by following some rules, and it won’t affect the final result. This process is called pruning. Alpha–beta pruning is a prevalent variant of minimax algorithm.

幸运的是,有一个选项可以找到最优棋步,而无需探索对局树的每一个节点。我们可以按照一些规则跳过一些分支,而这不会影响最终的结果。这个过程被称为剪枝Alpha-beta修剪是最小化算法的一个盛行的变体。

6. Conclusion

6.结论

Minimax algorithm is one of the most popular algorithms for computer board games. It is widely applied in turn based games. It can be a good choice when players have complete information about the game.

Minimax算法是计算机棋盘游戏中最流行的算法之一。它被广泛地应用于回合制游戏。当玩家拥有关于游戏的完整信息时,它可以成为一个很好的选择。

It may not be the best choice for the games with exceptionally high branching factor (e.g. game of GO). Nonetheless, given a proper implementation, it can be a pretty smart AI.

对于分支因素特别高的游戏(如GO游戏),它可能不是最佳选择。尽管如此,如果有适当的实施,它可以成为一个相当聪明的人工智能。

As always, the complete code for the algorithm can be found over on GitHub.

一如既往,该算法的完整代码可以在GitHub上找到over