Monte Carlo Tree Search for Tic-Tac-Toe Game in Java

Last modified: June 24, 2017

1. Overview

In this article, we’re going to explore the Monte Carlo Tree Search (MCTS) algorithm and its applications.

We’ll look at its phases in detail by implementing the game of Tic-Tac-Toe in Java. We’ll design a general solution which could be used in many other practical applications, with minimal changes.

2. Introduction

Simply put, Monte Carlo tree search is a probabilistic search algorithm. It’s a unique decision-making algorithm because of its efficiency in open-ended environments with an enormous amount of possibilities.

If you are already familiar with game theory algorithms like Minimax, you know they require a function to evaluate the current state, and they have to compute many levels in the game tree to find the optimal move.

Unfortunately, that is not feasible in a game like Go, which has a high branching factor (resulting in millions of possibilities as the height of the tree increases), and it's difficult to write a good evaluation function to compute how good the current state is.

Monte Carlo tree search applies the Monte Carlo method to the game tree search. As it is based on random sampling of game states, it does not need to brute-force its way through every possibility. Also, it does not necessarily require us to write evaluation or good heuristic functions.

And, a quick side-note – it revolutionized the world of computer Go. Since March 2016, it has become a prevalent research topic, as Google's AlphaGo (built with MCTS and a neural network) beat Lee Sedol (the world champion in Go).

3. Monte Carlo Tree Search Algorithm

Now, let’s explore how the algorithm works. Initially, we’ll build a lookahead tree (game tree) with a root node, and then we’ll keep expanding it with random rollouts. In the process, we’ll maintain visit count and win count for each node.

In the end, we are going to select the node with the most promising statistics.

The algorithm consists of four phases; let’s explore all of them in detail.

3.1. Selection

In this initial phase, the algorithm starts with a root node and selects the child node with the maximum win rate. We also want to make sure that each node is given a fair chance.

The idea is to keep selecting optimal child nodes until we reach the leaf node of the tree. A good way to select such a child node is to use the UCT (Upper Confidence Bound applied to trees) formula:

UCT = (wi / ni) + c * √(ln(t) / ni)

In which:

  • wi = number of wins after the i-th move
  • ni = number of simulations after the i-th move
  • c = exploration parameter (theoretically equal to √2)
  • t = total number of simulations for the parent node
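
To see how the two terms trade off, here is a minimal, self-contained sketch of the formula in Java (the class name UctDemo and the sample visit counts are hypothetical). With c = √2, a rarely visited child can outrank a heavily visited one that has a better win rate:

```java
// Hypothetical demo of the UCT trade-off between exploitation and exploration
public class UctDemo {

    static double uctValue(int totalVisit, double nodeWinScore, int nodeVisit) {
        if (nodeVisit == 0) {
            return Integer.MAX_VALUE; // unvisited nodes are always tried first
        }
        // exploitation term (win rate) + exploration term (visit-count bonus)
        return nodeWinScore / nodeVisit
          + Math.sqrt(2) * Math.sqrt(Math.log(totalVisit) / nodeVisit);
    }

    public static void main(String[] args) {
        // Child A: 7 wins in 10 visits; Child B: 1 win in 3 visits; parent: 20 visits
        double a = uctValue(20, 7, 10);
        double b = uctValue(20, 1, 3);
        // B's exploration bonus outweighs A's better win rate here, so B is selected
        System.out.println("A = " + a + ", B = " + b);
    }
}
```

This is the same formula the UCT class implements later in section 5.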

The formula ensures that no state will be a victim of starvation and it also plays promising branches more often than their counterparts.

3.2. Expansion

When it can no longer apply UCT to find the successor node, it expands the game tree by appending all possible states from the leaf node.

3.3. Simulation

After expansion, the algorithm picks a child node arbitrarily, and it simulates a randomized game from the selected node until it reaches the resulting state of the game. If nodes are picked randomly or semi-randomly during the play out, it is called light play out. You can also opt for heavy play out by writing quality heuristics or evaluation functions.

3.4. Backpropagation

This is also known as the update phase. Once the algorithm reaches the end of the game, it evaluates the state to figure out which player has won. It traverses upwards to the root and increments the visit count for all visited nodes. It also updates the win score for each node if the player for that position has won the playout.

MCTS keeps repeating these four phases until some fixed number of iterations or some fixed amount of time has elapsed.

In this approach, we estimate the winning score for each node based on random moves, so the higher the number of iterations, the more reliable the estimate becomes. The algorithm's estimates will be less accurate at the start of a search and keep improving after a sufficient amount of time. Again, this solely depends on the type of the problem.

4. Dry Run

Here, each node shows its statistics as total visits/win score.

5. Implementation

Now, let’s implement a game of Tic-Tac-Toe – using Monte Carlo tree search algorithm.

现在,让我们来实现一个井字游戏–使用蒙特卡洛树搜索算法。

We’ll design a generalized solution for MCTS which can be utilized for many other board games as well. We’ll have a look at most of the code in the article itself.

To keep the explanation crisp, we may have to skip some minor details (not particularly related to MCTS), but you can always find the complete implementation on GitHub.

First of all, we need a basic implementation for Tree and Node classes to have a tree search functionality:

public class Node {
    State state;
    Node parent;
    List<Node> childArray;
    // setters and getters
}
public class Tree {
    Node root;
}

As each node will have a particular state of the problem, let’s implement a State class as well:

public class State {
    Board board;
    int playerNo;
    int visitCount;
    double winScore;

    // copy constructor, getters, and setters

    public List<State> getAllPossibleStates() {
        // constructs a list of all possible states from current state
    }
    public void randomPlay() {
        /* get a list of all possible positions on the board and 
           play a random move */
    }
}
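
The two method bodies above are elided; below is a hedged, self-contained sketch of how they might look. The stripped-down Board and Position stubs stand in for the fuller classes shown later in this section, and the 3 - playerNo trick assumes the players are numbered 1 and 2:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal stand-ins for the Board and Position classes from section 5
class Position {
    final int x, y;
    Position(int x, int y) { this.x = x; this.y = y; }
}

class Board {
    int[][] values = new int[3][3];
    Board() {}
    Board(Board other) { // copy constructor
        for (int i = 0; i < other.values.length; i++)
            values[i] = other.values[i].clone();
    }
    List<Position> getEmptyPositions() {
        List<Position> empty = new ArrayList<>();
        for (int i = 0; i < values.length; i++)
            for (int j = 0; j < values.length; j++)
                if (values[i][j] == 0) empty.add(new Position(i, j));
        return empty;
    }
    void performMove(int player, Position p) { values[p.x][p.y] = player; }
}

public class State {
    Board board;
    int playerNo;
    private static final Random RANDOM = new Random();

    State(Board board, int playerNo) { this.board = board; this.playerNo = playerNo; }

    // One child state per empty cell, each played by the opponent of playerNo
    List<State> getAllPossibleStates() {
        List<State> states = new ArrayList<>();
        for (Position p : board.getEmptyPositions()) {
            Board next = new Board(board);
            int nextPlayer = 3 - playerNo; // players are 1 and 2
            next.performMove(nextPlayer, p);
            states.add(new State(next, nextPlayer));
        }
        return states;
    }

    // Play one random legal move for the current player
    void randomPlay() {
        List<Position> open = board.getEmptyPositions();
        Position p = open.get(RANDOM.nextInt(open.size()));
        board.performMove(playerNo, p);
    }
}
```

On an empty 3x3 board, getAllPossibleStates() produces nine child states, and randomPlay() reduces the number of empty cells by one.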

Now, let’s implement MonteCarloTreeSearch class, which will be responsible for finding the next best move from the given game position:

public class MonteCarloTreeSearch {
    static final int WIN_SCORE = 10;
    int level;
    int opponent;

    public Board findNextMove(Board board, int playerNo) {
        // define an end time which will act as a terminating condition
        long end = System.currentTimeMillis() + 2000; // e.g. a 2-second budget

        opponent = 3 - playerNo;
        Tree tree = new Tree();
        Node rootNode = tree.getRoot();
        rootNode.getState().setBoard(board);
        rootNode.getState().setPlayerNo(opponent);

        while (System.currentTimeMillis() < end) {
            Node promisingNode = selectPromisingNode(rootNode);
            if (promisingNode.getState().getBoard().checkStatus() 
              == Board.IN_PROGRESS) {
                expandNode(promisingNode);
            }
            Node nodeToExplore = promisingNode;
            if (promisingNode.getChildArray().size() > 0) {
                nodeToExplore = promisingNode.getRandomChildNode();
            }
            int playoutResult = simulateRandomPlayout(nodeToExplore);
            backPropogation(nodeToExplore, playoutResult);
        }

        Node winnerNode = rootNode.getChildWithMaxScore();
        tree.setRoot(winnerNode);
        return winnerNode.getState().getBoard();
    }
}

Here, we keep iterating over all of the four phases until the predefined time, and at the end, we get a tree with reliable statistics to make a smart decision.

Now, let’s implement methods for all the phases.

We will start with the selection phase, which requires a UCT implementation as well:

private Node selectPromisingNode(Node rootNode) {
    Node node = rootNode;
    while (node.getChildArray().size() != 0) {
        node = UCT.findBestNodeWithUCT(node);
    }
    return node;
}
public class UCT {
    public static double uctValue(
      int totalVisit, double nodeWinScore, int nodeVisit) {
        if (nodeVisit == 0) {
            return Integer.MAX_VALUE;
        }
        return ((double) nodeWinScore / (double) nodeVisit) 
          + 1.41 * Math.sqrt(Math.log(totalVisit) / (double) nodeVisit);
    }

    public static Node findBestNodeWithUCT(Node node) {
        int parentVisit = node.getState().getVisitCount();
        return Collections.max(
          node.getChildArray(),
          Comparator.comparing(c -> uctValue(parentVisit, 
            c.getState().getWinScore(), c.getState().getVisitCount())));
    }
}

This phase recommends a leaf node which should be expanded further in the expansion phase:

private void expandNode(Node node) {
    List<State> possibleStates = node.getState().getAllPossibleStates();
    possibleStates.forEach(state -> {
        Node newNode = new Node(state);
        newNode.setParent(node);
        newNode.getState().setPlayerNo(node.getState().getOpponent());
        node.getChildArray().add(newNode);
    });
}

Next, we write code to pick a random node and simulate a random play out from it. Also, we will have an update function to propagate score and visit count starting from leaf to root:

private void backPropogation(Node nodeToExplore, int playerNo) {
    Node tempNode = nodeToExplore;
    while (tempNode != null) {
        tempNode.getState().incrementVisit();
        if (tempNode.getState().getPlayerNo() == playerNo) {
            tempNode.getState().addScore(WIN_SCORE);
        }
        tempNode = tempNode.getParent();
    }
}
private int simulateRandomPlayout(Node node) {
    Node tempNode = new Node(node);
    State tempState = tempNode.getState();
    int boardStatus = tempState.getBoard().checkStatus();
    if (boardStatus == opponent) {
        tempNode.getParent().getState().setWinScore(Integer.MIN_VALUE);
        return boardStatus;
    }
    while (boardStatus == Board.IN_PROGRESS) {
        tempState.togglePlayer();
        tempState.randomPlay();
        boardStatus = tempState.getBoard().checkStatus();
    }
    return boardStatus;
}

Now we are done with the implementation of MCTS. All we need is a Tic-Tac-Toe-specific Board class implementation. Notice that to play other games with our implementation, we just need to change the Board class:

public class Board {
    int[][] boardValues;
    int totalMoves;
    public static final int DEFAULT_BOARD_SIZE = 3;
    public static final int IN_PROGRESS = -1;
    public static final int DRAW = 0;
    public static final int P1 = 1;
    public static final int P2 = 2;
    
    // getters and setters
    public void performMove(int player, Position p) {
        this.totalMoves++;
        boardValues[p.getX()][p.getY()] = player;
    }

    public int checkStatus() {
        /* Evaluate whether the game is won and return winner.
           If it is draw return 0 else return -1 */         
    }

    public List<Position> getEmptyPositions() {
        int size = this.boardValues.length;
        List<Position> emptyPositions = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                if (boardValues[i][j] == 0)
                    emptyPositions.add(new Position(i, j));
            }
        }
        return emptyPositions;
    }
}
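
The body of checkStatus() is elided above; here is one hedged way it could be written, extracted into a self-contained helper (the class name BoardStatus and the lineWinner helper are hypothetical, and the status constants mirror the ones in the Board class):

```java
// Hypothetical standalone version of Board.checkStatus() for an n x n board
public class BoardStatus {
    public static final int IN_PROGRESS = -1;
    public static final int DRAW = 0;

    // Returns the winning player (1 or 2), DRAW, or IN_PROGRESS
    static int checkStatus(int[][] b) {
        int n = b.length;
        // rows and columns
        for (int i = 0; i < n; i++) {
            if (b[i][0] != 0 && lineWinner(b, i, 0, 0, 1) != 0) return b[i][0];
            if (b[0][i] != 0 && lineWinner(b, 0, i, 1, 0) != 0) return b[0][i];
        }
        // both diagonals
        if (b[0][0] != 0 && lineWinner(b, 0, 0, 1, 1) != 0) return b[0][0];
        if (b[0][n - 1] != 0 && lineWinner(b, 0, n - 1, 1, -1) != 0) return b[0][n - 1];
        // any empty cell left means the game is still in progress
        for (int[] row : b)
            for (int cell : row)
                if (cell == 0) return IN_PROGRESS;
        return DRAW;
    }

    // Returns the player occupying the whole line, or 0 if the line is mixed
    private static int lineWinner(int[][] b, int r, int c, int dr, int dc) {
        int first = b[r][c];
        for (int k = 1; k < b.length; k++)
            if (b[r + k * dr][c + k * dc] != first) return 0;
        return first;
    }

    public static void main(String[] args) {
        int[][] board = {{1, 1, 1}, {2, 2, 0}, {0, 0, 0}};
        System.out.println(checkStatus(board)); // prints 1 (player 1 owns the top row)
    }
}
```

Scanning each row, column, and diagonal keeps the check at O(n²), which is cheap enough to run once per simulated move.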

We just implemented an AI which cannot be beaten in Tic-Tac-Toe. Let's write a unit test which demonstrates that AI vs. AI will always result in a draw:

@Test
public void givenEmptyBoard_whenSimulateInterAIPlay_thenGameDraw() {
    MonteCarloTreeSearch mcts = new MonteCarloTreeSearch();
    Board board = new Board();
    int player = Board.P1;
    int totalMoves = Board.DEFAULT_BOARD_SIZE * Board.DEFAULT_BOARD_SIZE;
    for (int i = 0; i < totalMoves; i++) {
        board = mcts.findNextMove(board, player);
        if (board.checkStatus() != Board.IN_PROGRESS) {
            break;
        }
        player = 3 - player;
    }
    int winStatus = board.checkStatus();
 
    assertEquals(Board.DRAW, winStatus);
}

6. Advantages

  • It does not necessarily require any tactical knowledge about the game
  • A general MCTS implementation can be reused for any number of games with little modification
  • Focuses on nodes with higher chances of winning the game
  • Suitable for problems with high branching factor as it does not waste computations on all possible branches
  • The algorithm is very straightforward to implement
  • Execution can be stopped at any given time, and it will still suggest the next best state computed so far

7. Drawback

If MCTS is used in its basic form without any improvements, it may fail to suggest reasonable moves. This may happen if nodes are not visited adequately, which results in inaccurate estimates.

However, MCTS can be improved using a number of techniques, both domain-specific and domain-independent.

In domain-specific techniques, the simulation stage produces more realistic play outs rather than stochastic simulations, though this requires knowledge of game-specific techniques and rules.

8. Summary

At first glance, it's difficult to trust that an algorithm relying on random choices can lead to smart AI. However, a thoughtful implementation of MCTS can indeed provide us with a solution which can be used in many games as well as in decision-making problems.

As always, the complete code for the algorithm can be found over on GitHub.