1. Overview
In this tutorial, we’ll talk about the “Lock wait timeout exceeded” error in MySQL. We’ll discuss what causes this error and some nuances regarding MySQL locks.
For the sake of simplicity, we’ll focus on MySQL’s InnoDB engine, since it’s the default and most widely used storage engine. However, we can use the same tests to check the behavior of other engines.
2. Locking in MySQL
A lock is a special object that controls access to a resource. In the case of MySQL, these resources can be tables, rows, or internal data structures.
Another concept to get used to is the lock mode. A lock in “S” (shared) mode allows a transaction to read a row. Multiple transactions can hold an S lock on the same row at the same time.
An “X” (exclusive) lock can be held by only one transaction at a time. That transaction can update or delete the row, while other transactions have to wait until the lock is released before they can lock the row themselves.
MySQL also has intention locks. These are table-level locks that indicate which type of lock (shared or exclusive) a transaction intends to acquire later on rows in the table.
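To make the interaction between these modes concrete, here’s a minimal Python sketch of the table-level lock compatibility rules that the MySQL reference manual describes for X, IX, S, and IS locks. The S/X part of the matrix is also what governs row locks. The names `COMPATIBLE`, `is_compatible`, and `can_grant` are our own and don’t correspond to any MySQL API:

```python
# Lock modes: X (exclusive), IX (intention exclusive),
# S (shared), IS (intention shared).
# For each mode, the set of modes it can coexist with when the
# locks are held by *different* transactions on the same table.
COMPATIBLE = {
    "X": set(),
    "IX": {"IX", "IS"},
    "S": {"S", "IS"},
    "IS": {"IX", "S", "IS"},
}


def is_compatible(requested: str, held: str) -> bool:
    """True if `requested` can be granted while another transaction holds `held`."""
    return held in COMPATIBLE[requested]


def can_grant(requested: str, held_by_others: list) -> bool:
    """A request is granted only if it's compatible with every lock held
    by other transactions; otherwise the requester has to wait."""
    return all(is_compatible(requested, held) for held in held_by_others)


if __name__ == "__main__":
    print(can_grant("S", ["S", "IS"]))  # True: shared locks coexist
    print(can_grant("X", ["IS"]))       # False: X conflicts with every mode
    print(can_grant("IX", ["IX"]))      # True: intention locks coexist
```

In this model, a request that `can_grant` rejects is the one that ends up waiting and, if the holder never releases its lock, eventually hits the lock wait timeout.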
Locking is crucial to guarantee consistency and reliability in highly concurrent environments. However, when optimizing for performance, some trade-offs have to be made, and in those cases, it’s essential to choose the right isolation level.
3. Isolation Levels
InnoDB offers four transaction isolation levels, each providing a different balance between performance, consistency, reliability, and reproducibility. From the least strict to the most strict, they are:
- READ UNCOMMITTED: a transaction can read changes made by other transactions even if they haven’t been committed yet (so-called dirty reads)
- READ COMMITTED: only committed changes are visible to other transactions, and each read sees the latest data committed at the time it runs
- REPEATABLE READ: the first read in a transaction establishes a snapshot, which becomes the baseline for all subsequent plain reads in that transaction. Even if another transaction changes and commits a row right after our read, we keep seeing the snapshot version, unless our own transaction modifies that row
- SERIALIZABLE: behaves like REPEATABLE READ, except that when autocommit is disabled, InnoDB implicitly turns every plain SELECT into a locking read (SELECT ... FOR SHARE), so rows read by a transaction can’t be modified by others until it commits or rolls back
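The difference between READ COMMITTED and REPEATABLE READ for plain reads can be sketched with a toy multi-version store in Python. This is a simplified model, not InnoDB’s actual implementation (which relies on read views, transaction IDs, and undo logs). Each committed write gets a sequence number, and a read returns the newest version committed at or before its snapshot. All class and method names here are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    # Committed versions as (commit_seq, value) pairs, oldest first.
    versions: list = field(default_factory=list)

    def visible_at(self, snapshot_seq: int):
        """Return the newest value committed at or before snapshot_seq."""
        value = None
        for seq, v in self.versions:
            if seq <= snapshot_seq:
                value = v
        return value


class Database:
    def __init__(self):
        self.seq = 0  # sequence number of the latest commit
        self.rows = {}

    def commit_write(self, key, value):
        """Simulate another transaction updating a row and committing."""
        self.seq += 1
        self.rows.setdefault(key, Row()).versions.append((self.seq, value))


class Reader:
    """A transaction that only issues plain (non-locking) SELECTs."""

    def __init__(self, db: Database, level: str):
        self.db = db
        self.level = level
        self.snapshot = None

    def select(self, key):
        if self.level == "READ COMMITTED":
            # Every read takes a fresh snapshot of committed data.
            snapshot = self.db.seq
        else:  # REPEATABLE READ
            # The first read fixes the snapshot for the rest of the transaction.
            if self.snapshot is None:
                self.snapshot = self.db.seq
            snapshot = self.snapshot
        return self.db.rows[key].visible_at(snapshot)


if __name__ == "__main__":
    db = Database()
    db.commit_write("SW6", "London")
    rc, rr = Reader(db, "READ COMMITTED"), Reader(db, "REPEATABLE READ")
    print(rc.select("SW6"), rr.select("SW6"))  # London London
    db.commit_write("SW6", "Fulham")           # hypothetical concurrent update
    print(rc.select("SW6"), rr.select("SW6"))  # Fulham London
```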
Now that we understand how the different isolation levels work, let’s run some tests to examine locking scenarios. To keep things short, we’ll run all tests in the default isolation level, REPEATABLE READ. Later, we can repeat them under the other levels.
4. Monitoring
The tools we’ll see here aren’t necessarily suited for production use. Instead, they’ll help us understand what’s happening under the hood.
These commands show how MySQL handles transactions, which locks belong to which transactions, and how to get more data about those transactions. So again, they’ll help us during our tests, but they may not be applicable in a production environment, or at least not once the error has already occurred.
4.1. InnoDB Status
The command SHOW ENGINE INNODB STATUS shows lots of information about InnoDB’s internal structures, objects, and metrics. The output may be truncated depending on the number of active connections. However, for our use case, we only need to look at the TRANSACTIONS section.
In the TRANSACTIONS section, we’ll find things like:
- number of active transactions
- the status of each transaction
- number of tables involved in each transaction
- number of locks acquired by the transaction
- the statement the transaction is currently executing, if any
- information about lock waits
There is much more to see there, but this will be enough for us now.
4.2. Process List
The command SHOW PROCESSLIST lists the currently open sessions and displays the following information for each of them:
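- session id
- user name
- host the client is connected from
- default database, if any
- command type (for example, Query or Sleep)
- time, in seconds, that the session has been in its current state
- state of the session (for example, waiting for a lock)
- statement being executed (the Info column), if any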
This command lets us get an overview of the different active sessions, their state, and their activity.
4.3. Select Statement
MySQL exposes useful information through system tables, which we can query to understand the kinds of locking strategies applied in a given scenario. These tables also hold data such as the ID of the current transaction.
For the purposes of this article, we’ll use the tables information_schema.innodb_trx and performance_schema.data_locks (the latter is available as of MySQL 8.0).
5. Testing Setup
To run our tests, we’ll use the official MySQL Docker image to create our database server, and then populate a test schema so that we can exercise some transaction scenarios:
# Create MySQL container
docker run --network host --name example_db -e MYSQL_ROOT_PASSWORD=root -d mysql
Once the database server is up, we can connect to it and create the schema:
# Log in to MySQL
docker exec -it example_db mysql -uroot -p
Then, after typing the password, let’s create the database and insert some data:
CREATE DATABASE example_db;
USE example_db;
CREATE TABLE zipcode (
code varchar(100) not null,
city varchar(100) not null,
country varchar(3) not null,
PRIMARY KEY (code)
);
INSERT INTO zipcode(code, city, country)
VALUES ('08025', 'Barcelona', 'ESP'),
('10583', 'New York', 'USA'),
('11075-430', 'Santos', 'BRA'),
('SW6', 'London', 'GBR');
6. Testing Scenarios
The most important thing to remember is that the “Lock wait timeout exceeded” error happens when a transaction waits too long for a lock held by another transaction.
How long a transaction waits is defined by the innodb_lock_wait_timeout system variable, which can be set at the global or session level and defaults to 50 seconds.
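The likelihood of facing this error depends on the complexity of our transactions and the number of transactions per second. Here, we’ll try to reproduce some common scenarios.
It’s also worth mentioning that a simple retry strategy can often solve the problem caused by this error, as the error message itself suggests (“try restarting transaction”). Keep in mind that, by default, InnoDB rolls back only the statement that timed out, not the whole transaction (unless innodb_rollback_on_timeout is enabled), so it’s usually safest to roll back and retry the entire transaction.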
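A retry loop along these lines can be sketched in Python. This is a minimal, driver-agnostic sketch under some assumptions. `DatabaseError` is a hypothetical stand-in for the exception raised by whichever MySQL driver we use, carrying the server’s error number in an `errno` attribute. `transaction` is a callable that runs, commits, and (on failure) rolls back one complete transaction. Error 1205 is the lock wait timeout, and 1213 is the deadlock error we’ll meet in the deadlock scenario:

```python
import random
import time

# MySQL error numbers treated as retryable in this sketch:
# 1205 = lock wait timeout, 1213 = deadlock.
RETRYABLE_ERRNOS = {1205, 1213}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception with a MySQL errno."""

    def __init__(self, errno: int, msg: str = ""):
        super().__init__(msg)
        self.errno = errno


def run_with_retry(transaction, max_attempts=3, base_delay=0.1):
    """Run `transaction()`, retrying it when it fails with a lock error.

    `transaction` is assumed to roll back before raising, so that every
    attempt starts from a clean state.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            if err.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            # Exponential backoff plus random jitter.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_update():
        calls["n"] += 1
        if calls["n"] < 3:
            raise DatabaseError(1205, "Lock wait timeout exceeded")
        return "committed"

    print(run_with_retry(flaky_update, base_delay=0.01))  # committed
```

The random jitter is a design choice: it spreads out the retries so that sessions competing for the same rows don’t keep colliding at the same moment.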
To help us during our tests, we’ll run the following commands in every session we open:
USE example_db;
-- Set our timeout to 10 seconds
SET @@SESSION.innodb_lock_wait_timeout = 10;
This sets the lock wait timeout to 10 seconds, so we don’t have to wait too long to see the error.
6.1. Row Lock
Row locks are acquired in many different situations, so let’s start by reproducing a simple one.
First, we’ll connect to the server from two different sessions using the login command we saw earlier. After that, let’s run the statements below in both sessions, first in one and then in the other:
SET autocommit=0;
UPDATE zipcode SET code = 'SW6 1AA' WHERE code = 'SW6';
After 10 seconds, the second session will fail:
mysql> UPDATE zipcode SET code = 'SW6 1AA' WHERE code = 'SW6';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
The error happens because disabling autocommit makes the first session start a transaction. Once the UPDATE statement runs within that transaction, it acquires an exclusive lock on the row. However, no commit (or rollback) is executed, so the transaction stays open, and the second session keeps waiting for the lock. As the lock is never released, the second session’s wait eventually reaches the timeout. The same applies to DELETE statements.
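To see the mechanics in isolation, we can model this scenario in Python. It’s only an analogy, not how InnoDB is implemented. A `threading.Lock` plays the role of the exclusive lock on the 'SW6' row, the hypothetical `LOCK_WAIT_TIMEOUT` constant plays the role of innodb_lock_wait_timeout, and `LockWaitTimeout` mimics error 1205:

```python
import threading

LOCK_WAIT_TIMEOUT = 0.5  # seconds; plays the role of innodb_lock_wait_timeout


class LockWaitTimeout(Exception):
    """Mimics MySQL error 1205."""


def update_row(row_lock: threading.Lock, session: str) -> None:
    """Take the row lock, waiting at most LOCK_WAIT_TIMEOUT seconds.

    The lock is deliberately kept afterwards, just like the exclusive
    lock of an UPDATE inside a transaction that hasn't committed yet.
    """
    if not row_lock.acquire(timeout=LOCK_WAIT_TIMEOUT):
        raise LockWaitTimeout(
            f"{session}: Lock wait timeout exceeded; try restarting transaction"
        )


def run_scenario() -> list:
    row_lock = threading.Lock()  # exclusive lock on the 'SW6' row
    errors = []
    update_row(row_lock, "session 1")  # takes the lock and never commits

    def session_two():
        try:
            update_row(row_lock, "session 2")
        except LockWaitTimeout as err:
            errors.append(str(err))

    worker = threading.Thread(target=session_two)
    worker.start()
    worker.join()
    return errors


if __name__ == "__main__":
    print(run_scenario())
    # ['session 2: Lock wait timeout exceeded; try restarting transaction']
```

If session one called `row_lock.release()` (our stand-in for COMMIT or ROLLBACK) before the timeout expired, session two would get the lock instead of the error.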
6.2. Checking Row Lock in Data Locks Table
Now, let’s roll back both sessions and run the same script as before in the first session. This time, however, let’s run the following statements in the second session:
SET autocommit=0;
UPDATE zipcode SET code = 'Test' WHERE code = '08025';
This time, both statements execute successfully because they no longer require a lock on the same row.
To confirm that, we’ll run the following statement in any of the sessions or in a new one:
SELECT * FROM performance_schema.data_locks;
The statement above returns four rows. Two of them are table-level intention locks (LOCK_TYPE TABLE, LOCK_MODE IX), one per transaction, indicating that the transaction intends to lock rows in the table. The other two are record locks (LOCK_TYPE RECORD) on the rows each transaction updated. The columns LOCK_TYPE, LOCK_MODE, and LOCK_DATA confirm the locks we just described.
If we roll back both sessions and run the query again, the result is an empty set.
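When data_locks returns more than a handful of rows, it can help to summarize them. Here’s a small Python sketch that groups rows by transaction and lists the lock requests still in the WAITING state. It assumes the rows have already been fetched as dictionaries keyed by the data_locks column names (ENGINE_TRANSACTION_ID, OBJECT_NAME, LOCK_TYPE, LOCK_MODE, LOCK_STATUS, LOCK_DATA), for instance through a driver’s dictionary cursor. `summarize_locks` is a hypothetical helper, not part of MySQL:

```python
from collections import Counter, defaultdict


def summarize_locks(rows):
    """Summarize performance_schema.data_locks rows.

    Returns a pair:
      - {transaction id: Counter of LOCK_TYPE values (TABLE / RECORD)}
      - list of (transaction id, table, lock mode, lock data) for every
        lock request that hasn't been granted yet
    """
    per_trx = defaultdict(Counter)
    waiting = []
    for row in rows:
        trx = row["ENGINE_TRANSACTION_ID"]
        per_trx[trx][row["LOCK_TYPE"]] += 1
        if row["LOCK_STATUS"] == "WAITING":
            waiting.append(
                (trx, row["OBJECT_NAME"], row["LOCK_MODE"], row["LOCK_DATA"])
            )
    return dict(per_trx), waiting
```

For the scenario above, we’d expect each transaction to show one TABLE lock and one RECORD lock, with an empty waiting list. In the scenarios that end in a timeout, the waiting list is where the blocked session shows up.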
6.3. Row Lock and Indexes
This time, let’s use a different column in our WHERE clause. In the first session, we’ll run:
SET autocommit=0;
UPDATE zipcode SET city = 'SW6 1AA' WHERE country = 'USA';
And in the second one, let’s run these statements:
SET autocommit=0;
UPDATE zipcode SET city = '11025-030' WHERE country = 'BRA';
Something unexpected just happened. Even though the statements target two different rows, the second session gets the lock wait timeout error. If we repeat this same test and, while the second session is waiting, run the SELECT statement on the table performance_schema.data_locks, we’ll see that the first session has actually locked all the rows, and the second session is waiting for one of them.
The problem is related to how MySQL finds the candidate rows for the update. The column used in the WHERE clause has no index, so InnoDB has to scan all the rows to find the ones that match the WHERE condition, and every row it scans gets locked. Under REPEATABLE READ, these locks are kept until the transaction ends, even for rows that don’t match.
That’s why it’s important to make sure our statements are efficient, for instance, by indexing the columns used to filter rows in UPDATE and DELETE statements.
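The effect of the missing index can be illustrated with a deliberately simplified Python model. It uses the rows we inserted into zipcode and assumes, as described above, that a scan without a usable index locks every row it reads, while an index lookup locks only the matching rows. It ignores gap and next-key locks as well as the locks taken on index entries, and `rows_locked` and `sessions_conflict` are hypothetical helpers:

```python
# The rows inserted into zipcode during the test setup.
ZIPCODES = [
    {"code": "08025", "city": "Barcelona", "country": "ESP"},
    {"code": "10583", "city": "New York", "country": "USA"},
    {"code": "11075-430", "city": "Santos", "country": "BRA"},
    {"code": "SW6", "city": "London", "country": "GBR"},
]


def rows_locked(rows, column, value, indexed):
    """Primary keys locked by UPDATE ... WHERE <column> = <value>
    in this simplified model."""
    if indexed:
        # An index lets InnoDB go straight to the matching entries.
        return {r["code"] for r in rows if r[column] == value}
    # A full scan reads, and therefore locks, every row.
    return {r["code"] for r in rows}


def sessions_conflict(column, first_value, second_value, indexed):
    """True if the second UPDATE needs a row the first one has locked."""
    first = rows_locked(ZIPCODES, column, first_value, indexed)
    second = rows_locked(ZIPCODES, column, second_value, indexed)
    return bool(first & second)


if __name__ == "__main__":
    print(sessions_conflict("country", "USA", "BRA", indexed=False))  # True
    print(sessions_conflict("country", "USA", "BRA", indexed=True))   # False
```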
6.4. Row Lock and Updates/Deletes With Multiple Tables
Other common causes of the lock timeout error are DELETE and UPDATE statements involving multiple tables. The number of locked rows depends on the statement’s execution plan, but we should keep in mind that all the tables involved may have some of their rows locked.
As an example, let’s roll back all the other transactions and execute these statements:
CREATE TABLE zipcode_backup SELECT * FROM zipcode;
SET autocommit=0;
DELETE FROM zipcode_backup WHERE code IN (SELECT code FROM zipcode);
Here, we created a table and started a transaction that, in a single statement, reads from the zipcode table and deletes rows from the zipcode_backup table.
The next step is to run the following statements in the second session:
SET autocommit=0;
UPDATE zipcode SET code = 'SW6 1AA' WHERE code = 'SW6';
Once again, the second transaction times out, as the first one has acquired locks on rows of the zipcode table, even though it only reads from it. We can run the SELECT statement on the data_locks table to see what happened. Then, let’s roll back both sessions.
6.5. Row Lock When Filling Temp Tables
In this example, let’s mix DDL and DML by executing the following statement in the first session:
CREATE TEMPORARY TABLE temp_zipcode SELECT * FROM zipcode;
Then, if we repeat the UPDATE statement we used before in the second session, we’ll see the lock error once again.
6.6. Shared and Exclusive Lock
Let’s not forget to roll back the transactions in both sessions at the end of each test.
We’ve already discussed shared and exclusive locks. However, we haven’t seen how to request them explicitly, using the LOCK IN SHARE MODE (or FOR SHARE, as of MySQL 8.0) and FOR UPDATE options. First, let’s use the shared mode in the first session:
SET autocommit=0;
SELECT * FROM zipcode WHERE code = 'SW6' LOCK IN SHARE MODE;
Now, if we run the same UPDATE as before in the second session, the result is again the timeout. However, reads are still allowed here: other sessions can read the row, and can even acquire shared locks on it.
As opposed to SHARE MODE, FOR UPDATE doesn’t allow other sessions to acquire even a shared lock on the row, as we can see when we run the following statement in the first session:
SET autocommit=0;
SELECT * FROM zipcode WHERE code = 'SW6' FOR UPDATE;
Then, in the second session, we run the SELECT statement with the LOCK IN SHARE MODE option that we previously used in the first session, and once more, we observe the timeout error. To recap, a SHARE MODE lock can be held by more than one session at a time, and it blocks writes. An exclusive lock, acquired with the FOR UPDATE option, still allows plain (non-locking) reads, but it blocks both locking reads and writes.
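The rules we just observed behave much like a readers-writer lock with a wait timeout. The following Python sketch, using only the standard threading module, models a single row. `acquire_shared` stands in for LOCK IN SHARE MODE, and `acquire_exclusive` for FOR UPDATE, UPDATE, or DELETE. It’s an illustration under simplifying assumptions, not InnoDB’s implementation: it doesn’t track which transaction holds what, so it can’t model lock upgrades, and it has no fairness between waiting sessions. `RowLock` and its methods are hypothetical names:

```python
import threading


class RowLock:
    """Toy shared/exclusive lock on one row, with a lock wait timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def _is_free(self) -> bool:
        return not self._exclusive_held and self._shared_holders == 0

    def acquire_shared(self, timeout: float) -> bool:
        """Like LOCK IN SHARE MODE: waits only while an X lock is held.
        Returns False if the wait exceeds `timeout` seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: not self._exclusive_held, timeout):
                return False
            self._shared_holders += 1
            return True

    def acquire_exclusive(self, timeout: float) -> bool:
        """Like FOR UPDATE: waits while any lock is held on the row."""
        with self._cond:
            if not self._cond.wait_for(self._is_free, timeout):
                return False
            self._exclusive_held = True
            return True

    def release_shared(self) -> None:
        with self._cond:
            self._shared_holders -= 1
            self._cond.notify_all()  # wake sessions waiting for an X lock

    def release_exclusive(self) -> None:
        with self._cond:
            self._exclusive_held = False
            self._cond.notify_all()


if __name__ == "__main__":
    lock = RowLock()
    print(lock.acquire_shared(0.1))     # True: session 1, LOCK IN SHARE MODE
    print(lock.acquire_shared(0.1))     # True: session 2 can share it
    print(lock.acquire_exclusive(0.1))  # False: a writer times out
```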
6.7. Table Locks
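Table locks acquired with LOCK TABLES aren’t subject to innodb_lock_wait_timeout (waits for them are governed by lock_wait_timeout, which defaults to one year), and they’re generally not recommended with InnoDB: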
LOCK TABLE zipcode WRITE;
Once we run this, we can open another session and try a SELECT or an UPDATE on the table. We’ll see that it’s blocked, but this time, no timeout error shows up after our 10 seconds. Going a bit further, we can open a third session and run:
SHOW PROCESSLIST;
It displays the active sessions with their state, and we’ll see the first session sleeping and the second one waiting for the table’s metadata lock. The solution, in this case, is to run the following command in the first session, which holds the lock:
UNLOCK TABLES;
Sessions may also end up waiting to acquire a metadata lock during the execution of DDL statements, such as ALTER TABLE.
6.8. Gap Locks
A gap lock is a lock on the interval between index records. Gap locks come into play when a particular interval of indexed records is locked and another session tries to perform some operation within that interval. In this case, even inserts can be blocked.
Let’s consider the following statements executed in the first session:
CREATE TABLE address_type (
id bigint not null,
name varchar(255) not null,
PRIMARY KEY (id)
);
SET autocommit=0;
INSERT INTO address_type(id, name) VALUES (1, 'Street'), (2, 'Avenue'), (5, 'Square');
COMMIT;
SET autocommit=0;
SELECT * FROM address_type WHERE id BETWEEN 1 and 5 LOCK IN SHARE MODE;
In the second session, we’ll run the following statements:
SET autocommit=0;
INSERT INTO address_type(id, name) VALUES (3, 'Road'), (4, 'Park');
While the second session is waiting, we can run the SELECT statement on the data_locks table in a third session and check the new LOCK_MODE value, which now includes GAP. The same can also happen with UPDATE and DELETE statements.
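Here’s a rough Python model of why ids 3 and 4 are blocked. It assumes the range scan locks each record it reads together with the gap before it (next-key locking under REPEATABLE READ). Under that assumption, any insert whose key falls strictly between two scanned records has to wait. The sketch deliberately ignores the gaps at the edges of the scanned range, whose locking depends on the index and the MySQL version. `locked_gaps` and `insert_blocked` are hypothetical helpers:

```python
def locked_gaps(scanned_keys):
    """Open intervals between consecutive index records read by a range scan."""
    keys = sorted(scanned_keys)
    return list(zip(keys, keys[1:]))


def insert_blocked(key, gaps):
    """True if inserting `key` falls inside a locked gap and must wait."""
    return any(low < key < high for low, high in gaps)


if __name__ == "__main__":
    # Records read by SELECT ... WHERE id BETWEEN 1 and 5 LOCK IN SHARE MODE
    gaps = locked_gaps([1, 2, 5])
    print(gaps)                                       # [(1, 2), (2, 5)]
    print([insert_blocked(k, gaps) for k in (3, 4)])  # [True, True]
```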
6.9. Deadlocks
By default, InnoDB automatically detects deadlocks (the innodb_deadlock_detect variable is enabled). When it finds a cycle in the graph of dependencies between transactions, it rolls back one of them so that the others can proceed. If the deadlock isn’t detected, for example, because detection is disabled or because it involves a table lock set by LOCK TABLES, the waiting transactions end up with a lock wait timeout error instead, as we saw before.
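Conceptually, deadlock detection boils down to finding a cycle in a wait-for graph, where an edge from A to B means transaction A waits for a lock that B holds. Here’s a minimal Python sketch of that idea using a depth-first search. Victim selection follows the rule described in the MySQL manual, which prefers the “smallest” transaction, measured by the number of rows it inserted, updated, or deleted. InnoDB’s real detector differs in many details, and `find_cycle` and `pick_victim` are hypothetical names:

```python
def find_cycle(waits_for):
    """Return transactions forming a cycle in the wait-for graph, or None.

    `waits_for` maps each transaction to the transactions it waits on.
    """
    visited, path, on_path = set(), [], set()

    def dfs(trx):
        visited.add(trx)
        path.append(trx)
        on_path.add(trx)
        for holder in waits_for.get(trx, ()):
            if holder in on_path:
                # Back edge: the cycle is the part of the path from `holder`.
                return path[path.index(holder):]
            if holder not in visited:
                cycle = dfs(holder)
                if cycle:
                    return cycle
        path.pop()
        on_path.discard(trx)
        return None

    for trx in list(waits_for):
        if trx not in visited:
            cycle = dfs(trx)
            if cycle:
                return cycle
    return None


def pick_victim(cycle, rows_modified):
    """Roll back the transaction that modified the fewest rows."""
    return min(cycle, key=lambda trx: rows_modified.get(trx, 0))


if __name__ == "__main__":
    # Each transaction holds one row and waits for the row held by the other.
    graph = {4036: [4035], 4035: [4036]}
    print(find_cycle(graph))  # [4036, 4035]
```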
Let’s simulate a simple deadlock scenario. In the first session, we execute:
SET autocommit=0;
SELECT * FROM address_type WHERE id = 1 FOR UPDATE;
SELECT tx.trx_id FROM information_schema.innodb_trx tx WHERE tx.trx_mysql_thread_id = connection_id();
The last SELECT statement returns the current transaction ID, which we’ll need later to check the InnoDB status output. Then, in the second session, let’s run:
SET autocommit=0;
SELECT * FROM address_type WHERE id = 2 FOR UPDATE;
SELECT tx.trx_id FROM information_schema.innodb_trx tx WHERE tx.trx_mysql_thread_id = connection_id();
SELECT * FROM address_type WHERE id = 1 FOR UPDATE;
The last statement blocks, since the first session holds the lock on the row with id 1. Next, we go back to session one and run:
SELECT * FROM address_type WHERE id = 2 FOR UPDATE;
Immediately, we’ll get an error:
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Finally, in a third session, we run:
SHOW ENGINE INNODB STATUS;
The output of the command should be similar to this:
------------------------
LATEST DETECTED DEADLOCK
------------------------
*** (1) TRANSACTION:
TRANSACTION 4036, ACTIVE 11 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 9, OS thread handle 139794615064320, query id 252...
SELECT * FROM address_type WHERE id = 1 FOR UPDATE
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS ... index PRIMARY of table `example_db`.`address_type` trx id 4036 lock_mode X locks rec but not gap
Record lock
...
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS ... index PRIMARY of table `example_db`.`address_type` trx id 4036 lock_mode X locks rec but not gap waiting
Record lock
...
*** (2) TRANSACTION:
TRANSACTION 4035, ACTIVE 59 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), ... , 2 row lock(s)
MySQL thread id 11, .. query id 253 ...
SELECT * FROM address_type WHERE id = 2 FOR UPDATE
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS ... index PRIMARY of table `example_db`.`address_type` trx id 4035 lock_mode X locks rec but not gap
Record lock
...
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS ... index PRIMARY of table `example_db`.`address_type` trx id 4035 lock_mode X locks rec but not gap waiting
Record lock
...
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 4037
...
LIST OF TRANSACTIONS FOR EACH SESSION:
...
---TRANSACTION 4036, ACTIVE 18 sec
3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 9, ... , query id 252 ...
Using the transaction IDs we got before, we can find a lot of useful information in the LATEST DETECTED DEADLOCK section. This includes the state of each transaction at the moment of the error, the number of row locks, the last statement executed, the locks it holds, and the lock it was waiting for. The same details are repeated for the other transaction involved in the deadlock. Finally, the section tells us which transaction was rolled back.
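When these reports need to be collected regularly, a few regular expressions are enough to extract the key facts. The Python sketch below parses the LATEST DETECTED DEADLOCK section in the format shown above. It returns each transaction’s ID, active time, and statement, plus the transaction that was rolled back. It assumes the layout of the sample output, since other MySQL versions may format the section differently, and `parse_latest_deadlock` is a hypothetical helper:

```python
import re

TRX_HEADER = re.compile(r"^\*\*\* \((\d+)\) TRANSACTION:$")
TRX_LINE = re.compile(r"^TRANSACTION (\d+), ACTIVE (\d+) sec")
ROLLBACK = re.compile(r"^\*\*\* WE ROLL BACK TRANSACTION \((\d+)\)")


def parse_latest_deadlock(status_text):
    """Summarize the LATEST DETECTED DEADLOCK section, or return None."""
    start = status_text.find("LATEST DETECTED DEADLOCK")
    if start == -1:
        return None
    transactions = {}  # position in the report (1, 2, ...) -> details
    current, victim, prev = None, None, ""
    for raw in status_text[start:].splitlines():
        line = raw.strip()
        if m := TRX_HEADER.match(line):
            current = int(m.group(1))
        elif current is not None and current not in transactions and (
            m := TRX_LINE.match(line)
        ):
            transactions[current] = {
                "trx_id": int(m.group(1)),
                "active_sec": int(m.group(2)),
            }
        elif (
            current in transactions
            and "query" not in transactions[current]
            and prev.startswith("MySQL thread id")
        ):
            # The statement is printed right after the thread id line.
            transactions[current]["query"] = line
        elif m := ROLLBACK.match(line):
            victim = transactions.get(int(m.group(1)))
            break
        prev = line
    return {"transactions": transactions, "victim": victim}
```

Given the output above, this sketch reports transactions 4036 and 4035 with their statements and identifies 4035 as the one that was rolled back.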
7. Conclusion
In this article, we looked at locks in MySQL, how they work, and when they cause the “Lock wait timeout exceeded” error.
We defined test scenarios that allowed us to reproduce this error and check the internal nuances of the database server when handling transactions.