Re: [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read

Baokun Li <libaokun1@xxxxxxxxxx> · Tue, 5 Dec 2023 21:19:03 +0800

On 2023/12/5 12:17, Theodore Ts'o wrote:
On Mon, Dec 04, 2023 at 09:50:18PM +0800, Baokun Li wrote:
The problem occurs with a one-master, two-slave MySQL database running
on three physical machines. With sysbench stress testing on each of the
three machines, the problem occurs about once every two to three hours.

The problem is with the relay log file. When the problem occurs, a few
dozen bytes in the middle of the file are read as all zeros, while
the data on disk is not. This is a journal-like file: a writer
process gets the data from the master node and writes it locally,
and a separate replay process reads the file and performs the replay
operation accordingly (some SQL statements).  When replaying, the
replay process finds that the data it read is corrupted, not valid
SQL data, while the data on disk is normal.
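
To make the symptom concrete, here is a minimal sketch of how a buffer
returned by a read could be scanned for unexpected runs of zero bytes.
The function name find_zero_runs and the 16-byte threshold are
hypothetical choices for illustration; they are not part of MySQL or of
this patch set. A real relay log may legitimately contain short zero
runs, which is why a minimum length is used.

```python
def find_zero_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of at least min_len zero bytes."""
    runs = []
    start = None  # offset where the current zero run began, if any
    for i, b in enumerate(data):
        if b == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    # Handle a zero run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs
```

Running this on the same byte range obtained once through a buffered
read and once from disk (for example after dropping the page cache)
would show whether the zeros exist only in the cached copy.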
You mentioned "scripts" --- are these locally developed scripts by
any chance?
This refers to the SQL commands to be replayed in the relay log file.
I don't know much about this file, but you can read the official
documentation:
https://dev.mysql.com/doc/refman/8.0/en/replica-logs-relaylog.html
The procedures suggested in a few places that I looked up
don't involve reading the relay log.  For example, from [1]:

On the master server:

root@repl-master:~# mysql -uroot -p;
mysql> CREATE USER 'slave'@'12.34.56.789' IDENTIFIED BY 'SLAVE_PASSWORD';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'slave'@'12.34.56.222';
mysql> FLUSH PRIVILEGES;
mysql> FLUSH TABLES WITH READ LOCK;

This will make the master server read-only, with all pending writes
flushed out (so you don't need to worry about the relay log), and
then you move the data from the master to the slave:

root@repl-master:~# mysqldump -u root -p --all-databases --master-data > data.sql
root@repl-master:~# scp data.sql root@12.34.56.222:

Then on the slave:

root@repl-slave:~# mysql -uroot -p < data.sql
root@repl-slave:~# mysql -uroot -p;
mysql> STOP SLAVE;

... and then on the master:

root@repl-master:~# mysql -uroot -p;
mysql> UNLOCK TABLES;

... and back on the slave:

root@repl-slave:~# mysql -uroot -p;
mysql> START SLAVE;

[1] https://hevodata.com/learn/mysql-master-slave-replication/

... or you could buy the product advertised at [1] which is easier for
the database administrators, but results in $$$ flowing to the Hevo
company.  :-)

In any case, I'm pretty sure that the officially documented way of
setting up failover replication doesn't involve buffered reads
of the relay log file.

It is certainly the case that mysqldump uses buffered reads, but
that's why you have to temporarily make the database read-only using
"FLUSH TABLES WITH READ LOCK" before taking a database snapshot, and
then re-enable database updates with the "UNLOCK TABLES" SQL command.

Cheers,

					- Ted
Thank you very much for your detailed explanation!
But the downstream users do use buffered reads to read the relay log
file, as I confirmed with bpftrace. Here's an introduction to enabling
relay logging, but I'm not sure if you can access this link:
https://blog.csdn.net/javaanddonet/article/details/112596148
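
As a cross-check that does not need bpftrace, the open flags of a file
descriptor can be read from /proc/<pid>/fdinfo/<fd>, where the "flags"
line is an octal number. The sketch below is only an illustration, not
what I actually ran; the helper names fdinfo_flags and is_direct are
hypothetical. Note that the numeric value of O_DIRECT depends on the
architecture, so the fallback value is an assumption that holds for x86.

```python
import os

# O_DIRECT differs between architectures; 0o40000 is the x86 value and is
# used here only as a fallback when os.O_DIRECT is not available.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def fdinfo_flags(fdinfo_text: str) -> int:
    """Parse the octal 'flags:' line from the text of /proc/<pid>/fdinfo/<fd>."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return int(value.strip(), 8)
    raise ValueError("no flags line found")


def is_direct(fdinfo_text: str, direct_flag: int = O_DIRECT) -> bool:
    """Return True if the descriptor was opened with O_DIRECT."""
    return bool(fdinfo_flags(fdinfo_text) & direct_flag)
```

For a live process, the text would come from something like
open(f"/proc/{pid}/fdinfo/{fd}").read(). A result of False for the
reader's descriptor on the relay log is consistent with buffered reads,
though it only shows how the file was opened, not each individual read.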

Thanks!
--
With Best Regards,
Baokun Li