On Thu, Oct 21, 2021 at 4:03 PM Jan Kara <jack@xxxxxxx> wrote: > > On Thu 21-10-21 10:21:55, Zhengyuan Liu wrote: > > On Thu, Oct 21, 2021 at 1:37 AM Jan Kara <jack@xxxxxxx> wrote: > > > On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote: > > > > we are encounting following Mysql crash problem while importing tables : > > > > > > > > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL] > > > > fsync() returned EIO, aborting. > > > > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB] > > > > Assertion failure: ut0ut.cc:555 thread 281472996733168 > > > > > > > > At the same time , we found dmesg had following message: > > > > > > > > [ 4328.838972] Page cache invalidation failure on direct I/O. > > > > Possible data corruption due to collision with buffered I/O! > > > > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID: > > > > 625 Comm: kworker/42:1 > > > > > > > > Firstly, we doubled Mysql has operating the file with direct IO and > > > > buffered IO interlaced, but after some checking we found it did only > > > > do direct IO using aio. The problem is exactly from direct-io > > > > interface (__generic_file_write_iter) itself. > > > > > > > > ssize_t __generic_file_write_iter() > > > > { > > > > ... > > > > if (iocb->ki_flags & IOCB_DIRECT) { > > > > loff_t pos, endbyte; > > > > > > > > written = generic_file_direct_write(iocb, from); > > > > /* > > > > * If the write stopped short of completing, fall back to > > > > * buffered writes. Some filesystems do this for writes to > > > > * holes, for example. For DAX files, a buffered write will > > > > * not succeed (even if it did, DAX does not handle dirty > > > > * page-cache pages correctly). > > > > */ > > > > if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) > > > > goto out; > > > > > > > > status = generic_perform_write(file, from, pos = iocb->ki_pos); > > > > ... > > > > } > > > > > > > > From above code snippet we can see that direct io could fall back to > > > > buffered IO under certain conditions, so even Mysql only did direct IO > > > > it could interleave with buffered IO when fall back occurred. I have > > > > no idea why FS(ext3) failed the direct IO currently, but it is strange > > > > __generic_file_write_iter make direct IO fall back to buffered IO, it > > > > seems breaking the semantics of direct IO. > > > > > > > > The reproduced environment is: > > > > Platform: Kunpeng 920 (arm64) > > > > Kernel: V5.15-rc > > > > PAGESIZE: 64K > > > > Mysql: V8.0 > > > > Innodb_page_size: default(16K) > > > > > > Thanks for report. I agree this should not happen. How hard is this to > > > reproduce? Any idea whether the fallback to buffered IO happens because > > > iomap_dio_rw() returns -ENOTBLK or because it returns short write? > > > > It is easy to reproduce in my test environment, as I said in the previous > > email replied to Andrew this problem is related to kernel page size. > > Ok, can you share a reproducer? I don't have a simple test case to reproduce, the whole procedure shown as following is somewhat complex. 1. Prepare Mysql installation environment a. Prepare a SSD partition (at least 100G) as the Mysql data partition, format to Ext3 and mount to /data # mkfs.ext3 /dev/sdb1 # mount /dev/sdb1 /data b. Create Mysql user and user group # groupadd mysql # useradd -g mysql mysql c. Create Mysql directory # mkdir -p /data/mysql # cd /data/mysql # mkdir data tmp run log 2. Install Mysql a. Download mysql-8.0.25-1.el8.aarch64.rpm-bundle.tar from https://downloads.mysql.com/archives/community/ b. Install Mysql # tar -xvf mysql-8.0.25-1.el8.aarch64.rpm-bundle.tar # yum install openssl openssl-devel # rpm -ivh mysql-community-common-8.0.25-1.el8.aarch64.rpm mysql-community-client-plugins-8.0.25-1.el8.aarch64.rpm \ mysql-community-libs-8.0.25-1.el8.aarch64.rp mysql-community-client-8.0.25-1.el8.aarch64.rpm \ mysql-community-server-8.0.25-1.el8.aarch64.rpm mysql-community-devel-8.0.25-1.el8.aarch64.rpm 3. Configure Mysql a. # chown mysql:mysql /etc/my.cnf b. # vim /etc/my.cnf innodb_flush_method = O_DIRECT default-storage-engine=INNODB datadir=/data/mysql/data socket=/data/mysql/run/mysql.sock tmpdir=/data/mysql/tmp log-error=/data/mysql/log/mysqld.log pid-file=/data/mysql/run/mysqld.pid port=3306 user=mysql c. initialize Mysql (problem may reproduce at this stage) # mysqld --defaults-file=/etc/my.cnf --initialize d. Start Mysql # mysqld --defaults-file=/etc/my.cnf & e. Login into Mysql # mysql -uroot -p -S /data/mysql/run/mysql.sock You can see the temporary password from step 3.c f. Configure access mysql> alter user 'root'@'localhost' identified by "123456"; mysql> create user 'root'@'%' identified by '123456'; mysql> grant all privileges on *.* to 'root'@'%'; flush privileges; mysql> create database sysbench; 4. Use sysbench to test Mysql a. Install sysbench from https://github.com/akopytov/sysbench/archive/master.zip b. Use following script to reproduce problem (may need dozens of minutes) while true ; do sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=1000000 --tables=100 \ --threads=32 --db-driver=mysql --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql- port=3306 \ --mysql-user=root --mysql-password=123456 --mysql-socket=/var/lib/mysql/mysql.sock prepare sleep 5 sysbench /usr/local/share/sysbench/oltp_write_only.lua --table-size=1000000 --tables=100 \ --threads=32 --db-driver=mysql --mysql-db=sysbench --mysql-host=127.0.0.1 --mysql- port=3306 \ --mysql-user=root --mysql-password=123456 --mysql-socket=/var/lib/mysql/mysql.sock cleanup sleep 5 done If you can't reproduce, we could provide a remote environment for you or connect to your machine to build a reproduced environment. > > > Can you post output of "dumpe2fs -h <device>" for the filesystem where the > > > problem happens? Thanks! > > > > Sure, the output is: > > > > # dumpe2fs -h /dev/sda3 > > dumpe2fs 1.45.3 (14-Jul-2019) > > Filesystem volume name: <none> > > Last mounted on: /data > > Filesystem UUID: 09a51146-b325-48bb-be63-c9df539a90a1 > > Filesystem magic number: 0xEF53 > > Filesystem revision #: 1 (dynamic) > > Filesystem features: has_journal ext_attr resize_inode dir_index > > filetype needs_recovery sparse_super large_file > > Thanks for the data. OK, a filesystem without extents. Does your test by > any chance try to do direct IO to a hole in a file? Because that is not > (and never was) supported without extents. Also the fact that you don't see > the problem with ext4 (which means extents support) would be pointing in > that direction. I am not sure if it trys to do direct IO to a hole or not, is there any way to check? If you have a simple test to reproduce please let me know, we are glad to try. Thanks,