On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote:
> Hi, all
>
> we are encountering the following MySQL crash problem while importing tables:
>
>     2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
>     fsync() returned EIO, aborting.
>     2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
>     Assertion failure: ut0ut.cc:555 thread 281472996733168
>
> At the same time, we found dmesg had the following message:
>
>     [ 4328.838972] Page cache invalidation failure on direct I/O.
>     Possible data corruption due to collision with buffered I/O!
>     [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
>     625 Comm: kworker/42:1
>
> Firstly, we suspected MySQL was operating on the file with direct IO and
> buffered IO interleaved, but after some checking we found it did only
> do direct IO using aio. The problem is exactly from the direct-io
> interface (__generic_file_write_iter) itself.
>
> ssize_t __generic_file_write_iter()
> {
> ...
>         if (iocb->ki_flags & IOCB_DIRECT) {
>                 loff_t pos, endbyte;
>
>                 written = generic_file_direct_write(iocb, from);
>                 /*
>                  * If the write stopped short of completing, fall back to
>                  * buffered writes.  Some filesystems do this for writes to
>                  * holes, for example.  For DAX files, a buffered write will
>                  * not succeed (even if it did, DAX does not handle dirty
>                  * page-cache pages correctly).
>                  */
>                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
>                         goto out;
>
>                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> ...
> }
>
> From the above code snippet we can see that direct IO could fall back to
> buffered IO under certain conditions, so even if MySQL only did direct IO
> it could interleave with buffered IO when the fallback occurred. I have
> no idea why the FS (ext3) failed the direct IO currently, but it is strange
> that __generic_file_write_iter makes direct IO fall back to buffered IO; it
> seems to break the semantics of direct IO.
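To make the condition in the quoted snippet concrete, here is a rough user-space model of that check. It is only a sketch: falls_back_to_buffered() and its parameters are made-up names for illustration, not kernel API, and it says nothing about why the direct write came up short in the first place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical user-space model of the check quoted above; not kernel code.
 *
 *   written   - what the direct write returned (bytes, or a negative errno)
 *   remaining - bytes still left in the iterator after the direct write
 *   is_dax    - whether the inode is a DAX inode
 *
 * Mirrors:
 *   if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
 *           goto out;
 * Anything that does not take the "goto out" path continues into
 * generic_perform_write(), i.e. the buffered fallback.
 */
bool falls_back_to_buffered(ssize_t written, size_t remaining, bool is_dax)
{
	if (written < 0)	/* hard error: returned to the caller as is */
		return false;
	if (remaining == 0)	/* direct write consumed everything */
		return false;
	if (is_dax)		/* DAX never goes through the page cache */
		return false;
	return true;		/* short (or zero-length) direct write */
}
```

With this model, falls_back_to_buffered(4096, 12288, false) is true (a 16K request where only 4K went out directly), while falls_back_to_buffered(-5, 16384, false) is false because the error is returned without touching the page cache.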
>
> The reproduced environment is:
> Platform: Kunpeng 920 (arm64)
> Kernel: V5.15-rc
> PAGESIZE: 64K
> Mysql: V8.0
> Innodb_page_size: default(16K)

Thanks for the report. I agree this should not happen. How hard is this to
reproduce? Any idea whether the fallback to buffered IO happens because
iomap_dio_rw() returns -ENOTBLK or because it returns a short write?

Can you post the output of "dumpe2fs -h <device>" for the filesystem where
the problem happens? Thanks!

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR