On Thu, Oct 21, 2021 at 1:37 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote:
> > Hi, all
> >
> > we are encountering the following MySQL crash while importing tables:
> >
> > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > fsync() returned EIO, aborting.
> > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > Assertion failure: ut0ut.cc:555 thread 281472996733168
> >
> > At the same time, we found the following message in dmesg:
> >
> > [ 4328.838972] Page cache invalidation failure on direct I/O.
> > Possible data corruption due to collision with buffered I/O!
> > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > 625 Comm: kworker/42:1
> >
> > At first, we suspected that MySQL was accessing the file with interleaved
> > direct IO and buffered IO, but after some checking we found that it only
> > does direct IO, using AIO. The problem actually comes from the direct IO
> > path (__generic_file_write_iter) itself.
> >
> > ssize_t __generic_file_write_iter()
> > {
> > ...
> >         if (iocb->ki_flags & IOCB_DIRECT) {
> >                 loff_t pos, endbyte;
> >
> >                 written = generic_file_direct_write(iocb, from);
> >                 /*
> >                  * If the write stopped short of completing, fall back to
> >                  * buffered writes. Some filesystems do this for writes to
> >                  * holes, for example. For DAX files, a buffered write will
> >                  * not succeed (even if it did, DAX does not handle dirty
> >                  * page-cache pages correctly).
> >                  */
> >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> >                         goto out;
> >
> >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > ...
> > }
> >
> > From the above code snippet we can see that direct IO can fall back to
> > buffered IO under certain conditions, so even though MySQL only does
> > direct IO, it can be interleaved with buffered IO when the fallback occurs.
> > I don't yet know why the filesystem (ext3) fails the direct IO, but it is
> > strange that __generic_file_write_iter makes direct IO fall back to
> > buffered IO; this seems to break the semantics of direct IO.
> >
> > The reproduced environment is:
> > Platform: Kunpeng 920 (arm64)
> > Kernel: V5.15-rc
> > PAGESIZE: 64K
> > MySQL: V8.0
> > Innodb_page_size: default (16K)
>
> Thanks for the report. I agree this should not happen. How hard is this to
> reproduce? Any idea whether the fallback to buffered IO happens because
> iomap_dio_rw() returns -ENOTBLK or because it returns a short write?

It is easy to reproduce in my test environment. As I said in my previous
reply to Andrew, this problem is related to the kernel page size.

> Can you post output of "dumpe2fs -h <device>" for the filesystem where the
> problem happens? Thanks!

Sure, the output is:

# dumpe2fs -h /dev/sda3
dumpe2fs 1.45.3 (14-Jul-2019)
Filesystem volume name:   <none>
Last mounted on:          /data
Filesystem UUID:          09a51146-b325-48bb-be63-c9df539a90a1
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         unsigned_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              11034624
Block count:              44138240
Reserved block count:     2206912
Free blocks:              43168100
Free inodes:              11034613
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1013
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Thu Oct 21 09:42:03 2021
Last mount time:          Thu Oct 21 09:43:36 2021
Last write time:          Thu Oct 21 09:43:36 2021
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Oct 21 09:42:03 2021
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      a7b04e61-1209-496d-ab9d-a51009b51ddb
Journal backup:           inode blocks
Journal features:         journal_incompat_revoke
Journal size:             1024M
Journal length:           262144
Journal sequence:         0x00000002
Journal start:            1

BTW, we have also tested ext4 and XFS and did not see the direct write
fallback.

Thanks,