Hello! On Tue 19-09-23 14:00:04, Gao Xiang wrote: > Our consumer reports a behavior change between pre-iomap and iomap > direct io conversion: > > If the system crashes after an appending write to a file open with > O_DIRECT | O_SYNC flag set, file i_size won't be updated even if > O_SYNC was marked before. > > It can be reproduced by a test program in the attachment with > gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger > > After some analysis, we found that before iomap direct I/O conversion, > the timing was roughly (taking Linux 3.10 codebase as an example): > > .. > - ext4_file_dio_write > - __generic_file_aio_write > .. > - ext4_direct_IO # generic_file_direct_write > - ext4_ext_direct_IO > - ext4_ind_direct_IO # final_size > inode->i_size > - .. > - ret = blockdev_direct_IO() > - i_size_write(inode, end) # orphan && ret > 0 && > # end > inode->i_size > - ext4_mark_inode_dirty() > - ... > - generic_write_sync # handling O_SYNC > > So the dirty inode meta will be committed into journal immediately > if O_SYNC is set. However, After commit 569342dc2485 ("ext4: move > inode extension/truncate code out from ->iomap_end() callback"), > the new behavior seems as below: > > .. > - ext4_dio_write_iter > - ext4_dio_write_checks # extend = 1 > - iomap_dio_rw > - __iomap_dio_rw > - iomap_dio_complete > - generic_write_sync > - ext4_handle_inode_extension # extend = 1 > > So that i_size will be recorded only after generic_write_sync() is > called. So O_SYNC won't flush the update i_size to the disk. Indeed, that looks like a bug. Thanks for report! > On the other side, after a quick look of XFS side, it will record > i_size changes in xfs_dio_write_end_io() so it seems that it doesn't > have this problem. Yes, I'm a bit hazy on the details but I think we've decided to call ext4_handle_inode_extension() directly from ext4_dio_write_iter() because from ext4_dio_write_end_io() it was difficult to test in a race-free way whether extending i_size (and i_disksize) is needed or not (we don't necessarily hold i_rwsem there). I'll think how we could fix the problem you've reported. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR