Re: [bug report] ext4 misses final i_size meta sync under O_DIRECT | O_SYNC semantics after iomap DIO conversion

(Sorry... adding Darrick here.)

Hi Jan,

On 2023/9/19 20:05, Jan Kara wrote:
Hello!

On Tue 19-09-23 14:00:04, Gao Xiang wrote:
Our customer reports a behavior change between the pre-iomap and the
iomap direct I/O implementations:

If the system crashes after an appending write to a file opened with
O_DIRECT | O_SYNC, the file's i_size is not persisted, even though
O_SYNC was set.

It can be reproduced with the test program in the attachment:
gcc -o repro repro.c && ./repro testfile && echo c > /proc/sysrq-trigger
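For readers without the attachment, here is a minimal sketch of what such a reproducer might look like. It is hypothetical and is NOT the attached repro.c: the function name append_direct_sync(), the 4096-byte alignment (assumed to be at least the device's logical block size) and the return codes are illustrative assumptions. It performs one size-extending write with O_DIRECT | O_SYNC and reports -2 when the filesystem rejects O_DIRECT.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DIO_ALIGN 4096 /* assumption: >= the device's logical block size */

/*
 * Hypothetical sketch, NOT the repro.c from the attachment.
 * Creates (or truncates) PATH and issues a single size-extending write
 * of LEN bytes with O_DIRECT | O_SYNC.
 * Returns the number of bytes written, -2 if the filesystem rejects
 * O_DIRECT with EINVAL, or -1 on any other error.
 */
static ssize_t append_direct_sync(const char *path, size_t len)
{
    void *buf = NULL;
    ssize_t ret;
    int fd;

    /* O_DIRECT requires an aligned buffer, offset and length. */
    assert(len > 0 && len % DIO_ALIGN == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_SYNC, 0644);
    if (fd < 0)
        return errno == EINVAL ? -2 : -1;

    if (posix_memalign(&buf, DIO_ALIGN, len) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 'a', len);

    /* The file is empty, so this write extends i_size from 0 to LEN.
     * With O_SYNC, the new size is expected to be durable once
     * write() returns. */
    ret = write(fd, buf, len);
    if (ret < 0)
        ret = (errno == EINVAL) ? -2 : -1;

    free(buf);
    close(fd);
    return ret;
}
```

Calling append_direct_sync("testfile", 4096) and then crashing the machine with `echo c > /proc/sysrq-trigger`, as in the command above, should leave a 4096-byte file; per this report, on affected kernels the size update is lost instead.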

After some analysis, we found that before the iomap direct I/O
conversion, the call sequence was roughly (taking the Linux 3.10
codebase as an example):

	..
	- ext4_file_dio_write
	  - __generic_file_aio_write
	      ..
	    - ext4_direct_IO  # generic_file_direct_write
	      - ext4_ext_direct_IO
	        - ext4_ind_direct_IO  # final_size > inode->i_size
	          - ..
	          - ret = blockdev_direct_IO()
	          - i_size_write(inode, end) # orphan && ret > 0 &&
	                                   # end > inode->i_size
	          - ext4_mark_inode_dirty()
	          - ...
	  - generic_write_sync  # handling O_SYNC

So the dirty inode metadata is committed to the journal immediately
if O_SYNC is set. However, after commit 569342dc2485 ("ext4: move
inode extension/truncate code out from ->iomap_end() callback"),
the new behavior seems to be as follows:

	..
	- ext4_dio_write_iter
	  - ext4_dio_write_checks  # extend = 1
	  - iomap_dio_rw
	      - __iomap_dio_rw
	      - iomap_dio_complete
	        - generic_write_sync
	  - ext4_handle_inode_extension  # extend = 1

So i_size is only recorded after generic_write_sync() has been
called, and O_SYNC therefore won't flush the updated i_size to disk.
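To make the ordering problem concrete, here is a deliberately simplified model. It is hypothetical: toy_inode, toy_extend(), toy_sync() and size_after_crash() are illustrative names, not ext4 or iomap code. It assumes that a sync persists only the in-memory size that exists at the moment it runs, and that a crash discards all other in-memory state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, NOT ext4 code. */
struct toy_inode {
    int64_t i_size;    /* in-memory file size */
    int64_t disk_size; /* size persisted by the last sync */
    bool dirty;        /* metadata changed since the last sync */
};

/* Roughly: i_size_write() + ext4_mark_inode_dirty() */
static void toy_extend(struct toy_inode *ino, int64_t end)
{
    if (end > ino->i_size) {
        ino->i_size = end;
        ino->dirty = true;
    }
}

/* Roughly: generic_write_sync() for O_SYNC */
static void toy_sync(struct toy_inode *ino)
{
    if (ino->dirty) {
        ino->disk_size = ino->i_size;
        ino->dirty = false;
    }
}

/*
 * Returns the size left on disk if the machine crashes right after an
 * O_SYNC write that extends the file from OLD to END bytes.
 */
static int64_t size_after_crash(bool extend_before_sync, int64_t old,
                                int64_t end)
{
    struct toy_inode ino = { old, old, false };

    assert(end >= old); /* an appending write never shrinks the file */

    if (extend_before_sync) { /* pre-iomap ordering */
        toy_extend(&ino, end);
        toy_sync(&ino);
    } else { /* ordering after commit 569342dc2485 */
        toy_sync(&ino);
        toy_extend(&ino, end);
    }
    return ino.disk_size; /* the crash loses everything in memory */
}
```

In this model, size_after_crash(true, 0, 4096) returns 4096, while size_after_crash(false, 0, 4096) returns 0: the sync runs while the inode is still clean, so the later size update never reaches disk.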

Indeed, that looks like a bug. Thanks for the report!

Thanks for the confirmation!


On the other hand, after a quick look at XFS, it records i_size
changes in xfs_dio_write_end_io(), so it seems that it doesn't have
this problem.

Yes, I'm a bit hazy on the details but I think we've decided to call
ext4_handle_inode_extension() directly from ext4_dio_write_iter() because
from ext4_dio_write_end_io() it was difficult to test in a race-free way
whether extending i_size (and i_disksize) is needed or not (we don't
necessarily hold i_rwsem there). I'll think about how we could fix the
problem you've reported.

Yes. Another concern is O_DSYNC; I'm not quite sure whether its
behavior has changed too.

I have a rough feeling that the current iomap DIO behavior here is
too rigid and might not fit each filesystem's specific implementation
details, though.

Thanks,
Gao Xiang


								Honza


