On Thu 08-03-18 08:58:17, Eryu Guan wrote: > On Wed, Mar 07, 2018 at 11:11:10AM +0100, Jan Kara wrote: > > On Tue 23-01-18 16:37:23, Eryu Guan wrote: > > > Currently in ext4 direct write path, we update i_disksize only when > > > new eof is greater than i_size, and don't update it even when new > > > eof is greater than i_disksize but less than i_size. This doesn't > > > work well with delalloc buffer write, which updates i_size and > > > i_disksize only when delalloc blocks are resolved (at writeback > > > time), the i_disksize from direct write can be lost if a previous > > > buffer write succeeded at write time but failed at writeback time, > > > then results in corrupted ondisk inode size. > > > > > > Consider this case, first buffer write 4k data to a new file at > > > offset 16k with delayed allocation, then direct write 4k data to the > > > same file at offset 4k before delalloc blocks are resolved, which > > > doesn't update i_disksize because it writes within i_size(20k), but > > > the extent tree metadata has been committed in journal. Then > > > writeback of the delalloc blocks fails (due to device error etc.), > > > and i_size/i_disksize from buffer write can't be written to disk > > > (still zero). A subsequent umount/mount cycle recovers journal and > > > writes extent tree metadata from direct write to disk, but with > > > i_disksize being zero. > > > > > > Fix it by updating i_disksize too in direct write path when new eof > > > is greater than i_disksize but less than i_size, so i_disksize is > > > always consistent with direct write. > > > > > > This fixes occasional i_size corruption in fstests generic/475. > > > > > > Signed-off-by: Eryu Guan <eguan@xxxxxxxxxx> > > > --- > > > I think this matches what XFS does in direct write too. > > > > > > I've tested it by looping generic/475 200 times without hitting a > > > corruption, usually it fails within 5 iterations for me. Also tested by > > > full fstests runs on ext2_4k, ext3_2k, ext4_1k configurations and all > > > results looked good. > > > > Thanks for the patch! It looks good to me. Just when looking at these > > Thanks for the review! > > > i_disksize updates and thinking about mixing them with page writeback there > > seems to be another bug that these i_disksize updates are not protected by > > ei->i_data_sem (which is what protects i_disksize update in the writeback > > path). So probably that should be fixed up as well as otherwise I'm not > > sure we cannot corrupt i_disksize in some funny way when writeback and dio > > write race... > > It's been a while and I'll get myself refamiliar with this code path and > look into it. Do you think we can apply this patch as is for now and fix > the race you mentioned in a follow-up patch? Or I should fix both of > them in v2? IMO it would be better to fix the locking in patch 1/2 (the best would be to make that code use ext4_update_i_disksize() which uses proper locking) and then apply your patch as 2/2. Because your patch makes i_disksize updates more frequent especially in the problematic situations where both DIO and writeback code want to update it and so chances these two updates race in a wrong way are made higher by your fix. Thanks! Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR