On Tue, Sep 17, 2019 at 02:02:33AM -0700, Christoph Hellwig wrote:
> On Tue, Sep 17, 2019 at 02:30:15PM +0530, Ritesh Harjani wrote:
> > So if we have a delayed buffered write to a file,
> > in that case we first only update inode->i_size and update
> > i_disksize at writeback time
> > (i.e. during block allocation).
> > In that case when we call for ext4_dio_write_iter
> > since offset + len > i_disksize, we call for ext4_update_i_disksize().
> >
> > Now if writeback for some reason failed. And the system crashes, during the
> > DIO writes, after the blocks are allocated. Then during reboot we may have
> > an inconsistent inode, since we did not add the inode into the
> > orphan list before we updated the inode->i_disksize. And journal replay
> > may not succeed.
> >
> > 1. Can above actually happen? I am still not able to figure out the
> > race/inconsistency completely.
> > 2. Can you please help explain under what other cases
> > it was necessary to call ext4_update_i_disksize() in DIO write paths?
> > 3. When will i_disksize be out-of-sync with i_size during DIO writes?
>
> None of the above seems new in this patchset, does it?

That's correct.

*Ritesh - FWIW, I think you'll find the answers to your questions above by
referring to the following commits:

1) 73fdad00b208b
2) 45d8ec4d9fd54

If you drop the check (offset + count > EXT4_I(inode)->i_disksize) and the
call to ext4_update_i_disksize(), under some workloads, e.g. "generic/475",
you'll generally end up with metadata corruption.

> That being said I found the early size update odd. XFS updates the on-disk
> size only at I/O completion time to deal with various races including the
> potential exposure of stale data.

Indeed a little odd. But, I think the delalloc/writeback implementation is
possibly to blame here (based on what's detailed in 45d8ec4d9fd54)? Ideally,
I had the same approach as XFS in mind, but I couldn't do it.

--<M>--