On Mon, Sep 09, 2019 at 08:02:13PM +0530, Ritesh Harjani wrote: > On 9/9/19 2:56 PM, Ritesh Harjani wrote: > > > + ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops, > > > ext4_dio_write_end_io); > > > + > > > + /* > > > + * Unaligned direct AIO must be the only IO in flight or else > > > + * any overlapping aligned IO after unaligned IO might result > > > + * in data corruption. We also need to wait here in the case > > > + * where the inode is being extended so that inode extension > > > + * routines in ext4_dio_write_end_io() are covered by the > > > + * inode_lock(). > > > + */ > > > + if (ret == -EIOCBQUEUED && (unaligned_aio || extend)) > > > + inode_dio_wait(inode); > > So, I looked this more closely into the AIO DIO write extend case > of yours here. AFAICT, this looks good in the sense that it follows > the behavior what we used to have before from __blockdev_direct_IO. > In that it used to wait for AIO DIO writes beyond EOF, but the iomap > framework does not have that. So waiting in case of writes beyond EOF > should be the right thing to do here for ext4 (following the legacy code). > > But I would like to confirm the exact race this extend case > is protecting here. > Since writes beyond EOF will require update of inode i_size > (ext4_update_inode_size()) which require us to hold the inode_lock > in exclusive mode, so we must need to wait in extend case here, > even for AIO DIO writes. > > Q1. Is above understanding completely correct? Yes, that's essentially correct. > Q2. Or is there anything else also which it is also protecting which I am > missing? No, I think that's it... > Do we need to hold inode exclusive lock for ext4_orphan_del() as well? Yes, I believe so. > Q3. How about XFS then? > (I do see some tricks done with IOLOCK handling in case of ki__pos > i_size > & to zero out the buffer space between old i_size & ki_pos). > > But if we talk only about the above case of extending AIO DIO writes beyond > EOF, XFS only takes a shared lock. why? > > Looking into XFS code, I see that they have IOLOCK & ILOCK. > So I guess for protecting inode->i_size update they use ILOCK (in > xfs_dio_write_end_io() -> xfs_iomap_write_unwritten() > or ip->i_flags_lock lock in NON-UNWRITTEN case). And for IO part the IOLOCK > is used and hence IOLOCK can be used in shared mode. Is this correct > understanding for XFS? * David/Christoph/Darrick I haven't looked at the intricate XFS locking semantics, so I can't really comment until I've looked at the code to be perfectly honest. Perhaps asking one of the XFS maintainers could get you the answer you're looking for on this. --<M>--