Re: [PATCH v2 5/6] ext4: introduce direct IO write path using iomap infrastructure

Matthew Bobrowski <mbobrowski@xxxxxxxxxxxxxx> · Tue, 10 Sep 2019 21:20:14 +1000

On Mon, Sep 09, 2019 at 08:02:13PM +0530, Ritesh Harjani wrote:
> On 9/9/19 2:56 PM, Ritesh Harjani wrote:
> > > +    ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops,
> > > ext4_dio_write_end_io);
> > > +
> > > +    /*
> > > +     * Unaligned direct AIO must be the only IO in flight or else
> > > +     * any overlapping aligned IO after unaligned IO might result
> > > +     * in data corruption. We also need to wait here in the case
> > > +     * where the inode is being extended so that inode extension
> > > +     * routines in ext4_dio_write_end_io() are covered by the
> > > +     * inode_lock().
> > > +     */
> > > +    if (ret == -EIOCBQUEUED && (unaligned_aio || extend))
> > > +        inode_dio_wait(inode);
> 
> So, I looked this more closely into the AIO DIO write extend case
> of yours here. AFAICT, this looks good in the sense that it follows
> the behavior what we used to have before from __blockdev_direct_IO.
> In that it used to wait for AIO DIO writes beyond EOF, but the iomap
> framework does not have that. So waiting in case of writes beyond EOF
> should be the right thing to do here for ext4 (following the legacy code).
> 
> But I would like to confirm the exact race this extend case
> is protecting here.
> Since writes beyond EOF will require update of inode i_size
> (ext4_update_inode_size()) which require us to hold the inode_lock
> in exclusive mode, so we must need to wait in extend case here,
> even for AIO DIO writes.
> 
> Q1. Is above understanding completely correct?

Yes, that's essentially correct.

> Q2. Or is there anything else also which it is also protecting which I am
> missing?

No, I think that's it...

> Do we need to hold inode exclusive lock for ext4_orphan_del() as well?

Yes, I believe so.

> Q3. How about XFS then?
> (I do see some tricks done with IOLOCK handling in case of ki__pos > i_size
> & to zero out the buffer space between old i_size & ki_pos).
> 
> But if we talk only about the above case of extending AIO DIO writes beyond
> EOF, XFS only takes a shared lock. why?
> 
> Looking into XFS code, I see that they have IOLOCK & ILOCK.
> So I guess for protecting inode->i_size update they use ILOCK (in
> xfs_dio_write_end_io() -> xfs_iomap_write_unwritten()
> or ip->i_flags_lock lock in NON-UNWRITTEN case). And for IO part the IOLOCK
> is used and hence IOLOCK can be used in shared mode. Is this correct
> understanding for XFS?

* David/Christoph/Darrick

I haven't looked at the intricate XFS locking semantics, so I can't really
comment until I've looked at the code to be perfectly honest. Perhaps asking
one of the XFS maintainers could get you the answer you're looking for on
this.

--<M>--