Re: [PATCH RFC] iomap: only return IO error if no data has been transferred

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 19 Nov 2020 08:15:06 +1100

On Wed, Nov 18, 2020 at 02:00:06PM -0700, Jens Axboe wrote:
> On 11/18/20 1:37 PM, Dave Chinner wrote:
> > On Wed, Nov 18, 2020 at 08:26:50AM -0700, Jens Axboe wrote:
> >> On 11/18/20 12:19 AM, Dave Chinner wrote:
> >>> On Tue, Nov 17, 2020 at 03:17:18PM -0700, Jens Axboe wrote:
> >>>> If we've successfully transferred some data in __iomap_dio_rw(),
> >>>> don't mark an error for a later segment in the dio.
> >>>>
> >>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> >>>>
> >>>> ---
> >>>>
> >>>> Debugging an issue with io_uring, which uses IOCB_NOWAIT for the
> >>>> IO. If we do parts of an IO, then once that completes, we still
> >>>> return -EAGAIN if we ran into a problem later on. That seems wrong,
> >>>> normal convention would be to return the short IO instead. For the
> >>>> -EAGAIN case, io_uring will retry later parts without IOCB_NOWAIT
> >>>> and complete it successfully.
> >>>
> >>> So you are getting a write IO that is split across an allocated
> >>> extent and a hole, and the second mapping is returning EAGAIN
> >>> because allocation would be required? This sort of split extent IO
> >>> is fairly common, so I'm not sure that splitting them into two
> >>> separate IOs is the best approach.
> >>
> >> The case I seem to be hitting is this one:
> >>
> >> if (iocb->ki_flags & IOCB_NOWAIT) {
> >> 	if (filemap_range_has_page(mapping, pos, end)) {
> >> 		ret = -EAGAIN;
> >> 		goto out_free_dio;
> >> 	}
> >> 	flags |= IOMAP_NOWAIT;
> >> }
> >>
> >> in __iomap_dio_rw(), which isn't something we can detect upfront like IO
> >> over multiple extents...
> > 
> > This specific situation cannot result in the partial IO behaviour
> > you described.  It is an -upfront check- that is done before any IO
> > is mapped or issued so results in the entire IO being skipped and we
> > don't get anywhere near the code you changed.
> > 
> > IOWs, this doesn't explain why you saw a partial IO, or why changing
> > partial IO return values avoids -EAGAIN from a range we apparently
> > just did a partial IO over and -didn't have page cache pages-
> > sitting over it.
> 
> You are right, I double checked and recreated my debugging. What's
> triggering is that we're hitting this in xfs_direct_write_iomap_begin()
> after we've already done some IO:
> 
> allocate_blocks:
> 	error = -EAGAIN;
> 	if (flags & IOMAP_NOWAIT)
> 		goto out_unlock;

Ok, that's exactly the case the reproducer I wrote triggers.
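
To make the sequence concrete, here is a minimal userspace sketch of the
behaviour under discussion. It is not kernel code: struct seg,
model_dio_write() and the short_io_ok switch are hypothetical names made
up for illustration, and the model assumes the only failure is a NOWAIT
write reaching a mapping that needs allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one mapped segment of a direct IO. */
struct seg {
	size_t len;		/* bytes covered by this mapping */
	bool needs_alloc;	/* true if a write here must allocate blocks */
};

/*
 * Walk the segments in order, as a simplified stand-in for a direct
 * write. With nowait set, a segment that needs allocation stops the
 * walk with -EAGAIN.
 *
 * short_io_ok selects the return convention being debated:
 *   false: the error is returned even if earlier segments transferred data
 *   true:  the error is returned only if nothing was transferred
 */
static ssize_t model_dio_write(const struct seg *segs, size_t nr,
			       bool nowait, bool short_io_ok)
{
	ssize_t done = 0;
	int error = 0;

	for (size_t i = 0; i < nr; i++) {
		if (segs[i].needs_alloc && nowait) {
			error = -EAGAIN;
			break;
		}
		done += (ssize_t)segs[i].len;
	}

	if (error && (!short_io_ok || done == 0))
		return error;
	return done;
}

/* Usage: a write spanning an allocated extent followed by a hole. */
static void model_example(void)
{
	const struct seg segs[] = { { 4096, false }, { 4096, true } };

	/* Error wins even though the first 4096 bytes were written. */
	assert(model_dio_write(segs, 2, true, false) == -EAGAIN);
	/* Short IO convention: report the 4096 bytes that completed. */
	assert(model_dio_write(segs, 2, true, true) == 4096);
	/* Caller retries the remainder without NOWAIT and it completes. */
	assert(model_dio_write(segs + 1, 1, false, true) == 4096);
}
```

The last call in model_example() mirrors what Jens describes io_uring
doing: the remaining part is retried without IOCB_NOWAIT and completes.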

> > Can you provide an actual event trace of the IOs in question that
> > are failing in your tests (e.g. from something like `trace-cmd
> > record -e xfs_file\* -e xfs_i\* -e xfs_\*write -e iomap\*` over the
> > sequence that reproduces the issue) so that there's no ambiguity
> > over how this problem is occurring in your systems?
> 
> Let me know if you still want this!

No, it makes sense now :)

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx