On 11/18/20 2:33 PM, Dave Chinner wrote:
> On Wed, Nov 18, 2020 at 02:19:30PM -0700, Jens Axboe wrote:
>> On 11/18/20 2:15 PM, Dave Chinner wrote:
>>> On Wed, Nov 18, 2020 at 02:00:06PM -0700, Jens Axboe wrote:
>>>> On 11/18/20 1:37 PM, Dave Chinner wrote:
>>>>> On Wed, Nov 18, 2020 at 08:26:50AM -0700, Jens Axboe wrote:
>>>>>> On 11/18/20 12:19 AM, Dave Chinner wrote:
>>>>>>> On Tue, Nov 17, 2020 at 03:17:18PM -0700, Jens Axboe wrote:
>>>>>>>> If we've successfully transferred some data in __iomap_dio_rw(),
>>>>>>>> don't mark an error for a latter segment in the dio.
>>>>>>>>
>>>>>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Debugging an issue with io_uring, which uses IOCB_NOWAIT for the
>>>>>>>> IO. If we do parts of an IO, then once that completes, we still
>>>>>>>> return -EAGAIN if we ran into a problem later on. That seems wrong,
>>>>>>>> normal convention would be to return the short IO instead. For the
>>>>>>>> -EAGAIN case, io_uring will retry later parts without IOCB_NOWAIT
>>>>>>>> and complete it successfully.
>>>>>>>
>>>>>>> So you are getting a write IO that is split across an allocated
>>>>>>> extent and a hole, and the second mapping is returning EAGAIN
>>>>>>> because allocation would be required? This sort of split extent IO
>>>>>>> is fairly common, so I'm not sure that splitting them into two
>>>>>>> separate IOs may not be the best approach.
>>>>>>
>>>>>> The case I seem to be hitting is this one:
>>>>>>
>>>>>>         if (iocb->ki_flags & IOCB_NOWAIT) {
>>>>>>                 if (filemap_range_has_page(mapping, pos, end)) {
>>>>>>                         ret = -EAGAIN;
>>>>>>                         goto out_free_dio;
>>>>>>                 }
>>>>>>                 flags |= IOMAP_NOWAIT;
>>>>>>         }
>>>>>>
>>>>>> in __iomap_dio_rw(), which isn't something we can detect upfront like IO
>>>>>> over multiple extents...
>>>>>
>>>>> This specific situation cannot result in the partial IO behaviour
>>>>> you described. It is an -upfront check- that is done before any IO
>>>>> is mapped or issued so results in the entire IO being skipped and we
>>>>> don't get anywhere near the code you changed.
>>>>>
>>>>> IOWs, this doesn't explain why you saw a partial IO, or why changing
>>>>> partial IO return values avoids -EAGAIN from a range we apparently
>>>>> just did a partial IO over and -didn't have page cache pages-
>>>>> sitting over it.
>>>>
>>>> You are right, I double checked and recreated my debugging. What's
>>>> triggering is that we're hitting this in xfs_direct_write_iomap_begin()
>>>> after we've already done some IO:
>>>>
>>>> allocate_blocks:
>>>>         error = -EAGAIN;
>>>>         if (flags & IOMAP_NOWAIT)
>>>>                 goto out_unlock;
>>>
>>> Ok, that's exactly the case the reproducer I wrote triggers.
>>
>> OK good, then we're on the same page :-)
>>
>>>>> Can you provide an actual event trace of the IOs in question that
>>>>> are failing in your tests (e.g. from something like `trace-cmd
>>>>> record -e xfs_file\* -e xfs_i\* -e xfs_\*write -e iomap\*` over the
>>>>> sequential that reproduces the issue) so that there's no ambiguity
>>>>> over how this problem is occurring in your systems?
>>>>
>>>> Let me know if you still want this!
>>>
>>> No, it makes sense now :)
>>
>> What's the next step here? Are you working on an XFS fix for this?
>
> I'm just building the patch now for testing.

Nice, you're fast...

>> Was looking at other potential -EAGAIN during the loop, and seems like
>> we'd be able to hit this if we fail xfs_ilock_for_iomap() as well. And
>> not sure how that would be solvable, without accepting that IOCB_NOWAIT
>> reads/writes can be short.
>
> The change I'm making should solve that, too. i.e. NOWAIT IO must
> map entirely within a single extent, so there is no scope for a
> short IO and re-entering the mapping code under NOWAIT conditions
> that could then fail.

Perfect, thanks Dave!

-- 
Jens Axboe
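
For readers following the thread, the two behaviours under discussion can be modelled in user space. The sketch below is only an illustration under stated assumptions: struct extent, find_extent(), nowait_write_short() and nowait_write_single_extent() are hypothetical names invented here, not kernel or XFS code, and the actual patch Dave mentions is not shown in this thread. The first function models the short-IO convention from the original patch description (return the bytes already transferred instead of -EAGAIN). The second models one reading of "NOWAIT IO must map entirely within a single extent", where the check happens up front so no partial IO can occur.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one file extent; this is NOT a kernel structure. */
struct extent {
	long long start;	/* byte offset where the extent begins */
	long long len;		/* extent length in bytes */
	int allocated;		/* nonzero if blocks are already allocated */
};

/* Return the extent containing pos, or NULL if pos is not mapped at all. */
static const struct extent *find_extent(const struct extent *map, size_t n,
					long long pos)
{
	for (size_t i = 0; i < n; i++)
		if (pos >= map[i].start && pos < map[i].start + map[i].len)
			return &map[i];
	return NULL;
}

/*
 * Short-IO convention (assumed model of the original patch): walk the
 * range extent by extent. If a later segment would need allocation,
 * return the bytes already done; return -EAGAIN only if nothing was done.
 */
long long nowait_write_short(const struct extent *map, size_t n,
			     long long pos, long long len)
{
	long long done = 0;

	assert(len >= 0);
	while (done < len) {
		const struct extent *e = find_extent(map, n, pos + done);

		if (!e || !e->allocated)
			return done ? done : -EAGAIN;

		long long avail = e->start + e->len - (pos + done);
		long long left = len - done;

		done += left < avail ? left : avail;
	}
	return done;
}

/*
 * Single-extent rule (assumed model of Dave's description): the whole
 * range must sit inside one allocated extent, otherwise fail up front
 * with -EAGAIN before any IO is issued. No short IO is possible.
 */
long long nowait_write_single_extent(const struct extent *map, size_t n,
				     long long pos, long long len)
{
	const struct extent *e = find_extent(map, n, pos);

	assert(len >= 0);
	if (!e || !e->allocated || pos + len > e->start + e->len)
		return -EAGAIN;
	return len;
}
```

With a map holding an allocated 4096-byte extent followed by a 4096-byte hole, an 8192-byte write from offset 0 returns 4096 under the first model and -EAGAIN under the second; in the second case the caller can retry the whole IO without NOWAIT, so it never has to handle a partial completion.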