On 11/18/20 1:37 PM, Dave Chinner wrote:
> On Wed, Nov 18, 2020 at 08:26:50AM -0700, Jens Axboe wrote:
>> On 11/18/20 12:19 AM, Dave Chinner wrote:
>>> On Tue, Nov 17, 2020 at 03:17:18PM -0700, Jens Axboe wrote:
>>>> If we've successfully transferred some data in __iomap_dio_rw(),
>>>> don't mark an error for a latter segment in the dio.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>>>
>>>> ---
>>>>
>>>> Debugging an issue with io_uring, which uses IOCB_NOWAIT for the
>>>> IO. If we do parts of an IO, then once that completes, we still
>>>> return -EAGAIN if we ran into a problem later on. That seems wrong,
>>>> normal convention would be to return the short IO instead. For the
>>>> -EAGAIN case, io_uring will retry later parts without IOCB_NOWAIT
>>>> and complete it successfully.
>>>
>>> So you are getting a write IO that is split across an allocated
>>> extent and a hole, and the second mapping is returning EAGAIN
>>> because allocation would be required? This sort of split extent IO
>>> is fairly common, so I'm not sure that splitting them into two
>>> separate IOs may not be the best approach.
>>
>> The case I seem to be hitting is this one:
>>
>> 	if (iocb->ki_flags & IOCB_NOWAIT) {
>> 		if (filemap_range_has_page(mapping, pos, end)) {
>> 			ret = -EAGAIN;
>> 			goto out_free_dio;
>> 		}
>> 		flags |= IOMAP_NOWAIT;
>> 	}
>>
>> in __iomap_dio_rw(), which isn't something we can detect upfront like IO
>> over a multiple extents...
>
> This specific situation cannot result in the partial IO behaviour
> you described. It is an -upfront check- that is done before any IO
> is mapped or issued so results in the entire IO being skipped and we
> don't get anywhere near the code you changed.
>
> IOWs, this doesn't explain why you saw a partial IO, or why changing
> partial IO return values avoids -EAGAIN from a range we apparently
> just did a partial IO over and -didn't have page cache pages-
> sitting over it.

You are right, I double checked and recreated my debugging. What's
triggering is that we're hitting this in xfs_direct_write_iomap_begin()
after we've already done some IO:

allocate_blocks:
	error = -EAGAIN;
	if (flags & IOMAP_NOWAIT)
		goto out_unlock;

> Can you provide an actual event trace of the IOs in question that
> are failing in your tests (e.g. from something like `trace-cmd
> record -e xfs_file\* -e xfs_i\* -e xfs_\*write -e iomap\*` over the
> sequential that reproduces the issue) so that there's no ambiguity
> over how this problem is occurring in your systems?

Let me know if you still want this!

-- 
Jens Axboe
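
For illustration only, the short-IO convention discussed in this thread can be modelled in user-space C. This is a minimal sketch under stated assumptions, not the kernel's iomap or XFS code: `struct segment`, its `would_block` field, and `dio_rw_sketch()` are hypothetical names invented for this example. It shows the intended return value only: once some bytes have been transferred, a later segment that would block under a nowait request produces a short count rather than -EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of one mapped segment of a direct IO. */
struct segment {
	size_t len;       /* bytes covered by this segment */
	int would_block;  /* nonzero if handling it would block (e.g. allocation) */
};

/*
 * Hypothetical helper: walk the segments of a single IO in order.
 * With nowait set, a segment that would block stops the walk.
 * If nothing was transferred yet, report -EAGAIN; otherwise return
 * the short byte count so the caller can retry only the remainder.
 */
static ssize_t dio_rw_sketch(const struct segment *segs, size_t nsegs,
			     int nowait)
{
	size_t done = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (nowait && segs[i].would_block)
			return done ? (ssize_t)done : -EAGAIN;
		done += segs[i].len;
	}
	return (ssize_t)done;
}
```

In this model, an IO covering an already-allocated 4096-byte segment followed by a 4096-byte hole returns 4096 when nowait is set, and the caller would then reissue the remaining range without nowait, which mirrors the retry behaviour described for io_uring above.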