On Mon 26-07-21 19:45:22, Andreas Grünbacher wrote: > Jan Kara <jack@xxxxxxx> schrieb am Mo., 26. Juli 2021, 19:21: > > > On Fri 23-07-21 22:58:38, Andreas Gruenbacher wrote: > > > In __iomap_dio_rw, when iomap_apply returns an -EFAULT error, complete > > the > > > request synchronously and reset the iterator to the start position. This > > > allows callers to deal with the failure and retry the operation. > > > > > > In gfs2, we need to disable page faults while we're holding glocks to > > prevent > > > deadlocks. This patch is the minimum solution I could find to make > > > iomap_dio_rw work with page faults disabled. It's still expensive > > because any > > > I/O that was carried out before hitting -EFAULT needs to be retried. > > > > > > A possible improvement would be to add an IOMAP_DIO_FAULT_RETRY or > > similar flag > > > that would allow iomap_dio_rw to return a short result when hitting > > -EFAULT. > > > Callers could then retry only the rest of the request after dealing with > > the > > > page fault. > > > > > > Asynchronous requests turn into synchronous requests up to the point of > > the > > > page fault in any case, but they could be retried asynchronously after > > dealing > > > with the page fault. To make that work, the completion notification > > would have > > > to include the bytes read or written before the page fault(s) as well, > > and we'd > > > need an additional iomap_dio_rw argument for that. > > > > > > Signed-off-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx> > > > --- > > > fs/iomap/direct-io.c | 9 +++++++++ > > > 1 file changed, 9 insertions(+) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index cc0b4bc8861b..b0a494211bb4 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -561,6 +561,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter > > *iter, > > > ret = iomap_apply(inode, pos, count, iomap_flags, ops, dio, > > > iomap_dio_actor); > > > if (ret <= 0) { > > > + if (ret == -EFAULT) { > > > + /* > > > + * To allow retrying the request, fail > > > + * synchronously and reset the iterator. > > > + */ > > > + wait_for_completion = true; > > > + iov_iter_revert(dio->submit.iter, > > dio->size); > > > + } > > > + > > > > Hum, OK, but this means that if userspace submits large enough write, GFS2 > > will livelock trying to complete it? While other filesystems can just > > submit multiple smaller bios constructed in iomap_apply() (paging in > > different parts of the buffer) and thus complete the write? > > > > No. First, this affects reads but not writes. We cannot just blindly repeat > writes; when a page fault occurs in the middle of a write, the result will > be a short write. For reads, the plan is to ads a flag to allow > iomap_dio_rw to return a partial result when a page fault occurs. > (Currently, it fails the entire request.) Then we can handle the page fault > and complete the rest of the request. > > The changes needed for that are simple on the iomap side, but we need to go > through some gymnastics for handling the page fault without giving up the > glock in the non-contended case. There will still be the potential for > losing the lock and having to re-acquire it, in which case we'll actually > have to repeat the entire read. I've missed you've already sent out v4 (I'm catching up after vacation so my mailbox is a bit of a mess). What's in there addresses my objection. I'm sorry for the noise. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR