> On Sep 10, 2022, at 5:14 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Wed, Sep 07, 2022 at 08:52:46AM -0400, Benjamin Coddington wrote: >> On 7 Sep 2022, at 0:58, Chuck Lever III wrote: >> >>>> On Sep 6, 2022, at 3:12 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: >>>> >>>> On Tue, Sep 6, 2022 at 2:28 PM Benjamin Coddington >>>> <bcodding@xxxxxxxxxx> wrote: >>>>> >>>>> On 1 Sep 2022, at 21:27, Olga Kornievskaia wrote: >>>>> >>>>>> Thanks Chuck. I first, based on a hunch, narrowed down that it's >>>>>> coming from Al Viro's merge commit. Then I git bisected his >>>>>> 32patches >>>>>> to the following commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979 >>>>> >>>>> No crash for me after reverting >>>>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979. >>>> >>>> I second that. No crash after a revert here. >>> >>> I bisected the new xfstests failures to the same commit: >>> >>> f0f6b614f83dbae99d283b7b12ab5dd2e04df979 is the first bad commit >>> commit f0f6b614f83dbae99d283b7b12ab5dd2e04df979 >>> Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>> Date: Thu Jun 23 17:21:37 2022 -0400 >>> >>> copy_page_to_iter(): don't split high-order page in case of >>> ITER_PIPE >>> >>> ... just shove it into one pipe_buffer. >>> >>> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> >>> >>> lib/iov_iter.c | 21 ++++++--------------- >>> 1 file changed, 6 insertions(+), 15 deletions(-) >>> >> >> I've been reliably reproducing this on generic/551 on xfs. In the case >> where it crashes, rqstp->rq_res.page_base is positive multiple of PAGE_SIZE >> after getting set in nfsd_splice_actor(), and that with page_len overruns >> the 256 pages we have. >> >> With f0f6b614f83d reverted, rqstp->rq_res.page_base is always zero. >> >> After 47b7fcae419dc and f0f6b614f83d, buf->offset in nfsd_splice_actor can >> be a positive multiple of PAGE_SIZE, whereas before it was always just the >> offset into the page. >> >> Something like this might fix it up: >> >> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c >> index 9f486b788ed0..d62963f36f03 100644 >> --- a/fs/nfsd/vfs.c >> +++ b/fs/nfsd/vfs.c >> @@ -849,7 +849,7 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct >> pipe_buffer *buf, >> >> svc_rqst_replace_page(rqstp, buf->page); >> if (rqstp->rq_res.page_len == 0) >> - rqstp->rq_res.page_base = buf->offset; >> + rqstp->rq_res.page_base = buf->offset % PAGE_SIZE; >> rqstp->rq_res.page_len += sd->len; >> return sd->len; >> } >> >> .. but we should check with Al about whether this needs to be fixed over in >> copy_page_to_iter_pipe(), since I don't think pipe_buffer offset should be >> greater than PAGE_SIZE. >> >> Al, any thoughts? > > pipe_buffer offsets in general can be greater than PAGE_SIZE. What's more, > buffer *size* can be greater than PAGE_SIZE - it really can contain more > than PAGE_SIZE worth of data. In that case the page is a compound one, of > course. > > Regression is the combination of "splice from regular file to pipe hadn't > produced that earlier, now it does" and "nfsd never needed to handle that". > > I don't believe that fix is correct. *IF* you can't deal with compound > page in sunrpc, you need a loop going by subpages in nfsd_splice_actor(), > similar to one that used to be in copy_page_to_iter(). Could you try > the following: > > nfsd_splice_actor(): handle compound pages > > pipe_buffer might refer to a compound page (and contain more than a PAGE_SIZE > worth of data). Theoretically it had been possible since way back, but > nfsd_splice_actor() hadn't run into that until copy_page_to_iter() change. > Fortunately, the only thing that changes for compound pages is that we > need to stuff each relevant subpage in and convert the offset into offset > in the first subpage. > > Hopefully-fixes: f0f6b614f83d "copy_page_to_iter(): don't split high-order page in case of ITER_PIPE" > Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx> > --- I'm getting my head around this, just a couple of comments so far. > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c > index 9f486b788ed0..b16aed158ba6 100644 > --- a/fs/nfsd/vfs.c > +++ b/fs/nfsd/vfs.c > @@ -846,10 +846,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf, > struct splice_desc *sd) > { > struct svc_rqst *rqstp = sd->u.data; > - > - svc_rqst_replace_page(rqstp, buf->page); > - if (rqstp->rq_res.page_len == 0) > - rqstp->rq_res.page_base = buf->offset; > + struct page *page = buf->page; // may be a compound one > + unsigned offset = buf->offset; > + > + page += offset / PAGE_SIZE; Nit: I see "offset / PAGE_SIZE" is used in the iter code base, but in the NFS stack, we prefer "offset >> PAGE_SIZE" and "offset & ~PAGE_MASK" (below). > + for (int i = sd->len; i > 0; i -= PAGE_SIZE) > + svc_rqst_replace_page(rqstp, page++); > + if (rqstp->rq_res.page_len == 0) // first call > + rqstp->rq_res.page_base = offset % PAGE_SIZE; > rqstp->rq_res.page_len += sd->len; > return sd->len; > } I could take this through the nfsd for-rc tree, but that's based on 5.19-rc7 so it doesn't have f0f6b614f83d. I don't think will break functionality, but I'm wondering if it would be better for you to take this through your tree to improve bisect-ability. If you agree and Ben reports a Tested-by:, then here's my Acked-by: Chuck Lever <chuck.lever@xxxxxxxxxx> It might be nice one day to have a single mechanism for NFSD to handle READs. I don't think a pipe is going to work for our upcoming hole-detection scheme, for example. -- Chuck Lever