> On Sep 2, 2022, at 11:34 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote: > > Hi folks, > > Ok so I won't be trying Ben's idea but I'm sort of confused is the > thought that NFS is somehow at fault in incorrectly using the "new" > code introduced by the new patches. Isn't it possible that the new > patches are wrong? Since the bisect result indicates one of Al's commits, that commit should be at the top of the suspect list. I believe the intent is that, generally speaking, both the folio and the iov_iter work are not supposed to require changes to API consumers like NFSD. > I haven't had time to try and revert the patch(es) > to see if that makes the oops go away. A test revert would be a good step to confirm the bisect result before asking for Al's opinion. Always dot your p's and cross your q's! > I won't get around to it until about tuesday with the holidays. Try it out when you can. Or if someone else has a chance before then, they can report back here. > On Fri, Sep 2, 2022 at 8:38 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote: >> >> On 2 Sep 2022, at 17:13, Chuck Lever III wrote: >> >>>> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> >>>> wrote: >>>> >>>> Olga, does this fix it up for you? I'm testing now, but I think it >>>> might be >>>> a little harder for me to hit. >>>> >>>> Ben >>>> >>>> 8<------------------------------------------------ >>>> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00 >>>> 2001 >>>> From: Benjamin Coddington <bcodding@xxxxxxxxxx> >>>> Date: Fri, 2 Sep 2022 16:49:17 -0400 >>>> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation >>>> >>>> The xdr_buf's bvec member points to an array of struct bio_vec, let's >>>> fixup the calculation to the start of the bio_vec for non-zero >>>> page_base. >>>> >>>> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at >>>> non-page-aligned offsets") >>>> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx> >>>> --- >>>> net/sunrpc/svcsock.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c >>>> index 2fc98fea59b4..ecafc9c4bc5c 100644 >>>> --- a/net/sunrpc/svcsock.c >>>> +++ b/net/sunrpc/svcsock.c >>>> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock, >>>> struct xdr_buf *xdr, >>>> unsigned int offset, len, remaining; >>>> struct bio_vec *bvec; >>>> >>>> - bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT); >>>> + bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT]; >>> >>> Color me skeptical. >>> >>> I'm not sure these two expressions are different. This variety >>> of pointer arithmetic is used throughout the XDR layer: >> >> Yeah, you know what - it did crash in the same place with this change. >> >> My thinking was that if you have (for example) page_base = 8192, and >> xdr->bvec of, say 0xffff4500, then what you want is to set the local >> bvec var >> to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it >> would >> set the local bvec to 0xffff4502, which is not the same thing.. >> >> There must be a hole in my head, I guess I need to dig out my K&R, >> sorry >> for the noise. I will figure it out. >> >>> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: pgto = pages + (pgto_base >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: pgfrom = pages + (pgfrom_base >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: pgto = pages + (pgbase >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: pgfrom = pages + (pgbase >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: page = pages + (pgbase >> PAGE_SHIFT); >>> net/sunrpc/xdr.c: xdr->page_ptr = buf->pages + (new >> >>> PAGE_SHIFT); >>> net/sunrpc/xdr.c: ppages = buf->pages + (base >> >>> PAGE_SHIFT); >>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base >>>>> PAGE_SHIFT); >>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages + >>> (xdrbuf->page_base >> PAGE_SHIFT); >>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base >>>>> PAGE_SHIFT); >>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base >>>>> PAGE_SHIFT); >> >> Hmm.. there's clearly something wrong with me. >> >>> Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have >>> shown up in earlier kernels? At the very least, the patch >>> description needs to explain why this computation is not a >>> problem for kernels 5.11 through 5.19. >> >> I totally agree. I figured it was rare to have a non-zero page_base, >> and >> maybe a client change is now creating that. >> >> Ben >> -- Chuck Lever