Re: Is this nfsd kernel oops known?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi folks,

Ok so I won't be trying Ben's idea but I'm sort of confused is the
thought that NFS is somehow at fault in incorrectly using the "new"
code introduced by the new patches. Isn't it possible that the new
patches are wrong? I haven't had time to try and revert the patch(es)
to see if that makes the oops go away. I won't get around to it until
about tuesday with the holidays.

On Fri, Sep 2, 2022 at 8:38 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> On 2 Sep 2022, at 17:13, Chuck Lever III wrote:
>
> >> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <bcodding@xxxxxxxxxx>
> >> wrote:
> >>
> >> Olga, does this fix it up for you?  I'm testing now, but I think it
> >> might be
> >> a little harder for me to hit.
> >>
> >> Ben
> >>
> >> 8<------------------------------------------------
> >> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00
> >> 2001
> >> From: Benjamin Coddington <bcodding@xxxxxxxxxx>
> >> Date: Fri, 2 Sep 2022 16:49:17 -0400
> >> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
> >>
> >> The xdr_buf's bvec member points to an array of struct bio_vec, let's
> >> fixup the calculation to the start of the bio_vec for non-zero
> >> page_base.
> >>
> >> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
> >> non-page-aligned offsets")
> >> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
> >> ---
> >> net/sunrpc/svcsock.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> >> index 2fc98fea59b4..ecafc9c4bc5c 100644
> >> --- a/net/sunrpc/svcsock.c
> >> +++ b/net/sunrpc/svcsock.c
> >> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
> >> struct xdr_buf *xdr,
> >>                unsigned int offset, len, remaining;
> >>                struct bio_vec *bvec;
> >>
> >> -               bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
> >> +               bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
> >
> > Color me skeptical.
> >
> > I'm not sure these two expressions are different. This variety
> > of pointer arithmetic is used throughout the XDR layer:
>
> Yeah, you know what - it did crash in the same place with this change.
>
> My thinking was that if you have (for example) page_base = 8192, and
> xdr->bvec of, say 0xffff4500, then what you want is to set the local
> bvec var
> to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it
> would
> set the local bvec to 0xffff4502, which is not the same thing..
>
> There must be a hole in my head,  I guess I need to dig out my K&R,
> sorry
> for the noise.  I will figure it out.
>
> > net/sunrpc/xdr.c:       pgto = pages + (pgto_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       pgto = pages + (pgto_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       pgto = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       pgfrom = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       page = pages + (pgbase >> PAGE_SHIFT);
> > net/sunrpc/xdr.c:       xdr->page_ptr = buf->pages + (new >>
> > PAGE_SHIFT);
> > net/sunrpc/xdr.c:               ppages = buf->pages + (base >>
> > PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base
> > >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages +
> > (xdrbuf->page_base >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> > >> PAGE_SHIFT);
> > net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
> > >> PAGE_SHIFT);
>
> Hmm.. there's clearly something wrong with me.
>
> > Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
> > shown up in earlier kernels? At the very least, the patch
> > description needs to explain why this computation is not a
> > problem for kernels 5.11 through 5.19.
>
> I totally agree.  I figured it was rare to have a non-zero page_base,
> and
> maybe a client change is now creating that.
>
> Ben
>



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux