Re: Is this nfsd kernel oops known?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Sep 2, 2022, at 11:34 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> 
> Hi folks,
> 
> Ok so I won't be trying Ben's idea but I'm sort of confused is the
> thought that NFS is somehow at fault in incorrectly using the "new"
> code introduced by the new patches. Isn't it possible that the new
> patches are wrong?

Since the bisect result indicates one of Al's commits, that commit
should be at the top of the suspect list. I believe the intent is
that, generally speaking, both the folio and the iov_iter work are
not supposed to require changes to API consumers like NFSD.


> I haven't had time to try and revert the patch(es)
> to see if that makes the oops go away.

A test revert would be a good step to confirm the bisect result
before asking for Al's opinion. Always dot your p's and cross your
q's!


> I won't get around to it until about tuesday with the holidays.

Try it out when you can. Or if someone else has a chance before
then, they can report back here.


> On Fri, Sep 2, 2022 at 8:38 PM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>> 
>> On 2 Sep 2022, at 17:13, Chuck Lever III wrote:
>> 
>>>> On Sep 2, 2022, at 4:58 PM, Benjamin Coddington <bcodding@xxxxxxxxxx>
>>>> wrote:
>>>> 
>>>> Olga, does this fix it up for you?  I'm testing now, but I think it
>>>> might be
>>>> a little harder for me to hit.
>>>> 
>>>> Ben
>>>> 
>>>> 8<------------------------------------------------
>>>> From 6bea39a887495b1748ff3b179d6e2f3d7e552b61 Mon Sep 17 00:00:00
>>>> 2001
>>>> From: Benjamin Coddington <bcodding@xxxxxxxxxx>
>>>> Date: Fri, 2 Sep 2022 16:49:17 -0400
>>>> Subject: [PATCH] SUNRPC: Fix svc_tcp_sendmsg bvec offset calculation
>>>> 
>>>> The xdr_buf's bvec member points to an array of struct bio_vec, let's
>>>> fixup the calculation to the start of the bio_vec for non-zero
>>>> page_base.
>>>> 
>>>> Fixes: bad4c6eb5eaa ("SUNRPC: Fix NFS READs that start at
>>>> non-page-aligned offsets")
>>>> Signed-off-by: Benjamin Coddington <bcodding@xxxxxxxxxx>
>>>> ---
>>>> net/sunrpc/svcsock.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>> 
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index 2fc98fea59b4..ecafc9c4bc5c 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -1110,7 +1110,7 @@ static int svc_tcp_sendmsg(struct socket *sock,
>>>> struct xdr_buf *xdr,
>>>>               unsigned int offset, len, remaining;
>>>>               struct bio_vec *bvec;
>>>> 
>>>> -               bvec = xdr->bvec + (xdr->page_base >> PAGE_SHIFT);
>>>> +               bvec = &xdr->bvec[xdr->page_base >> PAGE_SHIFT];
>>> 
>>> Color me skeptical.
>>> 
>>> I'm not sure these two expressions are different. This variety
>>> of pointer arithmetic is used throughout the XDR layer:
>> 
>> Yeah, you know what - it did crash in the same place with this change.
>> 
>> My thinking was that if you have (for example) page_base = 8192, and
>> xdr->bvec of, say 0xffff4500, then what you want is to set the local
>> bvec var
>> to 0xfff4500 + sizeof(struct bio_vec)*2, but the code looks like it
>> would
>> set the local bvec to 0xffff4502, which is not the same thing..
>> 
>> There must be a hole in my head,  I guess I need to dig out my K&R,
>> sorry
>> for the noise.  I will figure it out.
>> 
>>> net/sunrpc/xdr.c:       pgto = pages + (pgto_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       pgto = pages + (pgto_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       pgfrom = pages + (pgfrom_base >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       pgto = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       pgfrom = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       page = pages + (pgbase >> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:       xdr->page_ptr = buf->pages + (new >>
>>> PAGE_SHIFT);
>>> net/sunrpc/xdr.c:               ppages = buf->pages + (base >>
>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = buf->pages + (buf->page_base
>>>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdrbuf->pages +
>>> (xdrbuf->page_base >> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
>>>>> PAGE_SHIFT);
>>> net/sunrpc/xprtrdma/rpc_rdma.c: ppages = xdr->pages + (xdr->page_base
>>>>> PAGE_SHIFT);
>> 
>> Hmm.. there's clearly something wrong with me.
>> 
>>> Commit bad4c6eb5eaa is from v5.11. Wouldn't this issue have
>>> shown up in earlier kernels? At the very least, the patch
>>> description needs to explain why this computation is not a
>>> problem for kernels 5.11 through 5.19.
>> 
>> I totally agree.  I figured it was rare to have a non-zero page_base,
>> and
>> maybe a client change is now creating that.
>> 
>> Ben
>> 

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux