On Thu, Nov 19, 2020 at 11:19:05AM -0500, Chuck Lever wrote: > > On Nov 19, 2020, at 10:09 AM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote: > > > > On Thu, Nov 19, 2020 at 9:37 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >> > >> Hi Olga- > >> > >>> On Nov 18, 2020, at 4:44 PM, Olga Kornievskaia <olga.kornievskaia@xxxxxxxxx> wrote: > >>> > >>> Hi Chuck, > >>> > >>> The first problem I found was from 5.10-rc3 testing was from the fact > >>> that tail's iov_len was set to -4 and reply_chunk was trying to call > >>> rpcrdma_convert_kvec() for a tail that didn't exist. > >>> > >>> Do you see issues with this fix? > >>> > >>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c > >>> index 71e03b930b70..2e6a228abb95 100644 > >>> --- a/net/sunrpc/xdr.c > >>> +++ b/net/sunrpc/xdr.c > >>> @@ -193,7 +193,7 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset, > >>> > >>> tail->iov_base = buf + offset; > >>> tail->iov_len = buflen - offset; > >>> - if ((xdr->page_len & 3) == 0) > >>> + if ((xdr->page_len & 3) == 0 && tail->iov_len) > >>> tail->iov_len -= sizeof(__be32); > >>> > >>> xdr->buflen += len; > >> > >> It's not clear to me whether only the listxattrs encoder is > >> not providing a receive tail kvec, or whether all the encoders > >> fail to provide a tail in this case. > > > > There is nothing specific that listxattr does, it just calls > > rpc_prepare_pages(). Isn't tail[0] always there (xdr_buf structure has > > tail[1] defined)? > > Flip the question on its head: Why does xdr_inline_pages() work > fine for all the other operations? That suggests the problem is > with listxattrs, not with generic code. > > > > But not all requests have data in the page. So as > > far as I understand, tail->iov_len can be 0 so not checking for it is > > wrong. > > The full context of the tail logic is: > > 194 tail->iov_base = buf + offset; > 195 tail->iov_len = buflen - offset; > 196 if ((xdr->page_len & 3) == 0) > 197 tail->iov_len -= sizeof(__be32); > > tail->iov_len is set to buflen - offset right here. It should > /always/ be 4 or more. We can ensure that because the input > to this function is always provided by another kernel function > (in other words, we control all the callers). > > Why is buflen == offset for listxattrs? 6c2190b3fcbc ("NFS: > Fix listxattr receive buffer size") is trying to ensure > tail->iov_len is not zero -- that the math here gives us a > positive value every time. > > In nfs4_xdr_enc_listxattrs() we have: > > 1529 rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count, > 1530 hdr.replen); > > hdr.replen is set to NFS4_dec_listxattrs_sz. > > _nfs42_proc_listxattrs() sets args->count. > > I suspect the problem is the way _nfs42_proc_listxattrs() is > computing the length of xattr_pages. It includes the trailing > EOF boolean, but so does the decode_listxattrs_maxsz macro, > for instance. > > We need head->iov_len to always be one XDR_UNIT larger than > the position where the xattr_pages array starts. Then the > math in xdr_inline_pages() will work correctly. (sidebar: > perhaps the documenting comment for xdr_inline_pages() should > explain that assumption). > > So, now I agree with the assessment that 6c2190b3fcbc ("NFS: > Fix listxattr receive buffer size") is probably not adequate. > There is another subtlety to address in the way that > _nfs42_proc_listxattrs() computes args->count. The only thing I see wrong so far is that I believe that decode_listxattrs_maxsz is wrong - it shouldn't include the EOF word, which is accounted for in the page data part. But, it seems that this wouldn't cause the above problem. I'm also uncertain why this happens with RDMA, but not otherwise. Unfortunately, I can't test RDMA, but when I ran listxattr tests, I did so with KASAN enabled, and didn't see issues. Obviously there could be a bug here, it could be that the code just gets lucky, but that the bug is exposed on RDMA. Is there a specific size passed to listxattr that this happens with? - Frank