> On Nov 15, 2019, at 12:31 PM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> On 15 Nov 2019, at 10:51, Chuck Lever wrote:
>
>>> On Nov 15, 2019, at 9:35 AM, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>>>
>>> On 15 Nov 2019, at 8:39, Chuck Lever wrote:
>>>
>>>> xdr_shrink_pagelen() BUG's when @len is larger than buf->page_len.
>>>> This can happen when xdr_buf_read_mic() is given an xdr_buf with
>>>> a small page array (like, only a few bytes).
>>>
>>> Hi Chuck,
>>>
>>> Seems like a bug in xdr_buf_read_mic to me, but I'm not seeing how this can
>>> happen.. unless perhaps xdr->page_len is 0? Or maybe xdr_shift_buf has bug?
>>>
>>> I'd prefer to keep the BUG_ON. How can I reproduce it?
>>>
>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>> index 14ba9e72a204..71d754fc780e 100644
>>> --- a/net/sunrpc/xdr.c
>>> +++ b/net/sunrpc/xdr.c
>>> @@ -1262,6 +1262,8 @@ int xdr_buf_read_mic(struct xdr_buf *buf, struct xdr_netobj *mic, unsigned int o
>>>         if (offset < boundary && (offset + mic->len) > boundary)
>>>                 xdr_shift_buf(buf, boundary - offset);
>>>
>>> +       trace_printk("boundary %d, offset %d, page_len %d\n", boundary, offset, buf->page_len);
>>> +
>>
>> Btw, I did some troubleshooting with a printk in here a couple days ago:
>>
>> xdr_buf_read_mic: offset=136 boundary=142 buf->page_len=2
>
> Ok, I see.. Your fix makes sense to me now, not much that xdr_buf_read_mic
> can do about it, and we get rid of another BUG_ON site.
>
> Reviewed-by: Benjamin Coddington <bcodding@xxxxxxxxxx>

Thanks!

> BTW that git regression test with disconnect injection is .. mean. I
> haven't hit the BUG_ON yet, but lots of:
>
> [ 171.770148] BUG: unable to handle page fault for address: ffff8880af767986
> [ 171.771752] #PF: supervisor read access in kernel mode
> [ 171.772552] #PF: error_code(0x0000) - not-present page
> [ 171.780214] RIP: 0010:kmem_cache_alloc+0x66/0x2d0

!!! haven't seen that one. (I'm on NFS/RDMA. Should I try TCP, or just
let you chase this one down?)

-- Chuck Lever
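
For reference, the numbers from the printk quoted above (offset=136,
boundary=142, buf->page_len=2) can be modeled in userspace. This is only a
sketch: struct toy_buf, toy_shrink_pagelen() and toy_mic_shift() are
hypothetical stand-ins, not the kernel's xdr code, and clamping the length is
an assumption about how the BUG_ON might be avoided rather than a copy of the
actual patch.

```c
#include <assert.h>

/* Hypothetical stand-in for the one xdr_buf field that matters here. */
struct toy_buf {
	unsigned int page_len;	/* bytes currently held in the page array */
};

/*
 * Shrink the page section by up to @len bytes.
 * Assumption: rather than BUG when @len > page_len, cap the request at
 * what the page array actually holds. Returns the bytes really removed.
 */
static unsigned int toy_shrink_pagelen(struct toy_buf *buf, unsigned int len)
{
	unsigned int shift = len < buf->page_len ? len : buf->page_len;

	assert(shift <= len);
	buf->page_len -= shift;
	return shift;
}

/*
 * Shift that the quoted condition in xdr_buf_read_mic() would request:
 * boundary - offset when the MIC straddles the boundary, else 0.
 */
static unsigned int toy_mic_shift(unsigned int offset, unsigned int mic_len,
				  unsigned int boundary)
{
	if (offset < boundary && offset + mic_len > boundary)
		return boundary - offset;
	return 0;
}
```

With offset=136 and boundary=142, any MIC longer than 6 bytes asks for a
6-byte shift, but the page array only holds 2 bytes, which is the
len > buf->page_len case. With the cap, 2 bytes move and page_len ends at 0.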