On Sat, Jun 10, 2023 at 6:13 AM Krzysztof Kozlowski
<krzysztof.kozlowski@xxxxxxxxxx> wrote:
>
> On 09/06/2023 22:00, Anna Schumaker wrote:
> > From: Anna Schumaker <Anna.Schumaker@xxxxxxxxxx>
> >
> > I found that the read code might send multiple requests using the same
> > nfs_pgio_header, but nfs4_proc_read_setup() is only called once. This is
> > how we ended up occasionally double-freeing the scratch buffer, but also
> > means we set a NULL pointer but non-zero length to the xdr scratch
> > buffer. This results in an oops the first time decoding needs to copy
> > something to scratch, which frequently happens when decoding READ_PLUS
> > hole segments.
> >
> > I fix this by moving scratch handling into the pageio read code. I
> > provide a function to allocate scratch space for decoding read replies,
> > and free the scratch buffer when the nfs_pgio_header is freed.
> >
> > Krzysztof Kozlowski hit a bug a while ago with similar symptoms,
> > and I'm hopeful that this patch fixes his issue.
>
> Unfortunately it does not help. Same NULL ptr, next-20230609 with this
> patchset:

That's unfortunate. I was really hoping between patch #2 and #3 that
it would finally address the issue. I think you said your client is
ARM v7, that's 32-bit right? I'll try to do some 32-bit testing to see
if that uncovers anything on my end.

In the meantime, I'll try to update the debugging printk() patch based
on what I learned while working patch #3 last week. I'll try to get
that to you in the next day or two.

Anna

>
>
> [ 26.780433] Unable to handle kernel NULL pointer dereference at virtual address 00000004 when read
>
> [ 27.124547] mmiocpy from xdr_inline_decode (net/sunrpc/xdr.c:1424 net/sunrpc/xdr.c:1459)
> [ 27.129643] xdr_inline_decode from nfs4_xdr_dec_read_plus (fs/nfs/nfs42xdr.c:1069 fs/nfs/nfs42xdr.c:1152 fs/nfs/nfs42xdr.c:1365 fs/nfs/nfs42xdr.c:1346)
> [ 27.136147] nfs4_xdr_dec_read_plus from call_decode (net/sunrpc/clnt.c:2592)
> [ 27.142124] call_decode from __rpc_execute (include/asm-generic/bitops/generic-non-atomic.h:128 net/sunrpc/sched.c:952)
> [ 27.147232] __rpc_execute from rpc_async_schedule (include/linux/sched/mm.h:368 net/sunrpc/sched.c:1033)
> [ 27.152864] rpc_async_schedule from process_one_work (include/linux/atomic/atomic-arch-fallback.h:444 include/linux/jump_label.h:260 include/linux/jump_label.h:270 include/trace/events/workqueue.h:108 kernel/workqueue.c:2599)
> [ 27.158935] process_one_work from worker_thread (include/linux/list.h:292 kernel/workqueue.c:2746)
> [ 27.164476] worker_thread from kthread (kernel/kthread.c:381)
> [ 27.169329] kthread from ret_from_fork (arch/arm/kernel/entry-common.S:134)
>
> Best regards,
> Krzysztof
>