Sorry for the late reply, more below. On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote: > > Hi Dave- > > I'm testing pNFS SCSI layouts with tracing enabled on the > client, and I hit this BUG twice today: > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI > CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs] > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000 > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00 > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003 > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00 > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00 > FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0 > Call Trace: > <TASK> > ? __die_body.cold+0x19/0x27 > ? page_fault_oops+0xca/0x2a0 > ? search_module_extables+0x19/0x60 > ? search_bpf_extables+0x5f/0x80 > ? exc_page_fault+0x7e/0x180 > ? asm_exc_page_fault+0x26/0x30 > ? nfs_folio_length+0x29/0x170 [nfs] > trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs] > nfs_read_folio+0x16d/0x260 [nfs] > ? __pfx_nfs_read_folio+0x10/0x10 [nfs] > filemap_read_folio+0x43/0xe0 > filemap_fault+0x70d/0xd00 > __do_fault+0x35/0x120 > do_fault+0xbb/0x470 > __handle_mm_fault+0x7e9/0x1060 > ? mt_find+0x21c/0x570 > handle_mm_fault+0xf0/0x300 > do_user_addr_fault+0x217/0x620 > exc_page_fault+0x7e/0x180 > asm_exc_page_fault+0x26/0x30 > RIP: 0033:0x7f1d25ecedb7 > Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1 > RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203 > RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60 > RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000 > RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0 > R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000 > </TASK> > Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c> > CR2: 0000000000000000 > ---[ end trace 0000000000000000 ]--- > RIP: 0010:nfs_folio_length+0x29/0x170 [nfs] > Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b > RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246 > RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000 > RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00 > RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003 > R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00 > R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00 > FS: 00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0 > > cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29 > nfs_folio_length+0x29/0x170: > nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1) > > Which in my version of the source code is: > > 822 static inline size_t nfs_folio_length(struct folio *folio) > 823 { > 824 loff_t i_size = i_size_read(folio_file_mapping(folio)->host); > Ok so looks like we need to go deeper to know what the NULL pointer represented in the BUG statement. Decoding the oops: $ ~/git/kernel/scripts/decodecode < ./oops.txt Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b All code ======== 0: 90 nop 1: 41 56 push %r14 3: 41 55 push %r13 5: 41 54 push %r12 7: 55 push %rbp 8: 53 push %rbx 9: 48 89 fb mov %rdi,%rbx c: 48 83 ec 08 sub $0x8,%rsp 10: 48 8b 07 mov (%rdi),%rax 13: a9 00 00 04 00 test $0x40000,%eax 18: 74 0c je 0x26 1a: 48 8b 07 mov (%rdi),%rax 1d: f6 c4 08 test $0x8,%ah 20: 0f 85 89 00 00 00 jne 0xaf 26: 48 8b 43 18 mov 0x18(%rbx),%rax 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction 2d: 48 8b 68 50 mov 0x50(%rax),%rbp 31: 48 85 ed test %rbp,%rbp 34: 7e 66 jle 0x9c 36: 48 8b 03 mov (%rbx),%rax 39: a8 40 test $0x40,%al 3b: 74 7c je 0xb9 3d: 4c rex.WR 3e: 8b .byte 0x8b 3f: 6b .byte 0x6b Code starting with the faulting instruction =========================================== 0: 48 8b 00 mov (%rax),%rax 3: 48 8b 68 50 mov 0x50(%rax),%rbp 7: 48 85 ed test %rbp,%rbp a: 7e 66 jle 0x72 c: 48 8b 03 mov (%rbx),%rax f: a8 40 test $0x40,%al 11: 74 7c je 0x8f 13: 4c rex.WR 14: 8b .byte 0x8b 15: 6b .byte 0x6b Snip out the essential instructions, which shows we just have read an address at an offset of 0x18, then that must have been NULL and we tried to dereference NULL which led to the BUG: 9: 48 89 fb mov %rdi,%rbx ... 26: 48 8b 43 18 mov 0x18(%rbx),%rax 2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction So the question is what is at offset 0x18 in the above code in nfs_folio_length()? Now folio_file_mapping() returns an address_space pointer ,and address_space->host is at offset 0. So if 'mapping' is at offset 0x18 of folio, then mapping is NULL. And from the code, this seems to fit. 310 struct folio { 311 /* private: don't document the anon union */ 312 union { 313 struct { 314 /* public: */ 315 unsigned long flags; 316 union { 317 struct list_head lru; 318 /* private: avoid cluttering the output */ 319 struct { 320 void *__filler; 321 /* public: */ 322 unsigned int mlock_count; 323 /* private: */ 324 }; 325 /* public: */ 326 }; 327 struct address_space *mapping; 328 pgoff_t index; 445 static inline struct address_space *folio_file_mapping(struct folio *folio) 446 { 447 if (unlikely(folio_test_swapcache(folio))) 448 return swapcache_mapping(folio); 449 450 return folio->mapping; 451 } So it seems folio->mapping == NULL and that is why we BUG. > I'm pinging you first about this because you changed the > code around the trace_nfs_aop_read_done() call site in > commit 000dbe0bec05 ("NFS: Convert buffered read paths to > use netfs when fscache is enabled"). > If I have the above analysis right, then maybe this patch would fix it? Also I wonder if nfs_page_length() might have the same issue? diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 9f0f4534744b..832f07994b88 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page) */ static inline size_t nfs_folio_length(struct folio *folio) { - loff_t i_size = i_size_read(folio_file_mapping(folio)->host); + loff_t i_size; + + if (!folio->mapping) + return 0; + + i_size = i_size_read(folio_file_mapping(folio)->host); if (i_size > 0) { pgoff_t index = folio_index(folio) >> folio_order(folio);