Re: BUG in nfs_read_folio() when tracing is enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for the late reply, more below.

On Wed, Jun 19, 2024 at 4:46 PM Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
> Hi Dave-
>
> I'm testing pNFS SCSI layouts with tracing enabled on the
> client, and I hit this BUG twice today:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0  Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> CPU: 0 PID: 52230 Comm: fio Not tainted 6.10.0-rc4-g9f1e0e495093 #5
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
> Call Trace:
>  <TASK>
>  ? __die_body.cold+0x19/0x27
>  ? page_fault_oops+0xca/0x2a0
>  ? search_module_extables+0x19/0x60
>  ? search_bpf_extables+0x5f/0x80
>  ? exc_page_fault+0x7e/0x180
>  ? asm_exc_page_fault+0x26/0x30
>  ? nfs_folio_length+0x29/0x170 [nfs]
>  trace_event_raw_event_nfs_folio_event_done+0xdc/0x170 [nfs]
>  nfs_read_folio+0x16d/0x260 [nfs]
>  ? __pfx_nfs_read_folio+0x10/0x10 [nfs]
>  filemap_read_folio+0x43/0xe0
>  filemap_fault+0x70d/0xd00
>  __do_fault+0x35/0x120
>  do_fault+0xbb/0x470
>  __handle_mm_fault+0x7e9/0x1060
>  ? mt_find+0x21c/0x570
>  handle_mm_fault+0xf0/0x300
>  do_user_addr_fault+0x217/0x620
>  exc_page_fault+0x7e/0x180
>  asm_exc_page_fault+0x26/0x30
> RIP: 0033:0x7f1d25ecedb7
> Code: 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 c5 fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 <c5> fd 7f 49 60 c5 fd 7f 51 40 c5 fd 7f 59 20 c5 fd 7f 21 48 83 c1
> RSP: 002b:00007ffd92df7f58 EFLAGS: 00010203
> RAX: 00007f1d1d0e6000 RBX: 000056211867f6c0 RCX: 00007f1d1d0e6f60
> RDX: 0000000000001000 RSI: 0000562118699ee0 RDI: 00007f1d1d0e6000
> RBP: 00007ffd92df7f70 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000562118688f30 R11: 0000000000000001 R12: 00007f1d1d16aac0
> R13: 00007ffd92df8030 R14: 0000000000000001 R15: 0000000000001000
>  </TASK>
> Modules linked in: blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_c>
> CR2: 0000000000000000
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:nfs_folio_length+0x29/0x170 [nfs]
> Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00 00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b 00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
> RSP: 0018:ffffb0800c2abb28 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffe5d8c44f4a00 RCX: ffffffffa8525000
> RDX: 0017ffffd0000028 RSI: 000000000000004e RDI: ffffe5d8c44f4a00
> RBP: ffff9871911501b8 R08: ffff987191150020 R09: 0000000000000003
> R10: 0000000000000006 R11: 0000000000000000 R12: ffffe5d8c44f4a00
> R13: ffffb0800c2abb60 R14: 0000000000000000 R15: ffffe5d8c44f4a00
> FS:  00007f1d25d6f080(0000) GS:ffff987277c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000013cce0005 CR4: 0000000000170ef0
>
> cel@renoir:linux$ scripts/faddr2line fs/nfs/nfs.ko nfs_folio_length+0x29
> nfs_folio_length+0x29/0x170:
> nfs_folio_length at /home/cel/src/kdevops/cel/linux/fs/nfs/internal.h:824 (discriminator 1)
>
> Which in my version of the source code is:
>
> 822 static inline size_t nfs_folio_length(struct folio *folio)
> 823 {
> 824         loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
>

Ok so looks like we need to go deeper to know what the NULL pointer
represented in the BUG statement.
Decoding the oops:
$ ~/git/kernel/scripts/decodecode < ./oops.txt
Code: 90 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b 07 a9 00
00 04 00 74 0c 48 8b 07 f6 c4 08 0f 85 89 00 00 00 48 8b 43 18 <48> 8b
00 48 8b 68 50 48 85 ed 7e 66 48 8b 03 a8 40 74 7c 4c 8b 6b
All code
========
   0: 90                    nop
   1: 41 56                push   %r14
   3: 41 55                push   %r13
   5: 41 54                push   %r12
   7: 55                    push   %rbp
   8: 53                    push   %rbx
   9: 48 89 fb              mov    %rdi,%rbx
   c: 48 83 ec 08          sub    $0x8,%rsp
  10: 48 8b 07              mov    (%rdi),%rax
  13: a9 00 00 04 00        test   $0x40000,%eax
  18: 74 0c                je     0x26
  1a: 48 8b 07              mov    (%rdi),%rax
  1d: f6 c4 08              test   $0x8,%ah
  20: 0f 85 89 00 00 00    jne    0xaf
  26: 48 8b 43 18          mov    0x18(%rbx),%rax
  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction
  2d: 48 8b 68 50          mov    0x50(%rax),%rbp
  31: 48 85 ed              test   %rbp,%rbp
  34: 7e 66                jle    0x9c
  36: 48 8b 03              mov    (%rbx),%rax
  39: a8 40                test   $0x40,%al
  3b: 74 7c                je     0xb9
  3d: 4c                    rex.WR
  3e: 8b                    .byte 0x8b
  3f: 6b                    .byte 0x6b

Code starting with the faulting instruction
===========================================
   0: 48 8b 00              mov    (%rax),%rax
   3: 48 8b 68 50          mov    0x50(%rax),%rbp
   7: 48 85 ed              test   %rbp,%rbp
   a: 7e 66                jle    0x72
   c: 48 8b 03              mov    (%rbx),%rax
   f: a8 40                test   $0x40,%al
  11: 74 7c                je     0x8f
  13: 4c                    rex.WR
  14: 8b                    .byte 0x8b
  15: 6b                    .byte 0x6b


Snip out the essential instructions, which shows we just have read an
address at an offset of 0x18, then that must have been NULL and we
tried to dereference NULL which led to the BUG:
   9: 48 89 fb              mov    %rdi,%rbx
...
  26: 48 8b 43 18          mov    0x18(%rbx),%rax
  2a:* 48 8b 00              mov    (%rax),%rax <-- trapping instruction

So the question is what is at offset 0x18 in the above code in
nfs_folio_length()?
Now folio_file_mapping() returns an address_space pointer ,and
address_space->host is at offset 0.
So if 'mapping' is at offset 0x18 of folio, then mapping is NULL.  And
from the code, this seems to fit.

 310 struct folio {
 311         /* private: don't document the anon union */
 312         union {
 313                 struct {
 314         /* public: */
 315                         unsigned long flags;
 316                         union {
 317                                 struct list_head lru;
 318         /* private: avoid cluttering the output */
 319                                 struct {
 320                                         void *__filler;
 321         /* public: */
 322                                         unsigned int mlock_count;
 323         /* private: */
 324                                 };
 325         /* public: */
 326                         };
 327                         struct address_space *mapping;
 328                         pgoff_t index;


 445 static inline struct address_space *folio_file_mapping(struct folio *folio)
 446 {
 447         if (unlikely(folio_test_swapcache(folio)))
 448                 return swapcache_mapping(folio);
 449
 450         return folio->mapping;
 451 }

So it seems folio->mapping == NULL and that is why we BUG.


> I'm pinging you first about this because you changed the
> code around the trace_nfs_aop_read_done() call site in
> commit 000dbe0bec05 ("NFS: Convert buffered read paths to
> use netfs when fscache is enabled").
>

If I have the above analysis right, then maybe this patch would fix it?
Also I wonder if nfs_page_length() might have the same issue?

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..832f07994b88 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -821,7 +821,12 @@ unsigned int nfs_page_length(struct page *page)
  */
 static inline size_t nfs_folio_length(struct folio *folio)
 {
-       loff_t i_size = i_size_read(folio_file_mapping(folio)->host);
+       loff_t i_size;
+
+       if (!folio->mapping)
+               return 0;
+
+       i_size = i_size_read(folio_file_mapping(folio)->host);

        if (i_size > 0) {
                pgoff_t index = folio_index(folio) >> folio_order(folio);






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux