Re: Linux 6.6.12: kernel BUG at net/sunrpc/svc.c:581!: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI, svc_destroy

> On Jun 20, 2024, at 5:34 PM, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote:
> 
> Dear Linux folks,
> 
> 
> On Linux 6.6.12, after copying several large files (5–80 GB) in parallel and then trying to change the number of server threads with `rpc.nfsd nproc`, running `systemctl restart nfsd` resulted in:
> 
> ```
> [2502367.958818] nfsd: last server has exited, flushing export cache
> [2502369.261987] NFSD: Using UMH upcall client tracking operations.
> [2502369.268678] NFSD: starting 90-second grace period (net f0000000)
> 
> [2502369.285013] ------------[ cut here ]------------
> [2502369.291230] kernel BUG at net/sunrpc/svc.c:581!
> [2502369.297008] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [2502369.303548] CPU: 9 PID: 4579 Comm: rpc.nfsd Not tainted 6.6.12.mx64.461 #1
> [2502369.311741] Hardware name: Dell Inc. PowerEdge T440/021KCD, BIOS 2.12.2 07/09/2021
> [2502369.320696] RIP: 0010:svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.327474] Code: 00 00 00 be 01 00 00 00 e8 d4 f2 54 e1 41 3b 6d 74 72 bc 49 8b 7d 7c e8 95 40 1c e1 4c 89 e7 5b 5d 41 5c 41 5d e9 87 40 1c e1 <0f> 0b 48 8b 47 ec 48 c7 c7 f9 5a 15 a0 48 8b 70 20 e8 c1 87 01 e1
> [2502369.349863] RSP: 0018:ffffc9000e26bd60 EFLAGS: 00010206
> [2502369.356573] RAX: ffff88886064e130 RBX: ffff88886064e114 RCX: 0000000000000010
> [2502369.365173] RDX: ffff889092d73018 RSI: 0000000000000246 RDI: ffff88a03fc1cfc0
> [2502369.373879] RBP: 0000000000000040 R08: 000000000000000f R09: 0000000000000001
> [2502369.382474] R10: ffff889092d71000 R11: 0000000000000000 R12: ffff88886064e100
> [2502369.391115] R13: ffff88886064e114 R14: ffff88886064e100 R15: ffff8881061d6000
> [2502369.399730] FS:  00007f610ac30740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
> [2502369.409410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2502369.416667] CR2: 000000000069adf8 CR3: 00000004ba14a002 CR4: 00000000007706e0
> [2502369.425524] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2502369.434240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2502369.442880] PKRU: 55555554
> [2502369.447193] Call Trace:
> [2502369.451211]  <TASK>
> [2502369.454982]  ? die+0x36/0x90
> [2502369.459421]  ? do_trap+0xda/0x100
> [2502369.464337]  ? svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.470479]  ? do_error_trap+0x65/0x80
> [2502369.475857]  ? svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.481924]  ? exc_invalid_op+0x50/0x70
> [2502369.487390]  ? svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.493402]  ? asm_exc_invalid_op+0x1a/0x20
> [2502369.498494]  ? svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.504826]  nfsd_svc+0x28c/0x3d0 [nfsd]
> [2502369.510836]  write_threads+0xe4/0x190 [nfsd]
> [2502369.517184]  ? __pfx_write_threads+0x10/0x10 [nfsd]
> [2502369.524580]  nfsctl_transaction_write+0x4a/0x80 [nfsd]
> [2502369.531495]  vfs_write+0xcf/0x450
> [2502369.535578]  ksys_write+0x6f/0xf0
> [2502369.540415]  do_syscall_64+0x43/0x90
> [2502369.545455]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [2502369.551988] RIP: 0033:0x7f610ad3aa20
> [2502369.557030] Code: 40 00 48 8b 15 e1 b3 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d c1 3b 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
> [2502369.578504] RSP: 002b:00007fff4d8deaf8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [2502369.587720] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f610ad3aa20
> [2502369.596419] RDX: 0000000000000003 RSI: 000000000040d540 RDI: 0000000000000003
> [2502369.604613] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fff4d8de990
> [2502369.613258] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040
> [2502369.621276] R13: 0000000000000001 R14: 000000000040e2a0 R15: 000000000040910e
> [2502369.629927]  </TASK>
> [2502369.632849] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs i915 iosf_mbi drm_buddy drm_display_helper ttm intel_gtt video 8021q garp stp mrp llc x86_pkg_temp_thermal coretemp kvm_intel tg3 kvm irqbypass crc32c_intel wmi_bmof mgag200 i2c_algo_bit libphy iTCO_wdt i40e iTCO_vendor_support wmi ipmi_si nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc ip_tables x_tables ipv6 autofs4
> [2502369.672534] ---[ end trace 0000000000000000 ]---
> [2502369.677557] RIP: 0010:svc_destroy+0xc9/0xf0 [sunrpc]
> [2502369.682931] Code: 00 00 00 be 01 00 00 00 e8 d4 f2 54 e1 41 3b 6d 74 72 bc 49 8b 7d 7c e8 95 40 1c e1 4c 89 e7 5b 5d 41 5c 41 5d e9 87 40 1c e1 <0f> 0b 48 8b 47 ec 48 c7 c7 f9 5a 15 a0 48 8b 70 20 e8 c1 87 01 e1
> [2502369.702288] RSP: 0018:ffffc9000e26bd60 EFLAGS: 00010206
> [2502369.707906] RAX: ffff88886064e130 RBX: ffff88886064e114 RCX: 0000000000000010
> [2502369.715430] RDX: ffff889092d73018 RSI: 0000000000000246 RDI: ffff88a03fc1cfc0
> [2502369.722960] RBP: 0000000000000040 R08: 000000000000000f R09: 0000000000000001
> [2502369.730483] R10: ffff889092d71000 R11: 0000000000000000 R12: ffff88886064e100
> [2502369.738015] R13: ffff88886064e114 R14: ffff88886064e100 R15: ffff8881061d6000
> [2502369.745537] FS:  00007f610ac30740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
> [2502369.754015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2502369.760149] CR2: 000000000069adf8 CR3: 00000004ba14a002 CR4: 00000000007706e0
> [2502369.767681] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [2502369.775210] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [2502369.782735] PKRU: 55555554
> ```
> 
> We have not experienced this with either 5.15.112 or 5.15.160, though the latter has not been tested much yet.
> 
> I found similar reports in the list archive [1], but I have a hard time following them, as the commit hashes differ between the Linux stable series and no commit message explicitly contains the trace. I assume it’s fixed in 6.6.34, but I wanted to report it anyway so it’s documented, and maybe the maintainers can confirm.

There's nothing we can do about older releases of LTS kernels.
Please confirm this issue is fixed by testing 6.6.34 and
6.10-rc4.

Two possibly related upstream commits are:

64e6304169f1 ("nfsd: drop the nfsd_put helper")
2a501f55cd64 ("nfsd: call nfsd_last_thread() before final nfsd_put()")
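Because stable backports are re-committed with new hashes, one way to check whether the two upstream commits above reached a given stable series is to search the stable log for the upstream hash instead. The sketch below assumes the stable-tree convention of citing the original commit in the message body as `commit <sha> upstream.` or `[ Upstream commit <sha> ]`; the helper names `upstream_refs` and `backported` are hypothetical and not part of any kernel tooling.

```python
import re

# Assumed stable-tree conventions for citing the original upstream commit:
#   commit <40-hex sha> upstream.
#   [ Upstream commit <sha> ]
# [ \t] is used instead of \s so a match never spans two lines.
UPSTREAM_RE = re.compile(
    r"^\s*(?:commit[ \t]+([0-9a-f]{40})[ \t]+upstream\.?"
    r"|\[[ \t]*Upstream commit[ \t]+([0-9a-f]{12,40})[ \t]*\])",
    re.IGNORECASE | re.MULTILINE,
)


def upstream_refs(log_text):
    """Return the set of upstream SHAs (lowercased) cited in a stable git log dump."""
    refs = set()
    for m in UPSTREAM_RE.finditer(log_text):
        refs.add((m.group(1) or m.group(2)).lower())
    return refs


def backported(short_shas, log_text):
    """Map each abbreviated upstream SHA to whether some stable commit cites it."""
    refs = upstream_refs(log_text)
    return {s: any(r.startswith(s.lower()) for r in refs) for s in short_shas}
```

For example, pass it the output of `git log v6.6.12..v6.6.34` from a linux-stable checkout together with `["64e6304169f1", "2a501f55cd64"]`. A plain `git log --grep=<sha>` over the same range does the same job for a single hash; the helper only checks several at once.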


--
Chuck Lever