On Sun, 2020-04-19 at 14:22 +0800, Sun Ted wrote:
> Hi Folks,
>
> On kernel version 5.2.37 (or a bit earlier), we are running stress
> tests that repeatedly insert and remove a module. The module itself is
> not special and can be any module.
> After a few dozen iterations, the call trace below is thrown and the
> system hangs.
>
> The insert/remove test script is:
>
> for each in {1..100} ; do echo "$each" ; insmod openvswitch.ko ; rmmod openvswitch.ko ; usleep 100000 ; done
>
> The target machine is arm64, Cortex-A53.
>
> Disassembling " __wake_up_common_lock+0x98/0xe0 " points at
> "include/linux/spinlock.h"; it appears to be dereferencing a NULL
> "lock->rlock" pointer:
>
> static __always_inline void spin_unlock_irqrestore(spinlock_t *lock,
>                                                    unsigned long flags)
> {
>         raw_spin_unlock_irqrestore(&lock->rlock, flags);
> }
>
> Have you ever seen this issue, or do you have a fix for it based on
> kernel 5.2.37?
>
> Call trace:
> openvswitch: Open vSwitch switching datapath
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Mem abort info:
>   ESR = 0x86000005
>   Exception class = IABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=00000008f2ea9000
> [0000000000000000] pgd=0000000000000000, pud=0000000000000000
> Internal error: Oops: 86000005 [#1] PREEMPT SMP
> Modules linked in: openvswitch hse sch_fq_codel nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 [last unloaded: openvswitch]
> CPU: 2 PID: 209 Comm: kworker/u9:3 Not tainted 5.2.37-yocto-standard #1
> Hardware name: Freescale S32G275 (DT)
> Workqueue: xprtiod xs_stream_data_receive_workfn
> pstate: 80000085 (Nzcv daIf -PAN -UAO)
> pc : 0x0
> lr : __wake_up_common+0x90/0x150
> sp : ffffff801146ba60
> x29: ffffff801146ba60 x28: ffffff8010d36000
> x27: ffffff801146bb90 x26: 0000000000000000
> x25: 0000000000000003 x24: 0000000000000000
> x23: 0000000000000001 x22: ffffff801146bb00
> x21: ffffff8010d351e8 x20: 0000000000000000
> x19: ffffff80112c37a0 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000000 x14: 0000000000000000
> x13: 0000003002000000 x12: 0000902802000000
> x11: 0000000000000000 x10: 000001000000ed01
> x9 : 0000010000000000 x8 : 9b5e0000000048a6
> x7 : 0000000000000068 x6 : ffffff80112c37a0
> x5 : 0000000000000000 x4 : ffffff801146bb90
> x3 : ffffff801146bb90 x2 : 0000000000000000
> x1 : 0000000000000003 x0 : ffffff80112c37a0
> Call trace:
>  0x0
>  __wake_up_common_lock+0x98/0xe0
>  __wake_up+0x40/0x50
>  wake_up_bit+0x8c/0xb8
>  rpc_make_runnable+0xc8/0xd0
>  rpc_wake_up_task_on_wq_queue_action_locked+0x110/0x278
>  rpc_wake_up_queued_task.part.0+0x40/0x58
>  rpc_wake_up_queued_task+0x38/0x48
>  xprt_complete_rqst+0x68/0x128
>  xs_read_stream.constprop.0+0x2ec/0x3d0
>  xs_stream_data_receive_workfn+0x60/0x190
>  process_one_work+0x1bc/0x440
>  worker_thread+0x50/0x408
>  kthread+0x104/0x130
>  ret_from_fork+0x10/0x1c
> Code: bad PC value
> Kernel panic - not syncing: Fatal exception in interrupt
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x0002,2000200c
> Memory Limit: none

I don't see why NFS would care about the openvswitch module being loaded
or not, so given your stack dump I suspect this is more about corruption
of the global bit_wait_table.

Cc: netdev@xxxxxxxxxxxxxxx to see if anyone there is aware of any recent
module cleanup issues with openvswitch.
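
For reference, wake_up_bit() does not use any per-module or per-subsystem
wait queue: it hashes into that single shared bit_wait_table, so a stale or
overwritten wait entry on a bucket will crash whichever unrelated caller
happens to wake that bucket next. A rough sketch of the path your trace
takes, paraphrased from memory of the 5.2-era kernel/sched/wait_bit.c and
kernel/sched/wait.c (check your tree for the exact code):

#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)

/* one global table of wait queue heads, shared by every
 * wait_on_bit()/wake_up_bit() user in the kernel */
static wait_queue_head_t bit_wait_table[WAIT_TABLE_SIZE] __cacheline_aligned;

wait_queue_head_t *bit_waitqueue(void *word, int bit)
{
	/* hash the (word, bit) pair into one of the shared buckets */
	const int shift = BITS_PER_LONG == 32 ? 5 : 6;
	unsigned long val = (unsigned long)word << shift | bit;

	return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
}

void wake_up_bit(void *word, int bit)
{
	__wake_up_bit(bit_waitqueue(word, bit), word, bit);
}

/* __wake_up_common(), simplified: walk the bucket and invoke each
 * entry's wake callback */
list_for_each_entry_safe_from(curr, next, &wq_head->head, entry) {
	unsigned flags = curr->flags;
	int ret;

	/* an entry whose ->func has been zeroed or overwritten faults
	 * right here, which matches your pc of 0x0 with lr inside
	 * __wake_up_common */
	ret = curr->func(curr, mode, wake_flags, key);
	if (ret < 0)
		break;
}

So the NULL pc is most likely an indirect call through a corrupted
wait_queue_entry on one of those shared buckets, and the
spin_unlock_irqrestore() you disassembled at __wake_up_common_lock+0x98
is probably just the return address in the caller rather than the
faulting instruction. That is why I'd look for whatever is scribbling
over the table (or leaving stale entries on it) around module unload,
rather than at NFS or openvswitch themselves.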
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx