RE: [PATCH 0/5] Fix potential issues for siw

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> Sent: Friday, 28 July 2023 04:29
> To: Jason Gunthorpe <jgg@xxxxxxxx>; Bernard Metzler <BMT@xxxxxxxxxxxxxx>
> Cc: leon@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
> Subject: [EXTERNAL] Re: [PATCH 0/5] Fix potential issues for siw
> 
> 
> 
> On 7/28/23 09:16, Guoqing Jiang wrote:
> >
> >
> > On 7/28/23 01:29, Jason Gunthorpe wrote:
> >> On Thu, Jul 27, 2023 at 05:17:40PM +0000, Bernard Metzler wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Guoqing Jiang <guoqing.jiang@xxxxxxxxx>
> >>>> Sent: Thursday, 27 July 2023 16:04
> >>>> To: Bernard Metzler <BMT@xxxxxxxxxxxxxx>; jgg@xxxxxxxx;
> >>>> leon@xxxxxxxxxx
> >>>> Cc: linux-rdma@xxxxxxxxxxxxxxx
> >>>> Subject: [EXTERNAL] [PATCH 0/5] Fix potential issues for siw
> >>>>
> >>>> Hi,
> >>>>
> >>>> Several issues appeared if we rmmod siw module after failed to insert
> >>>> the module (with manual change like below).
> >>>>
> >>>> --- a/drivers/infiniband/sw/siw/siw_main.c
> >>>> +++ b/drivers/infiniband/sw/siw/siw_main.c
> >>>> @@ -577,6 +577,7 @@ static __init int siw_init_module(void)
> >>>>          if (rv)
> >>>>                  goto out_error;
> >>>>
> >>>> +       goto out_error;
> >>>>          rdma_link_register(&siw_link_ops);
> >>>>
> >>>> Basically, these issues are double free, use before initalization or
> >>>> null pointer dereference. For more details, pls review the individual
> >>>> patch.
> >>>>
> >>>> Thanks,
> >>>> Guoqing
> >>> Hi Guoqing,
> >>>
> >>> very good catch, thank you. I was under the wrong assumption a
> >>> module is not loaded if the init_module() returns a value.
> >> I think that is actually true, isn't it? I'm confused?
> >
> > Yes, you are right. Since rv is still 0, so the module appears in the
> > kernel. Not sure if some tool could inject err like this. Feel free to
> > ignore this.
> 
> The below trace happened if I run a stress test with load siw module and
> unload siw in a loop, which should be fixed by patch 5,  so I think we
> need to apply it, what do you think?

I see. makes sense -- you are removing the module while some tx
threads did not get a chance to init their state.
Why don't we better cleanly initialize the tasks state before
spawning it? Your extra task parameter 'running' smells like
it could lead to a similar issue - at module removal, the
task may have not get a chance to set 'running' true, so it is not
stopped and finally left alone.

Working on an according patch...

Thanks,
Bernard.


> 
> [  414.537961] BUG: spinlock bad magic on CPU#0, modprobe/3722
> [  414.537965]  lock: 0xffff9d847bc380e8, .magic: 00000000, .owner:
> <none>/-1, .owner_cpu: 0
> [  414.537969] CPU: 0 PID: 3722 Comm: modprobe Tainted: G OE
> 6.5.0-rc3+ #16
> [  414.537971] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
> [  414.537973] Call Trace:
> [  414.537973]  <TASK>
> [  414.537975]  dump_stack_lvl+0x77/0xd0
> [  414.537979]  dump_stack+0x10/0x20
> [  414.537981]  spin_bug+0xa5/0xd0
> [  414.537984]  do_raw_spin_lock+0x90/0xd0
> [  414.537985]  _raw_spin_lock_irqsave+0x56/0x80
> [  414.537988]  ? __wake_up_common_lock+0x63/0xd0
> [  414.537990]  __wake_up_common_lock+0x63/0xd0
> [  414.537992]  __wake_up+0x13/0x30
> [  414.537994]  siw_stop_tx_thread+0x49/0x70 [siw]
> [  414.538000]  siw_exit_module+0x30/0x620 [siw]
> [  414.538006]  __do_sys_delete_module.constprop.0+0x18f/0x300
> [  414.538008]  ? syscall_enter_from_user_mode+0x21/0x70
> [  414.538010]  ? __this_cpu_preempt_check+0x13/0x20
> [  414.538012]  ? lockdep_hardirqs_on+0x86/0x120
> [  414.538014]  __x64_sys_delete_module+0x12/0x20
> [  414.538016]  do_syscall_64+0x5c/0x90
> [  414.538019]  ? do_syscall_64+0x69/0x90
> [  414.538020]  ? __this_cpu_preempt_check+0x13/0x20
> [  414.538022]  ? lockdep_hardirqs_on+0x86/0x120
> [  414.538024]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538025]  ? do_syscall_64+0x69/0x90
> [  414.538026]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538027]  ? do_syscall_64+0x69/0x90
> [  414.538029]  ? syscall_exit_to_user_mode+0x37/0x50
> [  414.538030]  ? do_syscall_64+0x69/0x90
> [  414.538032]  ? irqentry_exit_to_user_mode+0x25/0x30
> [  414.538033]  ? irqentry_exit+0x77/0xb0
> [  414.538034]  ? exc_page_fault+0xae/0x240
> [  414.538036]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  414.538038] RIP: 0033:0x7f177eb26c9b
> 
> Thanks,
> Guoqing




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux