On Tue, May 12, 2020 at 11:52:34AM +0000, Wan, Kaike wrote:
>
>
> > -----Original Message-----
> > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx>
> > On Behalf Of Leon Romanovsky
> > Sent: Tuesday, May 12, 2020 1:55 AM
> > To: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>
> > Cc: jgg@xxxxxxxx; dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> > Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxx>; stable@xxxxxxxxxxxxxxx;
> > Wan, Kaike <kaike.wan@xxxxxxxxx>
> > Subject: Re: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy hfi1_wq when
> > the device is shut down
> >
> > On Mon, May 11, 2020 at 11:13:15PM -0400, Dennis Dalessandro wrote:
> > > From: Kaike Wan <kaike.wan@xxxxxxxxx>
> > >
> > > The workqueue hfi1_wq is destroyed in function shutdown_device(),
> > > which is called by either shutdown_one() or remove_one(). The function
> > > shutdown_one() is called when the kernel is rebooted while
> > > remove_one() is called when the hfi1 driver is unloaded. When the
> > > kernel is rebooted, hfi1_wq is destroyed while all qps are still
> > > active, leading to a kernel crash:
> >
> > I was under the impression that kernel reboot should follow the same logic
> > as module removal. This is what a graceful reboot will do anyway. Can you
> > please give me a link where I can read about the difference between those
> > flows?
>
> I used to think the same. However, by adding traces to the hfi driver, I
> found that the shutdown function of the pci_driver was called when typing
> "reboot", while the remove function of the pci_driver was called when
> typing "modprobe -r hfi1".

I took a look at what mlx5_core does in its shutdown flow, and it can be
summarized as follows:
1. Drain workqueues.
2. Close PCI.
3. Don't release anything.

So maybe you didn't flush the hfi1_wq?

>
> I am not an expert on kernel reboot; can someone give some hints?
>
> Kaike