> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> owner@xxxxxxxxxxxxxxx> On Behalf Of Leon Romanovsky
> Sent: Wednesday, May 13, 2020 3:59 AM
> To: Wan, Kaike <kaike.wan@xxxxxxxxx>
> Cc: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>; jgg@xxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Marciniszyn, Mike
> <mike.marciniszyn@xxxxxxxxx>; stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy hfi1_wq when
> the device is shut down
>
> On Tue, May 12, 2020 at 11:52:34AM +0000, Wan, Kaike wrote:
> >
> > > -----Original Message-----
> > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-
> > > owner@xxxxxxxxxxxxxxx> On Behalf Of Leon Romanovsky
> > > Sent: Tuesday, May 12, 2020 1:55 AM
> > > To: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>
> > > Cc: jgg@xxxxxxxx; dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> > > Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxx>;
> > > stable@xxxxxxxxxxxxxxx; Wan, Kaike <kaike.wan@xxxxxxxxx>
> > > Subject: Re: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy
> > > hfi1_wq when the device is shut down
> > >
> > > On Mon, May 11, 2020 at 11:13:15PM -0400, Dennis Dalessandro wrote:
> > > > From: Kaike Wan <kaike.wan@xxxxxxxxx>
> > > >
> > > > The workqueue hfi1_wq is destroyed in function shutdown_device(),
> > > > which is called by either shutdown_one() or remove_one(). The
> > > > function shutdown_one() is called when the kernel is rebooted while
> > > > remove_one() is called when the hfi1 driver is unloaded. When the
> > > > kernel is rebooted, hfi1_wq is destroyed while all qps are still
> > > > active, leading to a kernel crash:
> > >
> > > I was under impression that kernel reboot should follow same logic
> > > as module removal. This is what graceful reboot will do anyway. Can
> > > you please give me a link where I can read about difference in those
> > > flows?
> > >
> > I used to think the same. However, by adding traces to the hfi driver, I
> > found out that the shutdown function of the pci_driver was called when
> > typing "reboot" while the remove function of the pci_driver was called
> > when typing "modprobe -r hfi1".
>
> I took a look on what mlx5_core is doing in shutdown flow and it can be
> summarized in the following:
> 1. Drain workqueues
> 2. Close PCI
> 3. Don't release anything.
>
> So maybe you didn't flush the hfi1_wq?

Will add the flush.

Thanks,

Kaike
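
For illustration only, the two teardown paths discussed in this thread can be modeled in plain userspace C. Everything in this sketch is hypothetical: mock_wq, mock_dev, and the model_* functions are stand-ins invented here, not the kernel workqueue API and not the actual hfi1 code. It assumes the reboot path drains the queue and closes the device but releases nothing, while the unload path additionally destroys the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kernel workqueue (not the real API). */
struct mock_wq {
    int pending;     /* work items queued but not yet run */
    bool destroyed;  /* set once the queue has been torn down */
};

/* Hypothetical device state, loosely modeled on a PCI driver's private data. */
struct mock_dev {
    struct mock_wq *wq;
    bool pci_enabled;
};

/* Drain: run every pending item; the queue remains usable afterwards. */
static void mock_flush(struct mock_wq *wq)
{
    wq->pending = 0;
}

/* Destroy: drain, then mark the queue as gone. */
static void mock_destroy(struct mock_wq *wq)
{
    mock_flush(wq);
    wq->destroyed = true;
}

/*
 * Queue one work item. Returns false if the queue was already destroyed,
 * which stands in for the crash described in the patch.
 */
static bool mock_queue_work(struct mock_wq *wq)
{
    if (wq->destroyed)
        return false;
    wq->pending++;
    return true;
}

/* Reboot path: drain the workqueue, close PCI, release nothing. */
static void model_shutdown_one(struct mock_dev *dd)
{
    mock_flush(dd->wq);
    dd->pci_enabled = false;
}

/* Module unload path: same quiesce steps, then destroy the workqueue. */
static void model_remove_one(struct mock_dev *dd)
{
    model_shutdown_one(dd);
    mock_destroy(dd->wq);
}
```

In this model, work that is queued after model_shutdown_one() still finds a live queue, whereas queueing after model_remove_one() fails. That mirrors why the destroy step belongs only in the unload path.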