RE: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy hfi1_wq when the device is shut down

> -----Original Message-----
> From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx>
> On Behalf Of Leon Romanovsky
> Sent: Wednesday, May 13, 2020 3:59 AM
> To: Wan, Kaike <kaike.wan@xxxxxxxxx>
> Cc: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>; jgg@xxxxxxxx;
> dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Marciniszyn, Mike
> <mike.marciniszyn@xxxxxxxxx>; stable@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy hfi1_wq when
> the device is shut down
> 
> On Tue, May 12, 2020 at 11:52:34AM +0000, Wan, Kaike wrote:
> >
> >
> > > -----Original Message-----
> > > From: linux-rdma-owner@xxxxxxxxxxxxxxx <linux-rdma-owner@xxxxxxxxxxxxxxx>
> > > On Behalf Of Leon Romanovsky
> > > Sent: Tuesday, May 12, 2020 1:55 AM
> > > To: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>
> > > Cc: jgg@xxxxxxxx; dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> > > Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxx>;
> > > stable@xxxxxxxxxxxxxxx; Wan, Kaike <kaike.wan@xxxxxxxxx>
> > > Subject: Re: [PATCH for-rc or next 1/3] IB/hfi1: Do not destroy
> > > hfi1_wq when the device is shut down
> > >
> > > On Mon, May 11, 2020 at 11:13:15PM -0400, Dennis Dalessandro wrote:
> > > > From: Kaike Wan <kaike.wan@xxxxxxxxx>
> > > >
> > > > The workqueue hfi1_wq is destroyed in function shutdown_device(),
> > > > > which is called by either shutdown_one() or remove_one(). The
> > > > > function shutdown_one() is called when the kernel is rebooted while
> > > > > remove_one() is called when the hfi1 driver is unloaded. When the
> > > > kernel is rebooted, hfi1_wq is destroyed while all qps are still
> > > > active, leading to a kernel crash:
> > >
> > > I was under impression that kernel reboot should follow same logic
> > > as module removal. This is what graceful reboot will do anyway. Can
> > > you please give me a link where I can read about difference in those
> > > flows?
> > >
> > I used to think the same. However, by adding traces to the hfi driver, I
> > found out that the shutdown function of the pci_driver was called when
> > typing "reboot" while the remove function of the pci_driver was called
> > when typing "modprobe -r hfi1".
> 
> I took a look on what mlx5_core is doing in shutdown flow and it can be
> summarized in the following:
> 1. Drain workqueues
> 2. Close PCI
> 3. Don't release anything.
> 
> So maybe you didn't flush the hfi1_wq?
Will add the flush.

Thanks,

Kaike
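
For archive readers, the distinction discussed in this thread can be
illustrated with a toy userspace model in C. This is only a sketch and
not kernel code: every name below (toy_wq, toy_flush, toy_shutdown,
toy_remove and so on) is hypothetical and merely imitates the idea that
the reboot path drains pending work while keeping the workqueue alive,
whereas the unload path drains it and then destroys it. The actual hfi1
change is not shown in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; NOT kernel code. All names are hypothetical. */
#define TOY_MAX_WORK 8

struct toy_wq {
    void (*fn[TOY_MAX_WORK])(int *); /* queued work functions */
    int *arg[TOY_MAX_WORK];          /* argument for each work item */
    size_t pending;                  /* number of items not yet run */
    bool destroyed;                  /* true once the queue is torn down */
};

/* Queue one work item; fails if the queue is destroyed or full. */
bool toy_queue_work(struct toy_wq *wq, void (*fn)(int *), int *arg)
{
    if (wq->destroyed || wq->pending == TOY_MAX_WORK)
        return false;
    wq->fn[wq->pending] = fn;
    wq->arg[wq->pending] = arg;
    wq->pending++;
    return true;
}

/* Run everything queued so far; the queue stays usable afterwards. */
void toy_flush(struct toy_wq *wq)
{
    for (size_t i = 0; i < wq->pending; i++)
        wq->fn[i](wq->arg[i]);
    wq->pending = 0;
}

/* Drain the queue, then mark it unusable. */
void toy_destroy(struct toy_wq *wq)
{
    toy_flush(wq);
    assert(wq->pending == 0);
    wq->destroyed = true;
}

/* Reboot path in this model: drain only, release nothing. */
void toy_shutdown(struct toy_wq *wq)
{
    toy_flush(wq);
}

/* Driver-unload path in this model: drain and destroy. */
void toy_remove(struct toy_wq *wq)
{
    toy_destroy(wq);
}

/* Example work item: increments the counter it is given. */
void toy_bump(int *counter)
{
    (*counter)++;
}
```

In this model, work can still be queued after toy_shutdown() because
the queue was only drained, which mirrors the "drain workqueues, don't
release anything" summary above. In the kernel itself the corresponding
primitives are flush_workqueue() and destroy_workqueue(); the toy
functions only imitate their roles.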