> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> Sent: Tuesday, November 13, 2012 9:18 AM
> To: Yijing Wang
> Cc: Kaneshige, Kenji/金重 憲治; Yinghai Lu; Rafael; Rusty Russell; Mauro Carvalho Chehab; Oliver Neukum;
> jiang.liu@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Hanjun Guo
> Subject: Re: [PATCH -v3] PCI, pciehp: make every slot have its own workqueue to avoid deadlock
>
> On Mon, Nov 12, 2012 at 2:17 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
> > Currently, pciehp uses the global pciehp_wq to handle hotplug events from hardware.
> > The hot-remove path is blocked if a hotplug slot is connected to an IO box (composed of a
> > PCIe switch and some slots that support hotplug). The hot-remove work is queued onto
> > pciehp_wq, but in the hot-remove path the pciehp driver flushes pciehp_wq when
> > a PCIe port (supporting pciehp) is removed. In this case the hot-remove path blocks.
> > This patch removes the global pciehp_wq and creates a new workqueue for every slot to
> > avoid the above problem.
>
> Can you include the actual call path that leads to the deadlock? I
> assume it's related to a PCIe device removal that happens at about the
> same time as the pcie_port_service_unregister(&hpdriver_portdrv)?
>
> I just want to make sure that the new per-slot workqueue arrangement
> doesn't lead to any case where the slot workqueue item depends on
> something removed by pcied_cleanup() as the module exits.

Just for your information, here is my understanding of the problem.

====
- Your hardware has nested PCIe hotplug slots like below.

    --<Slot A>--<Slot B>

- A hot-removal request (attention button event) comes in on <Slot A>.
- The pciehp driver queues the hot-removal work.
- This hot-removal work tries to remove <Slot B>.
- To remove <Slot B>, pciehp flushes pciehp_wq, but the flush never
  finishes because the hot-removal work itself is still in progress.

===> deadlock
====

> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -773,23 +773,32 @@ static void pcie_shutdown_notification(struct controller *ctrl)
> >  static int pcie_init_slot(struct controller *ctrl)
> >  {
> >  	struct slot *slot;
> > +	char name[32];
> >
> >  	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
> >  	if (!slot)
> >  		return -ENOMEM;
> > +
> > +	snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
> > +	slot->wq = create_singlethread_workqueue(name);
>
> You're using create_singlethread_workqueue() instead of the previous
> alloc_workqueue(). This is a slight change in semantics, since we
> previously called "alloc_workqueue(name, 0, 0)" and
> create_singlethread_workqueue() calls "alloc_workqueue(name,
> WQ_UNBOUND | WQ_MEM_RECLAIM, 1)".
>
> I don't know whether this is necessary in order to have per-slot
> workqueues or not. If not, please split it into a separate patch to
> make it more visible.

Good catch. I think we should use 0 for max_active. Hot-add/remove
operations take a long time, so if we use 1 for max_active, we cannot
handle events during a hot-add/remove operation. As a result, for
example, a push-button event or a cancel event issued during a
hot-add/remove operation might not be handled properly, I think.

Any comments, Yijing?

Regards,
Kenji Kaneshige
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
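[Editor's note] The nested-slot deadlock discussed in this thread can be reproduced in a userspace simulation. The sketch below is an illustration only, not the pciehp code: a single-worker thread pool stands in for the old global pciehp_wq, and the "remove slot A" work item waits on "remove slot B" work queued to the same pool, so the wait can never complete (a timeout is used here so the demo terminates instead of hanging). With one queue per slot, as in the patch, the wait finishes. All names (`shared_wq`, `remove_slot_a`, etc.) are hypothetical.

```python
import concurrent.futures as cf

# Old scheme: one global single-threaded queue (stand-in for pciehp_wq).
shared_wq = cf.ThreadPoolExecutor(max_workers=1)

def remove_slot_b_shared():
    return "B removed"

def remove_slot_a_shared():
    # Mimics flushing pciehp_wq from inside a work item: the inner work
    # can never run because the only worker is busy running this function.
    inner = shared_wq.submit(remove_slot_b_shared)
    try:
        return inner.result(timeout=0.5)  # would block forever without a timeout
    except cf.TimeoutError:
        return "deadlock"

print(shared_wq.submit(remove_slot_a_shared).result())  # prints "deadlock"

# Patched scheme: every slot has its own workqueue.
slot_a_wq = cf.ThreadPoolExecutor(max_workers=1)
slot_b_wq = cf.ThreadPoolExecutor(max_workers=1)

def remove_slot_b():
    return "B removed"

def remove_slot_a():
    # Slot A's removal work waits on slot B's queue; that queue has its
    # own worker, so the wait completes.
    return slot_b_wq.submit(remove_slot_b).result(timeout=5)

print(slot_a_wq.submit(remove_slot_a).result())  # prints "B removed"
```

In the kernel patch the analogous move is allocating the workqueue in pcie_init_slot() and flushing/destroying it per slot, so removing a downstream slot never has to flush the queue its own work item is running on.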