On 2012/11/13 10:35, Kaneshige, Kenji wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
>> Sent: Tuesday, November 13, 2012 9:18 AM
>> To: Yijing Wang
>> Cc: Kaneshige, Kenji/金重 憲治; Yinghai Lu; Rafael; Rusty Russell; Mauro Carvalho Chehab; Oliver Neukum;
>> jiang.liu@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Hanjun Guo
>> Subject: Re: [PATCH -v3] PCI, pciehp: make every slot have its own workqueue to avoid deadlock
>>
>> On Mon, Nov 12, 2012 at 2:17 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>> Currently, pciehp uses the global pciehp_wq to handle hotplug events from hardware.
>>> The hot-remove path can be blocked if a hotplug slot is connected to an IO box
>>> (composed of a PCIe switch and several slots that support hotplug). The hot-remove
>>> work is queued on pciehp_wq, but in the hot-remove path the pciehp driver flushes
>>> pciehp_wq when a PCIe port (supporting pciehp) is removed. In this case the
>>> hot-remove path blocks. This patch removes the global pciehp_wq and creates a new
>>> workqueue for every slot to avoid this problem.
>>
>> Can you include the actual call path that leads to the deadlock? I
>> assume it's related to a PCIe device removal that happens at about the
>> same time as the pcie_port_service_unregister(&hpdriver_portdrv)?
>>
>> I just want to make sure that the new per-slot workqueue arrangement
>> doesn't lead to any case where the slot workqueue item depends on
>> something removed by the pcied_cleanup() as the module exits.
>
> Just for your information, here is my understanding of the problem.
>
> ====
> - Your hardware has nested PCIe hotplug slots like below.
>
>     --<slot A>--<slot B>
>
> - A hot-removal request (attention button event) comes in on <slot A>.
> - The pciehp driver queues the hot-removal work.
> - This hot-removal work tries to remove <slot B>.
> - To remove <slot B>, pciehp flushes pciehp_wq, but the flush never
>   finishes because the hot-removal work itself is still in progress.
>   ===> deadlock
> ====

Yes, the deadlock occurs exactly as Kaneshige describes above. I will add a
description of the deadlock call path to the patch.

>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -773,23 +773,32 @@ static void pcie_shutdown_notification(struct controller *ctrl)
>>>  static int pcie_init_slot(struct controller *ctrl)
>>>  {
>>>  	struct slot *slot;
>>> +	char name[32];
>>>
>>>  	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
>>>  	if (!slot)
>>>  		return -ENOMEM;
>>> +
>>> +	snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
>>> +	slot->wq = create_singlethread_workqueue(name);
>>
>> You're using create_singlethread_workqueue() instead of the previous
>> alloc_workqueue(). This is a slight change in semantics since we
>> previously called "alloc_workqueue(name, 0, 0)" and
>> create_singlethread_workqueue() calls "alloc_workqueue(name,
>> WQ_UNBOUND | WQ_MEM_RECLAIM, 1)."
>>
>> I don't know whether this is necessary in order to have per-slot
>> workqueues or not. If not, please split it into a separate patch to
>> make it more visible.
>
> Good catch.
> I think we should use 0 for max_active.
> Hot-add/remove operations take a long time, so if we use 1 for max_active,
> we cannot handle events during a hot-add/remove operation. As a result, a
> push-button event or a cancel event issued during a hot-add/remove
> operation, for example, might not be handled properly, I think.
>
> Any comments, Yijing?

This is an issue. I will test this case on my hotplug machine and confirm it.

Thanks!
Yijing

> Regards,
> Kenji Kaneshige

--
Thanks!
Yijing
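For reference, a sketch of what the allocation could look like with the original flags preserved (this is my illustration, not the posted patch; the flag expansion of create_singlethread_workqueue() is as Bjorn quoted it for kernels of that era):

```c
/*
 * Sketch only: allocate the per-slot workqueue with the same
 * parameters the old global pciehp_wq used.
 *
 * alloc_workqueue(name, 0, 0):
 *   - flags == 0: a normal bound workqueue, no rescuer thread
 *   - max_active == 0: use the workqueue default, so a pending
 *     attention-button or cancel event can still be processed
 *     while a long-running hot-add/remove work item is active.
 *
 * By contrast, create_singlethread_workqueue(name) expands to
 * alloc_workqueue(name, WQ_UNBOUND | WQ_MEM_RECLAIM, 1), i.e.
 * max_active == 1: only one work item runs at a time, which is
 * the behavior change Bjorn pointed out.
 */
snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
slot->wq = alloc_workqueue(name, 0, 0);
if (!slot->wq)
	goto abort;
```

Either way, the per-slot workqueue removes the self-flush deadlock, since flushing <slot B>'s queue no longer waits on the work item running for <slot A>.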