> -----Original Message-----
> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> Sent: Tuesday, November 13, 2012 9:18 AM
> To: Yijing Wang
> Cc: Kaneshige, Kenji/金重 憲治; Yinghai Lu; Rafael; Rusty Russell; Mauro Carvalho Chehab; Oliver Neukum;
> jiang.liu@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Hanjun Guo
> Subject: Re: [PATCH -v3] PCI, pciehp: make every slot have its own workqueue to avoid deadlock
>
> On Mon, Nov 12, 2012 at 2:17 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
> > Currently, pciehp uses the global pciehp_wq to handle hotplug events from hardware.
> > The hot-remove path is blocked if a hotplug slot is connected to an IO box (composed of a
> > PCIe switch and some slots that support hotplug). The hot-remove work is queued onto
> > pciehp_wq, but in the hot-remove path the pciehp driver flushes pciehp_wq when
> > a PCIe port (supporting pciehp) is removed. In this case the hot-remove path blocks.
> > This patch removes the global pciehp_wq and creates a new workqueue for every slot to
> > avoid the above problem.
>
> Can you include the actual call path that leads to the deadlock? I
> assume it's related to a PCIe device removal that happens at about the
> same time as the pcie_port_service_unregister(&hpdriver_portdrv)?
>
> I just want to make sure that the new per-slot workqueue arrangement
> doesn't lead to any case where the slot workqueue item depends on
> something removed by pcied_cleanup() as the module exits.

Just for your information, here is my understanding of the problem.

====
- Your hardware has nested PCIe hotplug slots like below.

    --<Slot A>--<Slot B>

- A hot-removal request (attention button event) comes in on <Slot A>.
- The pciehp driver queues the hot-removal work.
- This hot-removal work tries to remove <Slot B>.
- To remove <Slot B>, pciehp flushes pciehp_wq, but the flush never
  finishes because the hot-removal work itself is still in progress.

===> deadlock
====

> > --- a/drivers/pci/hotplug/pciehp_hpc.c
> > +++ b/drivers/pci/hotplug/pciehp_hpc.c
> > @@ -773,23 +773,32 @@ static void pcie_shutdown_notification(struct controller *ctrl)
> >  static int pcie_init_slot(struct controller *ctrl)
> >  {
> >  	struct slot *slot;
> > +	char name[32];
> >
> >  	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
> >  	if (!slot)
> >  		return -ENOMEM;
> > +
> > +	snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
> > +	slot->wq = create_singlethread_workqueue(name);
>
> You're using create_singlethread_workqueue() instead of the previous
> alloc_workqueue(). This is a slight change in semantics, since we
> previously called "alloc_workqueue(name, 0, 0)" and
> create_singlethread_workqueue() calls "alloc_workqueue(name,
> WQ_UNBOUND | WQ_MEM_RECLAIM, 1)".
>
> I don't know whether this is necessary in order to have per-slot
> workqueues or not. If not, please split it into a separate patch to
> make it more visible.

Good catch. I think we should use 0 for max_active. Hot-add/remove
operations take a long time, so if we use 1 for max_active, we cannot
handle events during a hot-add/remove operation. As a result, for
example, a push-button event or a cancel event issued during a
hot-add/remove operation might not be handled properly, I think.

Any comments, Yijing?

Regards,
Kenji Kaneshige
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
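[Editor's note] The nested-slot deadlock discussed in this thread can be reproduced in a userspace simulation. The sketch below is an illustration only, not the pciehp code: a single-worker thread pool stands in for the old global pciehp_wq, and the "remove slot A" work item waits on "remove slot B" work queued to the same pool, so the wait can never complete (a timeout is used here so the demo terminates instead of hanging). With one queue per slot, as in the patch, the wait finishes. All names (`shared_wq`, `remove_slot_a`, etc.) are hypothetical.

```python
import concurrent.futures as cf

# Old scheme: one global single-threaded queue (stand-in for pciehp_wq).
shared_wq = cf.ThreadPoolExecutor(max_workers=1)

def remove_slot_b_shared():
    return "B removed"

def remove_slot_a_shared():
    # Mimics flushing pciehp_wq from inside a work item: the inner work
    # can never run because the only worker is busy running this function.
    inner = shared_wq.submit(remove_slot_b_shared)
    try:
        return inner.result(timeout=0.5)  # would block forever without a timeout
    except cf.TimeoutError:
        return "deadlock"

print(shared_wq.submit(remove_slot_a_shared).result())  # prints "deadlock"

# Patched scheme: every slot has its own workqueue.
slot_a_wq = cf.ThreadPoolExecutor(max_workers=1)
slot_b_wq = cf.ThreadPoolExecutor(max_workers=1)

def remove_slot_b():
    return "B removed"

def remove_slot_a():
    # Slot A's removal work waits on slot B's queue; that queue has its
    # own worker, so the wait completes.
    return slot_b_wq.submit(remove_slot_b).result(timeout=5)

print(slot_a_wq.submit(remove_slot_a).result())  # prints "B removed"
```

In the kernel patch the analogous move is allocating the workqueue in pcie_init_slot() and flushing/destroying it per slot, so removing a downstream slot never has to flush the queue its own work item is running on.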