On 2012/11/13 10:35, Kaneshige, Kenji wrote:
>> -----Original Message-----
>> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
>> Sent: Tuesday, November 13, 2012 9:18 AM
>> To: Yijing Wang
>> Cc: Kaneshige, Kenji/金重 憲治; Yinghai Lu; Rafael; Rusty Russell; Mauro Carvalho Chehab; Oliver Neukum;
>> jiang.liu@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Hanjun Guo
>> Subject: Re: [PATCH -v3] PCI, pciehp: make every slot have its own workqueue to avoid deadlock
>>
>> On Mon, Nov 12, 2012 at 2:17 AM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
>>> Currently, pciehp uses the global pciehp_wq to handle hotplug events from hardware.
>>> The hot-remove path can be blocked if a hotplug slot is connected to an IO box
>>> (composed of a PCIe switch and several slots that support hotplug). The hot-remove
>>> work is queued on pciehp_wq, but in the hot-remove path the pciehp driver flushes
>>> pciehp_wq when a PCIe port (supporting pciehp) is removed. In this case the
>>> hot-remove path blocks. This patch removes the global pciehp_wq and creates a new
>>> workqueue for every slot to avoid this problem.
>>
>> Can you include the actual call path that leads to the deadlock? I
>> assume it's related to a PCIe device removal that happens at about the
>> same time as the pcie_port_service_unregister(&hpdriver_portdrv)?
>>
>> I just want to make sure that the new per-slot workqueue arrangement
>> doesn't lead to any case where the slot workqueue item depends on
>> something removed by the pcied_cleanup() as the module exits.
>
> Just for your information, here is my understanding of the problem.
>
> ====
> - Your hardware has nested PCIe hotplug slots like below.
>
>     --<slot A>--<slot B>
>
> - A hot-removal request (attention button event) comes in on <slot A>.
> - The pciehp driver queues the hot-removal work.
> - This hot-removal work tries to remove <slot B>.
> - To remove <slot B>, pciehp flushes pciehp_wq, but the flush never
>   finishes because the hot-removal work itself is still in progress.
>   ===> deadlock
> ====

Yes, the deadlock occurs exactly as Kaneshige describes above. I will add a
description of the deadlock call path to the patch.

>>> --- a/drivers/pci/hotplug/pciehp_hpc.c
>>> +++ b/drivers/pci/hotplug/pciehp_hpc.c
>>> @@ -773,23 +773,32 @@ static void pcie_shutdown_notification(struct controller *ctrl)
>>>  static int pcie_init_slot(struct controller *ctrl)
>>>  {
>>>  	struct slot *slot;
>>> +	char name[32];
>>>
>>>  	slot = kzalloc(sizeof(*slot), GFP_KERNEL);
>>>  	if (!slot)
>>>  		return -ENOMEM;
>>> +
>>> +	snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
>>> +	slot->wq = create_singlethread_workqueue(name);
>>
>> You're using create_singlethread_workqueue() instead of the previous
>> alloc_workqueue(). This is a slight change in semantics since we
>> previously called "alloc_workqueue(name, 0, 0)" and
>> create_singlethread_workqueue() calls "alloc_workqueue(name,
>> WQ_UNBOUND | WQ_MEM_RECLAIM, 1)."
>>
>> I don't know whether this is necessary in order to have per-slot
>> workqueues or not. If not, please split it into a separate patch to
>> make it more visible.
>
> Good catch.
> I think we should use 0 for max_active.
> Hot-add/remove operations take a long time, so if we use 1 for max_active,
> we cannot handle events during a hot-add/remove operation. As a result, a
> push-button event or a cancel event issued during a hot-add/remove
> operation, for example, might not be handled properly, I think.
>
> Any comments, Yijing?

This is an issue. I will test this case on my hotplug machine and confirm it.

Thanks!
Yijing

> Regards,
> Kenji Kaneshige

--
Thanks!
Yijing
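For reference, a sketch of what the allocation could look like with the original flags preserved (this is my illustration, not the posted patch; the flag expansion of create_singlethread_workqueue() is as Bjorn quoted it for kernels of that era):

```c
/*
 * Sketch only: allocate the per-slot workqueue with the same
 * parameters the old global pciehp_wq used.
 *
 * alloc_workqueue(name, 0, 0):
 *   - flags == 0: a normal bound workqueue, no rescuer thread
 *   - max_active == 0: use the workqueue default, so a pending
 *     attention-button or cancel event can still be processed
 *     while a long-running hot-add/remove work item is active.
 *
 * By contrast, create_singlethread_workqueue(name) expands to
 * alloc_workqueue(name, WQ_UNBOUND | WQ_MEM_RECLAIM, 1), i.e.
 * max_active == 1: only one work item runs at a time, which is
 * the behavior change Bjorn pointed out.
 */
snprintf(name, sizeof(name), "pciehp-%u", PSN(ctrl));
slot->wq = alloc_workqueue(name, 0, 0);
if (!slot->wq)
	goto abort;
```

Either way, the per-slot workqueue removes the self-flush deadlock, since flushing <slot B>'s queue no longer waits on the work item running for <slot A>.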