Re: Question about deadlock between AER and pciehp interrupts during resume from S3 with unplugged device

Hi,

On 6/14/22 11:07 AM, Andrey Grodzovsky wrote:
> Just a gentle ping. I also updated the ticket https://bugzilla.kernel.org/show_bug.cgi?id=215590
> 
> with the workaround we applied, in case it helps you advise us
> on what a generic solution for this would be.
> 
> Andrey
Can you explain your workaround? It seems to be unrelated to the deadlock
issue discussed in this thread. Are they related?

> 
> On 2022-06-10 17:25, Andrey Grodzovsky wrote:
>>
>>
>> On 2022-02-10 09:39, Andrey Grodzovsky wrote:
>>> Thanks a lot for quick response, we will give this a try.
>>>
>>> Andrey
>>>
>>> On 2022-02-10 01:23, Lukas Wunner wrote:
>>>> On Wed, Feb 09, 2022 at 02:54:06PM -0500, Andrey Grodzovsky wrote:
>>>>> Hi, on a kernel based on 5.4.2 we are observing a deadlock between
>>>>> the reset_lock semaphore and device_lock (dev->mutex). Our scenario
>>>>> is to put the system to sleep, disconnect the eGPU from the PCIe
>>>>> bus (through a special SBIOS setting or by simply removing power
>>>>> from the external PCIe cage), and then wake the system up.
>>>>>
>>>>> I attached the log. Please advise if you have any idea how
>>>>> to work around it. Since the kernel is old, does anyone
>>>>> know whether this issue is known and already solved in later kernels?
>>>>> We cannot try the latest kernel since ours is custom for that platform.
>>>>
>>>> It is a known issue.  Here's a fix I submitted during the v5.9 cycle:
>>>>
>>>> https://lore.kernel.org/linux-pci/908047f7699d9de9ec2efd6b79aa752d73dab4b6.1595329748.git.lukas@wunner.de/
>>>>
>>>> The fix hasn't been applied yet.  I think I need to rework the patch,
>>>> just haven't found the time.
>>
>> Hey Lukas - just checking again whether you had a chance to push this
>> change through? It's essential to us in one of our customer projects, so
>> we wonder if you have any estimate of when it will be upstreamed and
>> whether we can help with this. We would also need it backported to the
>> 5.11 and 5.4 kernels after it's upstreamed.
>>
>> Another point I want to mention is that this patch has a negative
>> side effect on plug-back times: it causes a regression in the delay
>> to light up the display at resume time, related to back-ported AER
>>
>> Anatoli is working on resolving this, so maybe he can add his
>> comments here and you can help him find a proper resolution.
>>
>> Andrey
>>
>>>>
>>>> Since the trigger in your case are AER-handled errors during a
>>>> system sleep transition, you may also want to consider the
>>>> following 2-patch series by Kai-Heng Feng which is currently
>>>> under discussion:
>>>>
>>>> https://lore.kernel.org/linux-pci/20220127025418.1989642-1-kai.heng.feng@canonical.com/
>>>>
>>>> That series disables AER during a system sleep transition and
>>>> should thus prevent the flood of AER-handled errors you're seeing.
>>>> Once AER is disabled, the reset-induced deadlocks should go away as well.
>>>>
>>>> Thanks,
>>>>
>>>> Lukas
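
For readers following the thread: a deadlock between reset_lock and
device_lock, as reported in the original message, has the shape of a
lock-order inversion, where two paths take the same pair of locks in
opposite order. Below is a minimal userspace sketch of that pattern. It
assumes (this is not confirmed in the thread) that a hotplug path takes
reset_lock before device_lock while an error-recovery path takes them the
other way round. The names `record_path` and `abba_demo` are hypothetical
and exist only for illustration; this is not kernel code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock IDs, named after the two locks mentioned in the report. */
enum lock_id { RESET_LOCK, DEVICE_LOCK, NUM_LOCKS };

/* order_seen[a][b] is true once some path took lock b while holding lock a. */
static bool order_seen[NUM_LOCKS][NUM_LOCKS];

/*
 * Record the acquisition order of one code path. Returns true if the path
 * takes two locks in the opposite order to a previously recorded path.
 */
static bool record_path(const enum lock_id *locks, int n)
{
    bool inverted = false;

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (order_seen[locks[j]][locks[i]])
                inverted = true;
            order_seen[locks[i]][locks[j]] = true;
        }
    }
    return inverted;
}

/* Returns true if the two assumed paths form an AB-BA inversion. */
bool abba_demo(void)
{
    /* Assumption: hotplug handling holds reset_lock, then needs device_lock. */
    const enum lock_id hotplug_path[] = { RESET_LOCK, DEVICE_LOCK };
    /* Assumption: error recovery holds device_lock, then needs reset_lock. */
    const enum lock_id recovery_path[] = { DEVICE_LOCK, RESET_LOCK };

    memset(order_seen, 0, sizeof(order_seen));

    bool first = record_path(hotplug_path, 2);
    bool second = record_path(recovery_path, 2);

    /* The first path sets the order; only the second one inverts it. */
    return !first && second;
}
```

Calling `abba_demo()` reports the inversion as soon as the second path is
recorded. If both paths run concurrently and each already holds its first
lock, neither can make progress, which is the kind of hang described in the
report.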

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


