Re: Kernel hang caused by commit "can: m_can: Start/Cancel polling timer together with interrupts"

On 01.07.24 16:34, Markus Schneider-Pargmann wrote:
> On Mon, Jul 01, 2024 at 02:12:55PM GMT, Linux regression tracking (Thorsten Leemhuis) wrote:
>> [CCing the regression list, as it should be in the loop for regressions:
>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Hmm, looks like there was not even a single reply to the regression
>> report below. But it also seems Markus hasn't posted anything archived
>> on Lore for about three weeks now, so he might be on vacation.
>>
>> Marc, do you maybe have an idea what's wrong with the culprit? Or do we
>> expect Markus to be back in action soon?
> 
> Great, ping here.

Thx for replying!

> @Matthias: Thanks for debugging and sorry for breaking it. If you have a
> fix for this, let me know. I have a lot of work right now, so I am not
> sure when I will have a proper fix ready. But it is on my todo list.

Thx. This made me wonder: is "revert the culprit to resolve this quickly
and reapply it later together with a fix" something that we should
consider if a proper fix takes some time? Or is this not worth it in
this case or extremely hard? Or would it cause a regression of its own
for users of 6.9?

Ciao, Thorsten

>> On 18.06.24 18:12, Matthias Schiffer wrote:
>>> Hi Markus,
>>>
>>> we've found that recent kernels hang on the TI AM62x SoC (where no m_can interrupt is available and
>>> thus the polling timer is used), always a few seconds after the CAN interfaces are set up.
>>>
>>> I have bisected the issue to commit a163c5761019b ("can: m_can: Start/Cancel polling timer together
>>> with interrupts"). Both master and 6.6 stable (which received a backport of the commit) are
>>> affected. On 6.6 the commit is easy to revert, but on master a lot has happened on top of that
>>> change.
>>>
>>> As far as I can tell, the reason is that hrtimer_cancel() tries to cancel the timer synchronously,
>>> which will deadlock when called from the hrtimer callback itself (hrtimer_callback -> m_can_isr ->
>>> m_can_disable_all_interrupts -> hrtimer_cancel).
>>>
>>> I can try to come up with a fix, but I think you are much more familiar with the driver code. Please
>>> let me know if you need any more information.
>>>
>>> Best regards,
>>> Matthias
>>>
>>>
> 
> 
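For readers following the call chain in Matthias's report, here is a
minimal userspace sketch of why a synchronous cancel issued from inside
the timer's own callback cannot finish. It is a model only, not the m_can
driver code and not a proposed fix. All names (fake_timer,
fake_try_to_cancel, fake_cancel_bounded, and the two callback helpers) are
hypothetical. The one kernel behaviour it assumes is that
hrtimer_try_to_cancel() returns -1 while the callback is running, and that
hrtimer_cancel() keeps retrying until that is no longer the case. The
retry loop is bounded here so the example terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a timer; NOT the kernel's struct hrtimer. */
struct fake_timer {
	bool queued;      /* armed and waiting to fire */
	bool in_callback; /* its callback is currently executing */
};

/*
 * Models a non-blocking cancel:
 *   -1 if the callback is running and the timer cannot be stopped,
 *    1 if a queued timer was removed,
 *    0 if the timer was not active.
 */
static int fake_try_to_cancel(struct fake_timer *t)
{
	if (t->in_callback)
		return -1;
	if (t->queued) {
		t->queued = false;
		return 1;
	}
	return 0;
}

/*
 * Models a synchronous cancel: retry until the callback has finished.
 * A real synchronous cancel has no retry limit; max_spins exists only
 * so that this model returns. Stores the number of attempts made.
 */
static int fake_cancel_bounded(struct fake_timer *t, int max_spins,
			       int *spins_used)
{
	int ret = -1;
	int spins = 0;

	while (spins < max_spins) {
		spins++;
		ret = fake_try_to_cancel(t);
		if (ret >= 0)
			break;
	}
	*spins_used = spins;
	return ret;
}

/*
 * The problematic pattern: the callback waits for itself to finish.
 * Returns the number of cancel attempts; it always exhausts max_spins.
 */
static int callback_with_sync_cancel(struct fake_timer *t, int max_spins)
{
	int spins = 0;

	assert(!t->in_callback);
	t->queued = false;     /* the timer has fired */
	t->in_callback = true;
	fake_cancel_bounded(t, max_spins, &spins);
	t->in_callback = false;
	return spins;
}

/* Same situation with the non-blocking variant: returns at once. */
static int callback_with_try_cancel(struct fake_timer *t)
{
	int ret;

	t->queued = false;
	t->in_callback = true;
	ret = fake_try_to_cancel(t);
	t->in_callback = false;
	return ret;
}
```

In this model callback_with_sync_cancel() uses up every retry it is given,
which corresponds to the hang described in the report, while
callback_with_try_cancel() returns -1 immediately. Whether a non-blocking
cancel, or avoiding the cancel in the callback path altogether, is the
right approach for the driver is left open here.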