Re: [Bug 215315] New: [REGRESSION BISECTED] amdgpu crashes system suspend - NUC8i7HVKVA

[TLDR: adding this regression to regzbot; most of this mail is compiled
from a few template paragraphs some of you might have seen already.]

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easily accessible to everyone.

Thanks for the report.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced f7d6779df642720e22bffd449e683bb8690bd3bf
#regzbot title drm: amdgpu: NUC8i7HVKVA crashes during system suspend
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215315
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail), then regzbot will automatically
mark the regression as resolved once the fix lands in the appropriate
tree. For more details about regzbot see footer.
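
For illustration, here is a minimal sketch of how a 'Link:' tag in a fix
could be matched against the report URL. This is not regzbot's actual
code: the function names and the sample commit message are made up for
this example, and only the bugzilla URL comes from this report.

```python
import re

# URL of the report, as given in the '#regzbot link:' line above.
REPORT_URL = "https://bugzilla.kernel.org/show_bug.cgi?id=215315"

# Hypothetical commit message of a fix; only the Link: tag matters here.
SAMPLE_FIX_MESSAGE = """\
drm/amdgpu: example fix subject (hypothetical)

Explanation of the fix would go here.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215315
Signed-off-by: Example Developer <dev@example.org>
"""


def links_in_commit_message(message):
    """Return the URLs of all 'Link:' tags found in a commit message."""
    return re.findall(r"^Link:\s*(\S+)\s*$", message, flags=re.MULTILINE)


def fixes_report(message, report_url):
    """Return True if any 'Link:' tag points at the given report URL."""
    return report_url in links_in_commit_message(message)


print(fixes_report(SAMPLE_FIX_MESSAGE, REPORT_URL))
```

The point of the sketch is simply that the tag has to carry the exact
URL of the report for an automated match to work.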

Sending this to everyone that got the initial report, to make everyone
aware of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.

Don't worry, I'll send further messages wrt this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat).

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table and can only look briefly into most of them. As a result, I
will sometimes get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the wrong
rabbit hole, which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.


On 13.12.21 00:08, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=215315
> 
>             Bug ID: 215315
>            Summary: [REGRESSION BISECTED] amdgpu crashes system suspend -
>                     NUC8i7HVKVA
>            Product: Drivers
>            Version: 2.5
>     Kernel Version: 5.15-rc1, 5.15, 5.16-rc4
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Video(DRI - non Intel)
>           Assignee: drivers_video-dri@xxxxxxxxxxxxxxxxxxxx
>           Reporter: lenb@xxxxxxxxxx
>         Regression: No
> 
> My Intel NUC8i7HVKVA has an AMD GPU.
> 
> Until 5.15-rc1, this machine was rock solid in suspend stress testing -- never
> crashing after hundreds of hours of back-to-back suspend cycles.
> 
> Until this patch went upstream:
> 
> commit f7d6779df642720e22bffd449e683bb8690bd3bf (refs/bisect/bad)
> Author: Guchun Chen <guchun.chen@xxxxxxx>
> Date:   Fri Aug 27 18:31:41 2021 +0800
> 
>     drm/amdgpu: stop scheduler when calling hw_fini (v2)
> 
>     This gurantees no more work on the ring can be submitted
>     to hardware in suspend/resume case, otherwise a potential
>     race will occur and the ring will get no chance to stay
>     empty before suspend.
> 
>     v2: Call drm_sched_resubmit_job before drm_sched_start to
>     restart jobs from the pending list.
> 
>     Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx>
>     Suggested-by: Christian König <christian.koenig@xxxxxxx>
>     Signed-off-by: Guchun Chen <guchun.chen@xxxxxxx>
>     Reviewed-by: Christian König <christian.koenig@xxxxxxx>
>     Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx>
>     Cc: stable@xxxxxxxxxxxxxxx
> 
> I bisected that the patch before this one was integrated can handle over 1,000
> back-to-back "freeze" system suspend cycles.  Yet, when this patch is present,
> the system may crash before it completes only 100 cycles, and at most lasts a
> few hundred cycles.
> 
> This crash is present in all following upstream rc's, including 5.15-rc4.
> 
> When I revert this patch from 5.15-rc4, stability returns.
> 
> Usually, the crash is manifest by a black screen, and a system that does not
> respond to ping, and will only respond to a long AC power button press to
> remove power; and a subsequent cold reboot.
> 
> I have witnessed the crash occur, and the "ubuntu color themed" screen enters
> some sort of reverse video mode.  In this weird color mode, I've seen a text
> window oscillate between scrolling and un-scrolling for a line -- sort of like
> it is going back in time, but then changes its mind.  There is no response to
> keyboard, mouse, or network input.
> 



