Re: Non-deterministically boot into dark screen with `amdgpu`

Hi guys,

On 10.08.20 at 08:43, Alexander Monakov wrote:
Hi,

you should Cc a specialized mailing list and a relevant maintainer,
otherwise your email is likely to be ignored as LKML is an incredibly
high-volume list. Adding amd-gfx and Alex Deucher.

Thanks for forwarding this. AFAIK we haven't heard of this bug before, but Alex might already know more about it.

More thoughts below.

On Sun, 9 Aug 2020, Ignat Insarov wrote:

Hello!

This is an issue report. I am not familiar with the Linux kernel
development procedure, so please direct me to a more appropriate or
specialized medium if this is not the right avenue.

My laptop (Ryzen 7 Pro CPU/GPU) boots into dark screen more often than
not. Screen blackness correlates with a line in the `systemd` journal
that says `RAM width Nbits DDR4`, where N is either 128 (resulting in
dark screen) or 64 (resulting in a healthy boot). The number seems to
be chosen at random with bias towards 128. This has been going on for
a while, so here are some statistics:

* 356 boots proceed far enough to attempt mode setting.
* 82 boots set RAM width to 64 bits and presumably succeed.
* 274 boots set RAM width to 128 bits and presumably fail.
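
A tally like the one above can be reproduced from saved journal text. The following is only a minimal sketch: it assumes the journal has already been exported to a string, and the function name `tally_ram_width` and the sample lines are made up for illustration. Only the `RAM width Nbits DDR4` pattern comes from the report itself.

```python
import re
from collections import Counter

# Pattern taken from the journal line quoted in the report.
RAM_WIDTH = re.compile(r"RAM width (\d+)bits DDR4")


def tally_ram_width(journal_text: str) -> Counter:
    """Count how often each RAM width value appears in the journal text."""
    return Counter(int(m.group(1)) for m in RAM_WIDTH.finditer(journal_text))


# Illustrative input only: these lines are invented, not real log output.
sample = (
    "kernel: RAM width 128bits DDR4\n"
    "kernel: some unrelated message\n"
    "kernel: RAM width 64bits DDR4\n"
    "kernel: RAM width 128bits DDR4\n"
)

counts = tally_ram_width(sample)
total = sum(counts.values())
for width, n in sorted(counts.items()):
    print(f"{width} bits: {n} of {total} matching lines")
```

Each matching line is counted once, so the result equals the number of boots only if the driver prints the line once per boot.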

The issue is prevented with the `nomodeset` kernel option.

I reported this previously (about a year ago) on the forum of my Linux
distribution.[1] The issue still persists as of Linux 5.8.0.

The details of my graphics controller, as well as some journal
excerpts, can be seen at [1]. One thing that has changed since then is
that on failure, there now appears a null pointer dereference error. I
am attaching the log of kernel messages from the most recent failed
boot — please request more information if needed.

I appreciate any directions and advice as to how I may go about fixing
this annoyance.

[1]: https://bbs.archlinux.org/viewtopic.php?id=248273

On the forum you show that in the "success" case there's one less "BIOS
signature incorrect" message. This implies that amdgpu_get_bios() in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
gets the video BIOS from a different source. If that happens every time
(one "signature incorrect" message for "success", two for "failure")
that may be relevant to the problem you're experiencing.
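
Whether that pattern holds on every boot can be checked mechanically before touching the kernel. The sketch below assumes each boot's kernel log has been saved as a separate string; the helper names `summarize_boot` and `matches_hypothesis` are hypothetical, and the two message patterns are the ones quoted in this thread.

```python
import re

# Both patterns are taken from messages quoted in this thread.
SIGNATURE_MSG = "BIOS signature incorrect"
RAM_WIDTH = re.compile(r"RAM width (\d+)bits DDR4")


def summarize_boot(log_text):
    """Return (signature_failures, ram_width) for one boot's kernel log.

    ram_width is None if the log contains no RAM width line.
    """
    failures = sum(1 for line in log_text.splitlines() if SIGNATURE_MSG in line)
    match = RAM_WIDTH.search(log_text)
    width = int(match.group(1)) if match else None
    return failures, width


def matches_hypothesis(log_text):
    """True if the log fits 'one message with 64 bits, two with 128 bits'."""
    return summarize_boot(log_text) in {(1, 64), (2, 128)}
```

Running `matches_hypothesis` over every saved boot log and listing the boots for which it returns `False` would show whether the correlation is exact or only approximate.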

If you don't mind patching and rebuilding the kernel I suggest adding
debug printks to the aforementioned function to see exactly which methods
fail with a wrong signature and which succeed.

Also might be worthwhile to check if there's a BIOS update for your laptop.

It might also be a good idea to try the latest amd-staging-drm-next branch from Alex's repository (bear with me, I don't have the link at hand, but it should be easy to find).

Opening a bug report or searching the existing ones for something similar under https://gitlab.freedesktop.org/drm/amd/-/issues might be a good idea as well.

And I completely agree that this sounds like an issue getting the BIOS image.

Thanks,
Christian.


Alexander

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
