By crash, I mean that an error is returned here:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git/+/refs/heads/master/drivers/gpu/drm/drm_edid_load.c#195
I don't really know what happens next, but on my machine the built-in
screen and the external remains dark. Also the kernel seems to freeze.
I suspect a kernel panic, but I'm not sure. Anyway, the error is
definitely not well handled, and a fix would be great.
Also, request_firmware() will crash if called for the first time on the
resume path because the file system isn't reachable on the resume
process. And no cache is available for this firmware. So I guess that
in this case, request_firmware() returns an error.
Suspend-plug-resume case is not my priority nether as long as it
doesn't make the system crash (Which is currently the case).
On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula
<jani.nikula@xxxxxxxxx> wrote:
On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@xxxxxxxxx>
wrote:
Currently the EDID is requested during the resume. But since it's
requested too early, this means before the filesystem is mounted,
the
firmware request fails. This make the DRM driver crash when
resuming.
This kind of issue should be prevented by the firmware caching
process
which cache every firmware requested for the next resume. But since
we
are using a temporary device, the firmware isn't cached on suspend
since the device doesn't work anymore.
When using a non temporary device to get the EDID, the firmware will
be cached on suspend for the next resume. So requesting the firmware
during resume will succeed.
But if the firmware has never been requested since the boot, this
means that the monitor isn't plugged since the boot. The kernel will
not be caching the EDID. So if we plug the monitor while the machine
is suspended. The resume will fail to load the firmware. And the DRM
driver will crash.
So basically, your fix should solve the issue except for the case
where the monitor hasn't been plugged since boot and is plugged
while
the machine is suspended.
I hope I was clear. Tell me if I wasn't. I'm not really good at
explaining.
That was a pretty good explanation. The only thing I'm missing is what
the failure mode is exactly when you claim the driver will crash. Why
would request_firmware() "crash" if called for the first time on the
resume path?
I'm not sure I care much about not being able to load the firmware
EDID
in the suspend-plug-resume case (as this can be remedied with a
subsequent modeset), but obviously any errors need to be handled
gracefully, without crashing.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center