On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@xxxxxxx> wrote: > On 1/26/2024 10:28, Melissa Wen wrote: >> Hi, >> >> I'm debugging a null-pointer dereference when running >> igt@kms_connector_force_edid and the way I found to solve the bug is to >> stop using raw edid handler in amdgpu_connector_funcs_force and >> create_eml_sink in favor of managing resouces via sruct drm_edid helpers >> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector >> from struct edid to struct drm_edid and avoid the usage of different >> approaches in the driver (Patch 2). However, doing it implies a good >> amount of work and validation, therefore I decided to send this RFC >> first to collect opinions and check if there is any parallel work on >> this side. It's a working in progress. >> >> The null-pointer error trigger by the igt@kms_connector_force_edid test >> was introduced by: >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") >> >> You can check the error trace in the first patch. >> >> This series was tested with kms_hdmi_inject and kms_force_connector. No >> null-pointer error, kms_hdmi_inject is successul and kms_force_connector >> is sucessful after the second execution - the force-edid subtest >> still fails in the first run (I'm still investigating). >> >> There is also a couple of cast warnings to be addressed - I'm looking >> for the best replacement. >> >> I appreciate any feedback and testing. > > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this > series. > > I have some other patches that I'm posting later on that let you get the > EDID from _DDC BIOS method too. My observation was that the EDID can be > anywhere up to 512 bytes according to the ACPI spec. > > An earlier version of my patch was using EDID_LENGTH when fetching it > and the EDID checksum failed. > > I'll CC you on the post, we probably want to get your changes and mine > merged together. One of the main points of struct drm_edid is that it tracks the allocation size separately. We should simply not trust edid->extensions, because most of the time it originates from outside the kernel. Using drm_edid and immediately drm_edid_raw() falls short. That function should only be used during migration to help. And yeah, it also means EDID parsing should be done in drm_edid.c, and not spread out all over the subsystem. BR, Jani. > >> >> Melissa >> >> Melissa Wen (2): >> drm/amd/display: fix null-pointer dereference on edid reading >> drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid >> >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++--------- >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- >> .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- >> .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- >> 4 files changed, 60 insertions(+), 54 deletions(-) >> > -- Jani Nikula, Intel