On 01/29, Jani Nikula wrote: > On Fri, 26 Jan 2024, Mario Limonciello <mario.limonciello@xxxxxxx> wrote: > > On 1/26/2024 10:28, Melissa Wen wrote: > >> Hi, > >> > >> I'm debugging a null-pointer dereference when running > >> igt@kms_connector_force_edid and the way I found to solve the bug is to > >> stop using raw edid handler in amdgpu_connector_funcs_force and > >> create_eml_sink in favor of managing resouces via sruct drm_edid helpers > >> (Patch 1). The proper solution seems to be switch amdgpu_dm_connector > >> from struct edid to struct drm_edid and avoid the usage of different > >> approaches in the driver (Patch 2). However, doing it implies a good > >> amount of work and validation, therefore I decided to send this RFC > >> first to collect opinions and check if there is any parallel work on > >> this side. It's a working in progress. > >> > >> The null-pointer error trigger by the igt@kms_connector_force_edid test > >> was introduced by: > >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") > >> > >> You can check the error trace in the first patch. > >> > >> This series was tested with kms_hdmi_inject and kms_force_connector. No > >> null-pointer error, kms_hdmi_inject is successul and kms_force_connector > >> is sucessful after the second execution - the force-edid subtest > >> still fails in the first run (I'm still investigating). > >> > >> There is also a couple of cast warnings to be addressed - I'm looking > >> for the best replacement. > >> > >> I appreciate any feedback and testing. > > > > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this > > series. > > > > I have some other patches that I'm posting later on that let you get the > > EDID from _DDC BIOS method too. My observation was that the EDID can be > > anywhere up to 512 bytes according to the ACPI spec. > > > > An earlier version of my patch was using EDID_LENGTH when fetching it > > and the EDID checksum failed. > > > > I'll CC you on the post, we probably want to get your changes and mine > > merged together. > > One of the main points of struct drm_edid is that it tracks the > allocation size separately. > > We should simply not trust edid->extensions, because most of the time it > originates from outside the kernel. > > Using drm_edid and immediately drm_edid_raw() falls short. That function > should only be used during migration to help. And yeah, it also means > EDID parsing should be done in drm_edid.c, and not spread out all over > the subsystem. Hi Mario and Jani, Thanks for the feedback. I agree with you. I used the drm_edid_raw() as an intermediate step to assess/validate this migration, but I'll work on removing this hack. So, I understand that the transition from edid to drm_edid is the right path, so I'll improve this work (keeping the points you raised in mind) and send a version. BR, Melissa > > > BR, > Jani. > > > > > >> > >> Melissa > >> > >> Melissa Wen (2): > >> drm/amd/display: fix null-pointer dereference on edid reading > >> drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid > >> > >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++++++++++--------- > >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- > >> .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- > >> .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- > >> 4 files changed, 60 insertions(+), 54 deletions(-) > >> > > > > -- > Jani Nikula, Intel