[AMD Official Use Only]

Hi Harry.

> -----Original Message-----
> From: Wentland, Harry <Harry.Wentland@xxxxxxx>
> Sent: Wednesday, November 10, 2021 11:32 PM
> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Jani Nikula
> <jani.nikula@xxxxxxxxxxxxxxx>; Maarten Lankhorst
> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard <mripard@xxxxxxxxxx>;
> Thomas Zimmermann <tzimmermann@xxxxxxx>; David Airlie <airlied@xxxxxxxx>;
> Daniel Vetter <daniel@xxxxxxxx>
> Cc: Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray
> <Ray.Huang@xxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx;
> dri-devel@xxxxxxxxxxxxxxxxxxxxx; Limonciello, Mario
> <Mario.Limonciello@xxxxxxx>
> Subject: Re: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer dereference on
> drm_dp_dpcd_access
>
> On 2021-11-05 03:35, Yuan, Perry wrote:
> > [AMD Official Use Only]
> >
> > Hi Jani:
> >
> >> -----Original Message-----
> >> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >> Sent: Wednesday, November 3, 2021 7:31 PM
> >> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >> Subject: RE: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >> dereference on drm_dp_dpcd_access
> >>
> >> [CAUTION: External Email]
> >>
> >> On Wed, 03 Nov 2021, "Yuan, Perry" <Perry.Yuan@xxxxxxx> wrote:
> >>> [AMD Official Use Only]
> >>>
> >>> Hi Jani:
> >>>
> >>>> -----Original Message-----
> >>>> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >>>> Sent: Tuesday, November 2, 2021 4:40 PM
> >>>> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >>>> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >>>> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >>>> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >>>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >>>> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >>>> Subject: RE: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >>>> dereference on drm_dp_dpcd_access
> >>>>
> >>>> [CAUTION: External Email]
> >>>>
> >>>> On Tue, 02 Nov 2021, "Yuan, Perry" <Perry.Yuan@xxxxxxx> wrote:
> >>>>> [AMD Official Use Only]
> >>>>>
> >>>>> Hi Jani:
> >>>>> Thanks for your comments.
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> >>>>>> Sent: Monday, November 1, 2021 9:07 PM
> >>>>>> To: Yuan, Perry <Perry.Yuan@xxxxxxx>; Maarten Lankhorst
> >>>>>> <maarten.lankhorst@xxxxxxxxxxxxxxx>; Maxime Ripard
> >>>>>> <mripard@xxxxxxxxxx>; Thomas Zimmermann <tzimmermann@xxxxxxx>;
> >>>>>> David Airlie <airlied@xxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>
> >>>>>> Cc: Yuan, Perry <Perry.Yuan@xxxxxxx>;
> >>>>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> >>>>>> Huang, Shimmer <Xinmei.Huang@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
> >>>>>> Subject: Re: [PATCH v2] drm/dp: Fix aux->transfer NULL pointer
> >>>>>> dereference on drm_dp_dpcd_access
> >>>>>>
> >>>>>> [CAUTION: External Email]
> >>>>>>
> >>>>>> On Mon, 01 Nov 2021, Perry Yuan <Perry.Yuan@xxxxxxx> wrote:
> >>>>>>> Fix below crash by adding a check in the drm_dp_dpcd_access
> >>>>>>> which ensures that aux->transfer was actually initialized earlier.
> >>>>>>
> >>>>>> Gut feeling says this is papering over a real usage issue
> >>>>>> somewhere else. Why is the aux being used for transfers before
> >>>>>> ->transfer has been set? Why should the dp helper be defensive
> >>>>>> against all kinds of misprogramming?
> >>>>>>
> >>>>>>
> >>>>>> BR,
> >>>>>> Jani.
> >>>>>>
> >>>>>
> >>>>> The issue was found by the Intel IGT test suite, in the graphics
> >>>>> bypass test case.
> >>>>> https://gitlab.freedesktop.org/drm/igt-gpu-tools
> >>>>> The normal use case will not see the issue.
> >>>>> To avoid this issue happening again when we run the test case, it
> >>>>> would be nice to add a check before the transfer is called.
> >>>>> And we can see that it really needs a check here to make
> >>>>> IGT & kernel happy.
> >>>>
> >>>> You're missing my point. What is the root cause? Why do you have
> >>>> the aux device or connector registered before ->transfer function
> >>>> is initialized. I don't think you should do that.
> >>>>
> >>>> BR,
> >>>> Jani.
> >>>>
> >>>
> >>> One potential IGT fix patch to resolve the test case failure is:
> >>>
> >>> tests/amdgpu/amd_bypass.c
> >>>     data->pipe_crc = igt_pipe_crc_new(data->drm_fd, data->pipe_id,
> >>> -                                      AMDGPU_PIPE_CRC_SOURCE_DPRX);
> >>> +                                      INTEL_PIPE_CRC_SOURCE_AUTO);
> >>>
> >>> The kernel panic is gone after changing "dprx" to "auto" in the
> >>> IGT test.
> >>>
> >>> In my view, the IGT amdgpu bypass test does some common setup work,
> >>> including the CRC pipe and source.
> >>> When IGT sets up a new CRC pipe capture source for the amdgpu bypass
> >>> test, the source is set to "dprx" instead of "auto".
> >>> That makes amdgpu_dm_crtc_set_crc_source() fail to pick a correct
> >>> AUX, and its transfer function is invalid.
> >>> The system I tested uses an HDMI port connected to the monitor.
> >>>
> >>> amdgpu_dm_crtc_set_crc_source ->
> >>>     (aux = (aconn->port) ? &aconn->port->aux : &aconn->dm_dp_aux.aux;)
> >>> drm_dp_start_crc ->
> >>> drm_dp_dpcd_readb -> aux->transfer is NULL, issue here.
> >>>
> >>> The fix will use the "auto" keyword, which will let the driver
> >>> select a default source of frame CRCs for this CRTC.
> >>>
> >>> Correct me if I have some points wrong.
> >>
> >> Apparently I'm completely failing to communicate my POV to you.
> >>
> >> If you have a kernel oops, the fix needs to be in the kernel, not IGT.
> >>
> >> The question is, why is it possible for IGT (or any userspace) to
> >> trigger AUX communication when the ->transfer function is not set? In
> >> my opinion, the kernel driver should not have exposed the interface
> >> at all if the ->transfer function is not set. The interface is useless
> >> without the ->transfer function.
> >> IMO, that's the bug.
> >>
> >
> > Yes, you are correct, the transfer shouldn't be called before it is ready!
> >
> > Let me explain in more detail how I see it.
> > Maybe the root cause in this case is not that aux->transfer is called
> > before it is registered.
> > I suppose the issue was triggered by the wrong CRC pipe source.
> >
> > Actually, aux->transfer is registered when the amdgpu DM is registered
> > at kernel boot.
> > The IGT test was run after the system had logged in to the GNOME desktop.
> >
> > amdgpu_dm_initialize_dp_connector ->
> >     aconnector->dm_dp_aux.aux.transfer = dm_dp_aux_transfer;
> >
> > The test case failed when IGT set a "DPRX" CRC pipe source while only
> > HDMI was connected to the monitor.
> > At this time, aux->transfer is NULL, and the DP helper does not check
> > whether the transfer pointer is NULL.
> > It calls transfer for the DPCD read, and then you see the kernel panic log.
> >
> > amdgpu_dm_crtc_funcs -> amdgpu_dm_crtc_set_crc_source ->
> >     drm_dp_start_crc
> >
> > * If only the DP cable is connected, the issue does not happen. The test passes.
> > * If I change the CRC source to "auto", the kernel does not panic either.
> > Maybe the failed test case needs to run on DP instead of HDMI; I am
> > not sure at the moment.
> >
>
> Two things need to happen:
> 1) IGT should skip tests requiring DPRX CRC source if not on a
>    DP connector.
> 2) Driver should return EINVAL (or another appropriate error) if
>    DPRX CRC source is requested when the CRTC is not connected to
>    a DP display. Alternatively we could make sure that DPRX is
>    not advertised as a CRC source in this case but I'm not sure
>    how difficult that would be.
>
> Like Jani said, I don't think the current patch is the correct one as it
> doesn't get to the root cause. The root cause fix should be in the CRC
> debugfs handling code.
>
> Harry

Got your point. I will make two new patches as you suggested.
Thanks for your feedback.

Perry.

>
> >
> > Hoping this info can help.
> >
> > Perry.
> >
> >>
> >> BR,
> >> Jani.
> >>
> >>>
> >>> Thank you!
> >>> Perry.
> >>>
> >>>>
> >>>>>
> >>>>> Perry.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>>>>>> PGD 0 P4D 0
> >>>>>>> Oops: 0010 [#1] SMP NOPTI
> >>>>>>> RIP: 0010:0x0
> >>>>>>> Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> >>>>>>> RSP: 0018:ffffa8d64225bab8 EFLAGS: 00010246
> >>>>>>> RAX: 0000000000000000 RBX: 0000000000000020 RCX: ffffa8d64225bb5e
> >>>>>>> RDX: ffff93151d921880 RSI: ffffa8d64225bac8 RDI: ffff931511a1a9d8
> >>>>>>> RBP: ffffa8d64225bb10 R08: 0000000000000001 R09: ffffa8d64225ba60
> >>>>>>> R10: 0000000000000002 R11: 000000000000000d R12: 0000000000000001
> >>>>>>> R13: 0000000000000000 R14: ffffa8d64225bb5e R15: ffff931511a1a9d8
> >>>>>>> FS: 00007ff8ea7fa9c0(0000) GS:ffff9317fe6c0000(0000)
> >>>>>>> knlGS:0000000000000000
> >>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> CR2: ffffffffffffffd6 CR3: 000000010d5a4000 CR4: 0000000000750ee0
> >>>>>>> PKRU: 55555554
> >>>>>>> Call Trace:
> >>>>>>>  drm_dp_dpcd_access+0x72/0x110 [drm_kms_helper]
> >>>>>>>  drm_dp_dpcd_read+0xb7/0xf0 [drm_kms_helper]
> >>>>>>>  drm_dp_start_crc+0x38/0xb0 [drm_kms_helper]
> >>>>>>>  amdgpu_dm_crtc_set_crc_source+0x1ae/0x3e0 [amdgpu]
> >>>>>>>  crtc_crc_open+0x174/0x220 [drm]
> >>>>>>>  full_proxy_open+0x168/0x1f0
> >>>>>>>  ? open_proxy_open+0x100/0x100
> >>>>>>>  do_dentry_open+0x156/0x370
> >>>>>>>  vfs_open+0x2d/0x30
> >>>>>>>
> >>>>>>> v2: fix some typo
> >>>>>>>
> >>>>>>> Signed-off-by: Perry Yuan <Perry.Yuan@xxxxxxx>
> >>>>>>> ---
> >>>>>>>  drivers/gpu/drm/drm_dp_helper.c | 4 ++++
> >>>>>>>  1 file changed, 4 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/drm_dp_helper.c b/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> index 6d0f2c447f3b..76b28396001a 100644
> >>>>>>> --- a/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> +++ b/drivers/gpu/drm/drm_dp_helper.c
> >>>>>>> @@ -260,6 +260,10 @@ static int drm_dp_dpcd_access(struct drm_dp_aux *aux, u8 request,
> >>>>>>>  	msg.buffer = buffer;
> >>>>>>>  	msg.size = size;
> >>>>>>>
> >>>>>>> +	/* No transfer function is set, so not an available DP connector */
> >>>>>>> +	if (!aux->transfer)
> >>>>>>> +		return -EINVAL;
> >>>>>>> +
> >>>>>>>  	mutex_lock(&aux->hw_mutex);
> >>>>>>>
> >>>>>>>  	/*
> >>>>>>
> >>>>>> --
> >>>>>> Jani Nikula, Intel Open Source Graphics Center
> >>>>
> >>>> --
> >>>> Jani Nikula, Intel Open Source Graphics Center
> >>
> >> --
> >> Jani Nikula, Intel Open Source Graphics Center
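
For reference, point 2) of Harry's suggestion (reject a DPRX CRC source when the CRTC is not driving a DP display) could look roughly like the user-space sketch below. This is only an illustration of the idea, not the follow-up patch: struct connector, struct dp_aux, the enums, dprx_crc_supported() and validate_crc_source() are all hypothetical stand-ins, not the real amdgpu or DRM definitions. The only things taken from the thread are the -EINVAL return and the fact that a NULL ->transfer means there is no usable AUX channel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum crc_source { CRC_SOURCE_NONE, CRC_SOURCE_AUTO, CRC_SOURCE_DPRX };
enum connector_type { CONNECTOR_HDMI, CONNECTOR_DP };

struct dp_aux {
	/* Plays the role of drm_dp_aux->transfer; NULL when no AUX channel is set up. */
	int (*transfer)(struct dp_aux *aux, void *msg);
};

struct connector {
	enum connector_type type;
	struct dp_aux aux;
};

/* True only if DPRX CRC capture can work on this connector. */
bool dprx_crc_supported(const struct connector *conn)
{
	return conn && conn->type == CONNECTOR_DP && conn->aux.transfer != NULL;
}

/*
 * Validate a requested CRC source before any AUX traffic is started.
 * Returns 0 if the source is usable, -EINVAL otherwise.
 */
int validate_crc_source(const struct connector *conn, enum crc_source src)
{
	if (src == CRC_SOURCE_DPRX && !dprx_crc_supported(conn))
		return -EINVAL;
	return 0;
}

/* Dummy transfer hook so that a DP connector can be modelled. */
int dummy_transfer(struct dp_aux *aux, void *msg)
{
	(void)aux;
	(void)msg;
	return 0;
}
```

With an HDMI-only connector whose transfer hook is NULL, validate_crc_source(conn, CRC_SOURCE_DPRX) returns -EINVAL while "auto" is still accepted. In a real driver the equivalent check would have to sit ahead of the call to drm_dp_start_crc(), so the DPCD read from the trace above is never attempted on a connector without AUX.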