On Mon, 2 May 2022 at 04:38, Abhinav Kumar <quic_abhinavk@xxxxxxxxxxx> wrote:
>
> Looks like our new CI has given all the answers we need :) which is a
> great win for the CI in my opinion.
>
> Take a look at this report :
> https://gitlab.freedesktop.org/drm/msm/-/jobs/22015361
>
> This issue seems to be because this change
> https://github.com/torvalds/linux/commit/169466d4e59ca204683998b7f45673ebf0eb2de6
> is missing in our tree.
>
> Without this change, what happens is that we are not hitting the return
> 0 because we check for ENODEV.
>
>
>         /*
>          * External bridges are mandatory for eDP interfaces: one has to
>          * provide at least an eDP panel (which gets wrapped into panel-bridge).
>          *
>          * For DisplayPort interfaces external bridges are optional, so
>          * silently ignore an error if one is not present (-ENODEV).
>          */
>         rc = dp_parser_find_next_bridge(dp_priv->parser);
>         if (!dp->is_edp && rc == -ENODEV)
>                 return 0;
>
> So, I think we should do both:
>
> 1) Since we are running CI on the tree, backport this change so that
> this error path doesnt hit?
>
> 2) Add this protection as well because this shows that we can indeed hit
> this path in EDEFER cases causing this crash.

I have been waiting for v2 for the last week or so. It should include
a fixed Fixes tag and an updated description (which should note that
this happens in the error path, etc) as requested by Stephen.

>
> Thanks
>
> Abhinav
>
> On 4/27/2022 3:53 AM, Dmitry Baryshkov wrote:
> > On 27/04/2022 00:50, Stephen Boyd wrote:
> >> Quoting Vinod Polimera (2022-04-25 23:02:11)
> >>> Avoid clearing irqs and derefernce hw_intr when hw_intr is null.
> >>
> >> Presumably this is only the case when the display driver doesn't fully
> >> probe and something probe defers? Can you clarify how this situation
> >> happens?
> >>
> >>>
> >>> BUG: Unable to handle kernel NULL pointer dereference at virtual
> >>> address 0000000000000000
> >>>
> >>> Call trace:
> >>>    dpu_core_irq_uninstall+0x50/0xb0
> >>>    dpu_irq_uninstall+0x18/0x24
> >>>    msm_drm_uninit+0xd8/0x16c
> >>>    msm_drm_bind+0x580/0x5fc
> >>>    try_to_bring_up_master+0x168/0x1c0
> >>>    __component_add+0xb4/0x178
> >>>    component_add+0x1c/0x28
> >>>    dp_display_probe+0x38c/0x400
> >>>    platform_probe+0xb0/0xd0
> >>>    really_probe+0xcc/0x2c8
> >>>    __driver_probe_device+0xbc/0xe8
> >>>    driver_probe_device+0x48/0xf0
> >>>    __device_attach_driver+0xa0/0xc8
> >>>    bus_for_each_drv+0x8c/0xd8
> >>>    __device_attach+0xc4/0x150
> >>>    device_initial_probe+0x1c/0x28
> >>>
> >>> Fixes: a73033619ea ("drm/msm/dpu: squash dpu_core_irq into
> >>> dpu_hw_interrupts")
> >>
> >> The fixes tag looks odd. In dpu_core_irq_uninstall() at that commit it
> >> is dealing with 'irq_obj' which isn't a pointer. After commit
> >> f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct
> >> dpu_hw_intr") dpu_core_irq_uninstall() starts using 'hw_intr' which is
> >> allocated on the heap. If we backported this patch to a place that had
> >> a73033619ea without f25f656608e3 it wouldn't make any sense.
> >
> > I'd agree here.
> > The following tag would be correct:
> >
> > Fixes: f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct
> > dpu_hw_intr")
> >
> >
> >>
> >>> Signed-off-by: Vinod Polimera <quic_vpolimer@xxxxxxxxxxx>
> >>> ---
> >>>   drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 3 +++
> >>>   1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
> >>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
> >>> index c515b7c..ab28577 100644
> >>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
> >>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
> >>> @@ -599,6 +599,9 @@ void dpu_core_irq_uninstall(struct dpu_kms *dpu_kms)
> >>>   {
> >>>          int i;
> >>>
> >>> +       if (!dpu_kms->hw_intr)
> >>> +               return;
> >>> +
> >>>          pm_runtime_get_sync(&dpu_kms->pdev->dev);
> >>>          for (i = 0; i < dpu_kms->hw_intr->total_irqs; i++)
> >
> >

-- 
With best wishes
Dmitry
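
For readers following the thread, the guard in the quoted patch can be
illustrated outside the kernel with a small user-space sketch. It assumes,
as discussed above, that teardown can run from the bind error path before
the interrupt state was ever allocated. All names here (sketch_kms,
sketch_hw_intr, sketch_irq_uninstall, sketch_failed_bind) are hypothetical
stand-ins and not the real DPU structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the heap-allocated interrupt state. */
struct sketch_hw_intr {
	int total_irqs;
	int cleared;	/* counts cleared IRQs, for illustration only */
};

/* Hypothetical stand-in for the KMS object that owns the state. */
struct sketch_kms {
	struct sketch_hw_intr *hw_intr;	/* stays NULL until init succeeds */
};

/*
 * Teardown that tolerates a partially initialised object.
 * Returns the number of IRQs cleared, or 0 if nothing was allocated.
 */
int sketch_irq_uninstall(struct sketch_kms *kms)
{
	int i;

	/* The guard: nothing to clear if init never got this far. */
	if (!kms->hw_intr)
		return 0;

	for (i = 0; i < kms->hw_intr->total_irqs; i++)
		kms->hw_intr->cleared++;

	return kms->hw_intr->total_irqs;
}

/*
 * Simulates a bind that fails (for example on a probe deferral) before
 * hw_intr is set, and then runs the teardown from the error path.
 */
int sketch_failed_bind(void)
{
	struct sketch_kms kms = { .hw_intr = NULL };

	return sketch_irq_uninstall(&kms);
}
```

Without the NULL check, the loop condition in sketch_irq_uninstall() would
dereference a NULL pointer in the sketch_failed_bind() case, which mirrors
the crash in the call trace above.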