Looks like our new CI has given all the answers we need :) which is a
great win for the CI in my opinion.
Take a look at this report:
https://gitlab.freedesktop.org/drm/msm/-/jobs/22015361
This issue seems to be because this change
https://github.com/torvalds/linux/commit/169466d4e59ca204683998b7f45673ebf0eb2de6
is missing in our tree.
Without this change, we do not hit the return 0 below, because that
path is taken only when rc is -ENODEV.
	/*
	 * External bridges are mandatory for eDP interfaces: one has to
	 * provide at least an eDP panel (which gets wrapped into panel-bridge).
	 *
	 * For DisplayPort interfaces external bridges are optional, so
	 * silently ignore an error if one is not present (-ENODEV).
	 */
	rc = dp_parser_find_next_bridge(dp_priv->parser);
	if (!dp->is_edp && rc == -ENODEV)
		return 0;
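To make the point concrete, here is a minimal userspace sketch of that check. It assumes that any rc other than the ignored -ENODEV is handed back to the caller, which then fails the bind. handle_bridge_lookup() is a made-up name for illustration, not a function in the driver, and EPROBE_DEFER is defined locally because it is a kernel-internal code (517) that userspace errno.h does not provide.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption: mirror the kernel-internal value, absent from userspace. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * Hypothetical model of the check quoted above: rc is whatever the
 * bridge lookup returned. Only -ENODEV on a non-eDP interface is
 * treated as "no bridge, carry on"; everything else is propagated.
 */
static int handle_bridge_lookup(bool is_edp, int rc)
{
	if (!is_edp && rc == -ENODEV)
		return 0;

	return rc;
}
```

With this model, a DP interface whose lookup returns -ENODEV yields 0, while -EPROBE_DEFER (or -ENODEV on eDP) comes back unchanged and sends the caller down its error path.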
So, I think we should do both:
1) Since we are running CI on the tree, backport this change so that
this error path isn't hit?
2) Add this protection as well, because this shows that we can indeed
hit this path in -EPROBE_DEFER cases, causing this crash.
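For reference, a rough userspace sketch of why the protection matters when teardown runs after a bind that failed early. The struct and function names here are stand-ins I made up, not the real dpu types; the only thing carried over from the patch is the NULL check before the pointer is dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the heap-allocated interrupt state. */
struct hw_intr_model {
	int total_irqs;
};

struct kms_model {
	struct hw_intr_model *hw_intr;	/* NULL if init never got this far */
};

/*
 * Model of an uninstall path: returns how many irqs it walked.
 * Without the NULL check, the loop condition below would dereference
 * a NULL pointer when teardown follows a failed or deferred bind.
 */
static int irq_uninstall_model(struct kms_model *kms)
{
	int i, walked = 0;

	if (!kms->hw_intr)
		return 0;

	for (i = 0; i < kms->hw_intr->total_irqs; i++)
		walked++;

	return walked;
}
```

Calling irq_uninstall_model() on a kms_model whose hw_intr is still NULL returns 0 instead of faulting, which is the behavior the guard is meant to give the error path.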
Thanks
Abhinav
On 4/27/2022 3:53 AM, Dmitry Baryshkov wrote:
On 27/04/2022 00:50, Stephen Boyd wrote:
Quoting Vinod Polimera (2022-04-25 23:02:11)
Avoid clearing irqs and dereferencing hw_intr when hw_intr is null.
Presumably this is only the case when the display driver doesn't fully
probe and something probe defers? Can you clarify how this situation
happens?
BUG: Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
Call trace:
dpu_core_irq_uninstall+0x50/0xb0
dpu_irq_uninstall+0x18/0x24
msm_drm_uninit+0xd8/0x16c
msm_drm_bind+0x580/0x5fc
try_to_bring_up_master+0x168/0x1c0
__component_add+0xb4/0x178
component_add+0x1c/0x28
dp_display_probe+0x38c/0x400
platform_probe+0xb0/0xd0
really_probe+0xcc/0x2c8
__driver_probe_device+0xbc/0xe8
driver_probe_device+0x48/0xf0
__device_attach_driver+0xa0/0xc8
bus_for_each_drv+0x8c/0xd8
__device_attach+0xc4/0x150
device_initial_probe+0x1c/0x28
Fixes: a73033619ea ("drm/msm/dpu: squash dpu_core_irq into dpu_hw_interrupts")
The fixes tag looks odd. In dpu_core_irq_uninstall() at that commit it
is dealing with 'irq_obj' which isn't a pointer. After commit
f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct
dpu_hw_intr") dpu_core_irq_uninstall() starts using 'hw_intr' which is
allocated on the heap. If we backported this patch to a place that had
a73033619ea without f25f656608e3 it wouldn't make any sense.
I'd agree here. The following tag would be correct:
Fixes: f25f656608e3 ("drm/msm/dpu: merge struct dpu_irq into struct dpu_hw_intr")
Signed-off-by: Vinod Polimera <quic_vpolimer@xxxxxxxxxxx>
---
drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
index c515b7c..ab28577 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_interrupts.c
@@ -599,6 +599,9 @@ void dpu_core_irq_uninstall(struct dpu_kms *dpu_kms)
 {
 	int i;
+	if (!dpu_kms->hw_intr)
+		return;
+
 	pm_runtime_get_sync(&dpu_kms->pdev->dev);
 	for (i = 0; i < dpu_kms->hw_intr->total_irqs; i++)