On Wed, Feb 09, 2022 at 02:02:05AM +0000, Sripada, Radhakrishna wrote:
>
>
> > -----Original Message-----
> > From: Łukasz Bartosik <lb@xxxxxxxxxxxx>
> > Sent: Tuesday, February 8, 2022 8:20 AM
> > To: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>; Joonas Lahtinen
> > <joonas.lahtinen@xxxxxxxxxxxxxxx>; Vivi, Rodrigo <rodrigo.vivi@xxxxxxxxx>;
> > Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
> > Cc: Sripada, Radhakrishna <radhakrishna.sripada@xxxxxxxxx>;
> > intel-gfx@xxxxxxxxxxxxxxxxxxxxx; upstream@xxxxxxxxxxxx; Ville Syrjälä
> > <ville.syrjala@xxxxxxxxxxxxxxx>; Roper, Matthew D
> > <matthew.d.roper@xxxxxxxxx>; Srivatsa, Anusha <anusha.srivatsa@xxxxxxxxx>
> > Subject: Re: [PATCH v1] drm/i915: fix null pointer dereference
> >
> > Have you had a chance to review the patch? The crash is still there
> > on v5.17-rc3.
> >
> > Thanks,
> > Lukasz
> >
> > On Tue, 1 Feb 2022 at 16:49, Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> wrote:
> > >
> > >
> > > Thanks for the patch, adding some Cc's from the commit that regressed.
> > >
> > > BR,
> > > Jani.
> > >
> > > On Tue, 01 Feb 2022, Lukasz Bartosik <lb@xxxxxxxxxxxx> wrote:
> > > > From: Łukasz Bartosik <lb@xxxxxxxxxxxx>
> > > >
> > > > Asus chromebook CX550 crashes during boot on v5.17-rc1 kernel.
> > > > The root cause is null pointer dereference of bi_next
> > > > in tgl_get_bw_info() in drivers/gpu/drm/i915/display/intel_bw.c.
> > > >
> > > > BUG: kernel NULL pointer dereference, address: 000000000000002e
> > > > PGD 0 P4D 0
> > > > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G U 5.17.0-rc1
> > > > Hardware name: Google Delbin/Delbin, BIOS Google_Delbin.13672.156.3 05/14/2021
> > > > RIP: 0010:tgl_get_bw_info+0x2de/0x510
> > > > ...
> > > > [ 2.554467] Call Trace:
> > > > [ 2.554467] <TASK>
> > > > [ 2.554467] intel_bw_init_hw+0x14a/0x434
> > > > [ 2.554467] ? _printk+0x59/0x73
> > > > [ 2.554467] ? _dev_err+0x77/0x91
> > > > [ 2.554467] i915_driver_hw_probe+0x329/0x33e
> > > > [ 2.554467] i915_driver_probe+0x4c8/0x638
> > > > [ 2.554467] i915_pci_probe+0xf8/0x14e
> > > > [ 2.554467] ? _raw_spin_unlock_irqrestore+0x12/0x2c
> > > > [ 2.554467] pci_device_probe+0xaa/0x142
> > > > [ 2.554467] really_probe+0x13f/0x2f4
> > > > [ 2.554467] __driver_probe_device+0x9e/0xd3
> > > > [ 2.554467] driver_probe_device+0x24/0x7c
> > > > [ 2.554467] __driver_attach+0xba/0xcf
> > > > [ 2.554467] ? driver_attach+0x1f/0x1f
> > > > [ 2.554467] bus_for_each_dev+0x8c/0xc0
> > > > [ 2.554467] bus_add_driver+0x11b/0x1f7
> > > > [ 2.554467] driver_register+0x60/0xea
> > > > [ 2.554467] ? mipi_dsi_bus_init+0x16/0x16
> > > > [ 2.554467] i915_init+0x2c/0xb9
> > > > [ 2.554467] ? mipi_dsi_bus_init+0x16/0x16
> > > > [ 2.554467] do_one_initcall+0x12e/0x2b3
> > > > [ 2.554467] do_initcall_level+0xd6/0xf3
> > > > [ 2.554467] do_initcalls+0x4e/0x79
> > > > [ 2.554467] kernel_init_freeable+0xed/0x14d
> > > > [ 2.554467] ? rest_init+0xc1/0xc1
> > > > [ 2.554467] kernel_init+0x1a/0x120
> > > > [ 2.554467] ret_from_fork+0x1f/0x30
> > > > [ 2.554467] </TASK>
> > > > ...
> > > > Kernel panic - not syncing: Fatal exception
> > > >
> > > > Fixes: c64a9a7c05be ("drm/i915: Update memory bandwidth formulae")
>
> LGTM,
> Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@xxxxxxxxx>
>
> > > > Signed-off-by: Łukasz Bartosik <lb@xxxxxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/i915/display/intel_bw.c | 16 +++++++++-------
> > > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
> > > > index 2da4aacc956b..bd0ed68b7faa 100644
> > > > --- a/drivers/gpu/drm/i915/display/intel_bw.c
> > > > +++ b/drivers/gpu/drm/i915/display/intel_bw.c
> > > > @@ -404,15 +404,17 @@ static int tgl_get_bw_info(struct drm_i915_private *dev_priv, const struct intel
> > > >  		int clpchgroup;
> > > >  		int j;
> > > >  
> > > > -		if (i < num_groups - 1)
> > > > -			bi_next = &dev_priv->max_bw[i + 1];
> > > > -
> > > >  		clpchgroup = (sa->deburst * qi.deinterleave / num_channels) << i;
> > > >  
> > > > -		if (i < num_groups - 1 && clpchgroup < clperchgroup)
> > > > -			bi_next->num_planes = (ipqdepth - clpchgroup) / clpchgroup + 1;
> > > > -		else
> > > > -			bi_next->num_planes = 0;
> > > > +		if (i < num_groups - 1) {
> > > > +			bi_next = &dev_priv->max_bw[i + 1];
> > > > +
> > > > +			if (clpchgroup < clperchgroup)
> > > > +				bi_next->num_planes = (ipqdepth - clpchgroup) /
> > > > +						      clpchgroup + 1;
> > > > +			else
> > > > +				bi_next->num_planes = 0;
> > > > +		}

BTW this code makes me rather suspicious overall. num_planes==0 means
no planes can be enabled at all. Is that even correct? IIRC the icl
code did not have qgv points that had num_planes==0.

Also IIRC I added that 'num_planes = ... + 1' to the icl code to make
it actually sensible. The icl sample code didn't have that +1 and
instead used '>' as opposed to '>=' in the comparison to the actual
number of enabled planes, thus implying the +1.
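
The '>' vs '>=' equivalence comes down to integer arithmetic: for integers, `a >= b + 1` and `a > b` are the same test. A minimal C sketch of both forms, using hypothetical helper names (not i915 API) and assuming `clpchgroup > 0` and `ipqdepth >= clpchgroup`:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers (not i915 API) contrasting the two forms of the
 * per-group plane threshold. Assumes clpchgroup > 0 and
 * ipqdepth >= clpchgroup.
 */

/* Driver form: the threshold carries the +1 and is compared with >=. */
static int num_planes_plus_one(int ipqdepth, int clpchgroup)
{
	assert(clpchgroup > 0);
	return (ipqdepth - clpchgroup) / clpchgroup + 1;
}

static bool threshold_met_ge(int enabled_planes, int ipqdepth, int clpchgroup)
{
	return enabled_planes >= num_planes_plus_one(ipqdepth, clpchgroup);
}

/* Sample-code form: no +1 in the threshold, compared with > instead. */
static bool threshold_met_gt(int enabled_planes, int ipqdepth, int clpchgroup)
{
	assert(clpchgroup > 0);
	return enabled_planes > (ipqdepth - clpchgroup) / clpchgroup;
}

/* Brute-force check that both forms agree over a small range. */
static bool forms_agree(int max_ipqdepth, int max_planes)
{
	for (int clpchgroup = 1; clpchgroup <= max_ipqdepth; clpchgroup++)
		for (int ipqdepth = clpchgroup; ipqdepth <= max_ipqdepth; ipqdepth++)
			for (int n = 0; n <= max_planes; n++)
				if (threshold_met_ge(n, ipqdepth, clpchgroup) !=
				    threshold_met_gt(n, ipqdepth, clpchgroup))
					return false;
	return true;
}
```

`forms_agree()` is only a sanity check over a small made-up range; it says nothing about whether the tgl+ `num_planes = 0` branch is right.
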

But now here in the tgl+ code we have the +1 in one branch of the if,
but the other branch just has a 0 (so no +1).

And it doesn't help that the code is doing this weird [i] + [i+1] stuff
inside the single loop. Would be a lot more legible if we just did two
loops I think. Though I see the same awkward construct is used in the
spec sample code as well.

-- 
Ville Syrjälä
Intel
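
The two-loop restructuring suggested above could look roughly like this minimal C sketch. `struct bw_group`, `compute_groups()` and `MAX_GROUPS` are hypothetical stand-ins for `struct intel_bw_info`, the `dev_priv->max_bw[]` loop in `tgl_get_bw_info()` and its array size, and `base_clpchgroup` stands for `sa->deburst * qi.deinterleave / num_channels`. It deliberately keeps the patched behaviour, including the `num_planes = 0` branch, so it only illustrates the structure and does not settle whether 0 is correct.

```c
#include <assert.h>

/* Hypothetical bound on the number of groups for this sketch. */
#define MAX_GROUPS 8

/* Hypothetical stand-in for the two struct intel_bw_info fields used here. */
struct bw_group {
	int clpchgroup;
	int num_planes;
};

/*
 * Two-pass variant of the patched loop. Group 0's num_planes is left
 * untouched because the quoted hunk does not show how it is set.
 */
static void compute_groups(struct bw_group *groups, int num_groups,
			   int base_clpchgroup, int clperchgroup, int ipqdepth)
{
	assert(num_groups > 0 && num_groups <= MAX_GROUPS);
	assert(base_clpchgroup > 0);

	/* Pass 1: per-group clpchgroup, independent of any neighbour. */
	for (int i = 0; i < num_groups; i++)
		groups[i].clpchgroup = base_clpchgroup << i;

	/*
	 * Pass 2: group i + 1's plane threshold comes from group i. The loop
	 * bound keeps i + 1 in range, so no pointer can be left unassigned.
	 * The 0 branch mirrors the patched code unchanged.
	 */
	for (int i = 0; i + 1 < num_groups; i++) {
		int clpchgroup = groups[i].clpchgroup;

		if (clpchgroup < clperchgroup)
			groups[i + 1].num_planes =
				(ipqdepth - clpchgroup) / clpchgroup + 1;
		else
			groups[i + 1].num_planes = 0;
	}
}
```

With the first pass filling in every group before the second one reads them, the [i]/[i+1] coupling lives in one explicitly bounded loop, and a single-group configuration simply skips the second pass.
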