On Tue, 26 Mar 2024 05:48:38 -0700, Badal Nilawar wrote:
>

Hi Badal,

> i915_hwmon and its resources are managed resources of i915 dev.
> During i915 driver unregister flow the function i915_hwmon_unregister()
> explicitly makes i915_hwmon resource NULL. This happen before
> hwmon is actually unregistered. Doing so may cause UAF if hwmon
> sysfs is being accessed:

I don't agree with this patch. Even if we stop setting i915->hwmon to
NULL, that doesn't explain what we see, which is that the devm_kzalloc'd
i915->hwmon has been *freed* (since ddat (i915->hwmon->ddat) is
0x6b6b6b6b6b6b6dbf):

(gdb) list *hwm_energy+0x2b
0x161f8b is in hwm_energy (drivers/gpu/drm/i915/i915_hwmon.c:134).
129		struct hwm_energy_info *ei = &ddat->ei;
130		intel_wakeref_t wakeref;
131		i915_reg_t rgaddr;
132		u32 reg_val;
133
134		if (ddat->gt_n >= 0)
135			rgaddr = hwmon->rg.energy_status_tile;
136		else
137			rgaddr = hwmon->rg.energy_status_all;
138

If xe has no equivalent of i915_hwmon_unregister, that's because xe has
not implemented equivalents of i915_hwmon_power_max_disable and
i915_hwmon_power_max_restore, which is where the NULL check is used.

Also, we should verify any fix before sending a patch out, either via a
local repro (say, use a script to read the hwmon energy file while running
the selftests) or via CI. In this case CI doesn't show any failures, but I
am wondering if that is only by chance.

From what I understand, the issue is somehow being caused by ddat being
*freed* before the hwmon sysfs has disappeared, at module unload time
(since the selftests use module unload), allowing the prometheus-node
process to call into the energy read sysfs and cause the crash.

Thanks.
--
Ashutosh

>
> <7> [518.386591] i915 0000:03:00.0: [drm] intel_gt_set_wedged called from intel_gt_set_wedged_on_fini+0xd/0x30 [i915]
> <7> [518.471128] i915 0000:03:00.0: [drm:drm_client_release] drm_fb_helper
> <4> [518.501476] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6dbf: 0000 [#1] PREEMPT SMP NOPTI
> <4> [518.512264] CPU: 6 PID: 679 Comm: prometheus-node Tainted: G U 6.9.0-rc1-CI_DRM_14482-g4a8fabcf2f1a+ #1
> <4> [518.522951] Hardware name: Intel Corporation Raptor Lake Client Platform/RPL-S ADP-S DDR5 UDIMM CRB, BIOS RPLSFWI1.R00.4221.A00.2305271351 05/27/2023
> <4> [518.536217] RIP: 0010:hwm_energy+0x2b/0x100 [i915]
> <4> [518.541159] Code: 48 89 e5 41 57 41 56 41 55 41 54 53 48 89 fb 48 83 e4 f0 48 83 ec 10 4c 8b 77 08 4c 8b 2f 8b 7f 34 48 89 74 24 08 85 ff 78 2b <45> 8b bd 54 02 00 00 49 8b 7e 18 e8 35 e4 ea ff 49 89 c4 48 85 c0
> <4> [518.559746] RSP: 0018:ffffc900077efd00 EFLAGS: 00010202
> <4> [518.564943] RAX: 0000000000000000 RBX: ffff8881e3078428 RCX: 0000000000000000
> <4> [518.572025] RDX: 0000000000000001 RSI: ffffc900077efda0 RDI: 000000006b6b6b6b
> <4> [518.579107] RBP: ffffc900077efd40 R08: ffffc900077efda0 R09: 0000000000000001
> <4> [518.586186] R10: 0000000000000001 R11: ffff88810c19aa00 R12: ffff888243e1a010
> <4> [518.593264] R13: 6b6b6b6b6b6b6b6b R14: 6b6b6b6b6b6b6b6b R15: ffff8881e3078428
> <4> [518.600343] FS: 00007f9def400700(0000) GS:ffff88888d100000(0000) knlGS:0000000000000000
> <4> [518.608373] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [518.614077] CR2: 0000564f19bff288 CR3: 0000000119f94000 CR4: 0000000000f50ef0
> <4> [518.621158] PKRU: 55555554
> <4> [518.623858] Call Trace:
> <4> [518.626303] <TASK>
> <4> [518.628400] ? __die_body+0x1a/0x60
> <4> [518.631881] ? die_addr+0x38/0x60
> <4> [518.635182] ? exc_general_protection+0x1a1/0x400
> <4> [518.639862] ? asm_exc_general_protection+0x26/0x30
> <4> [518.644710] ? hwm_energy+0x2b/0x100 [i915]
> <4> [518.649007] hwm_read+0x9a/0x310 [i915]
> <4> [518.652953] hwmon_attr_show+0x36/0x140
> <4> [518.656775] dev_attr_show+0x15/0x60
>
> Fixes: b3b088e28183 ("drm/i915/hwmon: Add HWMON infrastructure")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10366
> Cc: Ashutosh Dixit <ashutosh.dixit@xxxxxxxxx>
> Cc: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>
> Signed-off-by: Badal Nilawar <badal.nilawar@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_driver.c | 2 --
>  drivers/gpu/drm/i915/i915_hwmon.c  | 5 -----
>  2 files changed, 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c
> index 4b9233c07a22..a95b455964b7 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -660,8 +660,6 @@ static void i915_driver_unregister(struct drm_i915_private *dev_priv)
>  	for_each_gt(gt, dev_priv, i)
>  		intel_gt_driver_unregister(gt);
>
> -	i915_hwmon_unregister(dev_priv);
> -
>  	i915_perf_unregister(dev_priv);
>  	i915_pmu_unregister(dev_priv);
>
> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> index c9169e56b9a1..91f171752d34 100644
> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> @@ -841,8 +841,3 @@ void i915_hwmon_register(struct drm_i915_private *i915)
>  		ddat_gt->hwmon_dev = hwmon_dev;
>  	}
>  }
> -
> -void i915_hwmon_unregister(struct drm_i915_private *i915)
> -{
> -	fetch_and_zero(&i915->hwmon);
> -}
> --
> 2.25.1
>
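
For reference, the local repro suggested in the reply (read the hwmon
energy file in a loop while the selftests load and unload the module)
could look roughly like the sketch below. This is only an illustration:
the function names, the polling interval, and the default duration are
assumptions made for this example and not an existing tool. The only
things taken from the kernel side are the /sys/class/hwmon location and
the standard energy*_input attribute naming.

```python
import glob
import os
import time


def find_energy_files(root="/sys/class/hwmon"):
    """Return every energy*_input attribute under the hwmon class directory."""
    return sorted(glob.glob(os.path.join(root, "hwmon*", "energy*_input")))


def poll_energy_files(root="/sys/class/hwmon", duration_s=60.0, interval_s=0.01):
    """Re-read all energy attributes until duration_s has elapsed.

    Returns a (successful_reads, failed_reads) tuple. The names and the
    default timings are hypothetical choices for this sketch.
    """
    deadline = time.monotonic() + duration_s
    ok = 0
    failed = 0
    while time.monotonic() < deadline:
        # Re-glob on every pass: hwmonN directories disappear and reappear
        # as the driver module is unloaded and reloaded by the selftests.
        for path in find_energy_files(root):
            try:
                with open(path) as f:
                    f.read()
                ok += 1
            except OSError:
                # The attribute vanished between glob and open/read; this is
                # expected during unload and is exactly the window of interest.
                failed += 1
        time.sleep(interval_s)
    return ok, failed
```

Calling poll_energy_files(duration_s=600) from one shell while the
selftests run in another, and then checking dmesg for the general
protection fault in hwm_energy, would be one way to tell whether a
candidate fix really closes the window or whether CI is passing by chance.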