On Mon, Mar 11, 2024 at 07:14:09PM +0100, Janusz Krzysztofik wrote:
> On Monday, 11 March 2024 18:35:43 CET Guenter Roeck wrote:
> > On 3/11/24 09:58, Rodrigo Vivi wrote:
> > > On Mon, Mar 11, 2024 at 09:06:46AM +0100, Janusz Krzysztofik wrote:
> > >> In the i915 hwmon sysfs getter path we now take hwmon_lock, then acquire
> > >> an rpm wakeref.  That results in lock inversion:
> > >>
> > >> <4> [197.079335] ======================================================
> > >> <4> [197.085473] WARNING: possible circular locking dependency detected
> > >> <4> [197.091611] 6.8.0-rc7-Patchwork_129026v7-gc4dc92fb1152+ #1 Not tainted
> > >> <4> [197.098096] ------------------------------------------------------
> > >> <4> [197.104231] prometheus-node/839 is trying to acquire lock:
> > >> <4> [197.109680] ffffffff82764d80 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x9a/0x350
> > >> <4> [197.116939]
> > >> but task is already holding lock:
> > >> <4> [197.122730] ffff88811b772a40 (&hwmon->hwmon_lock){+.+.}-{3:3}, at: hwm_energy+0x4b/0x100 [i915]
> > >> <4> [197.131543]
> > >> which lock already depends on the new lock.
> > >> ...
> > >> <4> [197.507922] Chain exists of:
> > >>   fs_reclaim --> &gt->reset.mutex --> &hwmon->hwmon_lock
> > >> <4> [197.518528]  Possible unsafe locking scenario:
> > >> <4> [197.524411]        CPU0                    CPU1
> > >> <4> [197.528916]        ----                    ----
> > >> <4> [197.533418]   lock(&hwmon->hwmon_lock);
> > >> <4> [197.537237]                                lock(&gt->reset.mutex);
> > >> <4> [197.543376]                                lock(&hwmon->hwmon_lock);
> > >> <4> [197.549682]   lock(fs_reclaim);
> > >> ...
> > >> <4> [197.632548] Call Trace:
> > >> <4> [197.634990]  <TASK>
> > >> <4> [197.637088]  dump_stack_lvl+0x64/0xb0
> > >> <4> [197.640738]  check_noncircular+0x15e/0x180
> > >> <4> [197.652968]  check_prev_add+0xe9/0xce0
> > >> <4> [197.656705]  __lock_acquire+0x179f/0x2300
> > >> <4> [197.660694]  lock_acquire+0xd8/0x2d0
> > >> <4> [197.673009]  fs_reclaim_acquire+0xa1/0xd0
> > >> <4> [197.680478]  __kmalloc+0x9a/0x350
> > >> <4> [197.689063]  acpi_ns_internalize_name.part.0+0x4a/0xb0
> > >> <4> [197.694170]  acpi_ns_get_node_unlocked+0x60/0xf0
> > >> <4> [197.720608]  acpi_ns_get_node+0x3b/0x60
> > >> <4> [197.724428]  acpi_get_handle+0x57/0xb0
> > >> <4> [197.728164]  acpi_has_method+0x20/0x50
> > >> <4> [197.731896]  acpi_pci_set_power_state+0x43/0x120
> > >> <4> [197.736485]  pci_power_up+0x24/0x1c0
> > >> <4> [197.740047]  pci_pm_default_resume_early+0x9/0x30
> > >> <4> [197.744725]  pci_pm_runtime_resume+0x2d/0x90
> > >> <4> [197.753911]  __rpm_callback+0x3c/0x110
> > >> <4> [197.762586]  rpm_callback+0x58/0x70
> > >> <4> [197.766064]  rpm_resume+0x51e/0x730
> > >> <4> [197.769542]  rpm_resume+0x267/0x730
> > >> <4> [197.773020]  rpm_resume+0x267/0x730
> > >> <4> [197.776498]  rpm_resume+0x267/0x730
> > >> <4> [197.779974]  __pm_runtime_resume+0x49/0x90
> > >> <4> [197.784055]  __intel_runtime_pm_get+0x19/0xa0 [i915]
> > >> <4> [197.789070]  hwm_energy+0x55/0x100 [i915]
> > >> <4> [197.793183]  hwm_read+0x9a/0x310 [i915]
> > >> <4> [197.797124]  hwmon_attr_show+0x36/0x120
> > >> <4> [197.800946]  dev_attr_show+0x15/0x60
> > >> <4> [197.804509]  sysfs_kf_seq_show+0xb5/0x100
> > >>
> > >> However, the lock is only intended to protect either the hwmon overflow
> > >> counter or rmw hardware operations.  There is no need to hold the lock,
> > >> only the wakeref, while reading from hardware.
> > >>
> > >> Acquire the lock only after the hardware read, done under the rpm wakeref.
> > >>
> > >> Fixes: c41b8bdcc297 ("drm/i915/hwmon: Show device level energy usage")
> > >> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@xxxxxxxxxxxxxxx>
> > >> Cc: <stable@xxxxxxxxxxxxxxx> # v6.2+
> > >> ---
> > >>  drivers/gpu/drm/i915/i915_hwmon.c | 4 ++--
> > >>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_hwmon.c b/drivers/gpu/drm/i915/i915_hwmon.c
> > >> index 8c3f443c8347e..faf7670de6e06 100644
> > >> --- a/drivers/gpu/drm/i915/i915_hwmon.c
> > >> +++ b/drivers/gpu/drm/i915/i915_hwmon.c
> > >> @@ -136,11 +136,11 @@ hwm_energy(struct hwm_drvdata *ddat, long *energy)
> > >>  	else
> > >>  		rgaddr = hwmon->rg.energy_status_all;
> > >>
> > >> -	mutex_lock(&hwmon->hwmon_lock);
> > >> -
> > >>  	with_intel_runtime_pm(uncore->rpm, wakeref)
> > >>  		reg_val = intel_uncore_read(uncore, rgaddr);
> > >>
> > >> +	mutex_lock(&hwmon->hwmon_lock);
> > >> +
> > >
> > > This is not enough.
> > > Check hwm_locked_with_pm_intel_uncore_rmw().
> > >
> > > It looks like we need to rethink this lock entirely here.
> >
> > I would have assumed that the lock was supposed to ensure that
> > reading the register value and the subsequent update of accum_energy
> > and reg_val_prev were synchronized.  This is no longer the case
> > after this patch has been applied.  Given that, I would agree that
> > it would make sense to define what the lock is supposed to protect
> > before changing its scope.
>
> Right.  In that case, I propose to take the wakeref before the lock and
> keep it while the lock is held around the calculations.  Would that be
> acceptable as a quick fix?  If yes, then I can take the same approach to
> also fix the other places in i915_hwmon.c where a similar lock inversion
> can happen, as Rodrigo pointed out.  Without that, we are stuck with
> another series that cleans up excessive use of rpm wakerefs by other
> users, since those wakerefs, even if unrelated, evidently hide the issue
> in hwmon, and dropping them will expose it.

That would work.  It is what we have in drivers/gpu/drm/xe/xe_hwmon.c
already.  Please convert every case and ensure that we are using
pm_runtime_get on every path, likely in the same places as already
handled in xe_hwmon.c.

> Thanks,
> Janusz
>
> > Guenter
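
For concreteness, here is a minimal sketch of the ordering Janusz proposes
for hwm_energy(): the wakeref is taken before the mutex and held while the
lock covers both the register read and the accumulation.  Field and helper
names (ddat->ei, accum_energy, reg_val_prev, SF_ENERGY, scl_shift_energy)
are assumed to match the current i915_hwmon.c; treat this as a sketch of
the proposed ordering, not the final patch:

static void
hwm_energy(struct hwm_drvdata *ddat, long *energy)
{
	struct intel_uncore *uncore = ddat->uncore;
	struct i915_hwmon *hwmon = ddat->hwmon;
	struct hwm_energy_info *ei = &ddat->ei;
	intel_wakeref_t wakeref;
	i915_reg_t rgaddr;
	u32 reg_val;

	if (ddat->gt_n >= 0)
		rgaddr = hwmon->rg.energy_status_tile;
	else
		rgaddr = hwmon->rg.energy_status_all;

	/*
	 * Wakeref first: runtime resume may allocate memory (the
	 * fs_reclaim edge in the lockdep chain), so it must never
	 * run under hwmon_lock.
	 */
	with_intel_runtime_pm(uncore->rpm, wakeref) {
		mutex_lock(&hwmon->hwmon_lock);

		/*
		 * The register read and the accumulator update stay
		 * under a single lock acquisition, preserving the
		 * accum_energy/reg_val_prev synchronization Guenter
		 * describes.
		 */
		reg_val = intel_uncore_read(uncore, rgaddr);

		if (reg_val >= ei->reg_val_prev)
			ei->accum_energy += reg_val - ei->reg_val_prev;
		else
			ei->accum_energy += UINT_MAX - ei->reg_val_prev + reg_val;
		ei->reg_val_prev = reg_val;

		*energy = mul_u64_u32_shr(ei->accum_energy, SF_ENERGY,
					  hwmon->scl_shift_energy);

		mutex_unlock(&hwmon->hwmon_lock);
	}
}

The point of the ordering is that runtime resume can end up in the memory
allocator, so the wakeref must never be acquired inside hwmon_lock, while
the read and the accumulator update remain atomic with respect to other
readers.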
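
Applied to the rmw helper Rodrigo mentions, the same reordering would look
roughly like this (again a sketch only, with the signature assumed from
i915_hwmon.c):

static void
hwm_locked_with_pm_intel_uncore_rmw(struct hwm_drvdata *ddat,
				    i915_reg_t reg, u32 clear, u32 set)
{
	struct i915_hwmon *hwmon = ddat->hwmon;
	struct intel_uncore *uncore = ddat->uncore;
	intel_wakeref_t wakeref;

	/*
	 * Same ordering as in hwm_energy(): wakeref (and any allocation
	 * done by rpm resume) outside, mutex inside, so no runtime
	 * resume ever runs under hwmon_lock.
	 */
	with_intel_runtime_pm(uncore->rpm, wakeref) {
		mutex_lock(&hwmon->hwmon_lock);

		intel_uncore_rmw(uncore, reg, clear, set);

		mutex_unlock(&hwmon->hwmon_lock);
	}
}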