On Thu, 30 Mar 2023 08:44:34 -0700, Rodrigo Vivi wrote: > > On Wed, Mar 29, 2023 at 10:50:09PM -0700, Dixit, Ashutosh wrote: > > On Tue, 28 Mar 2023 16:35:43 -0700, Ashutosh Dixit wrote: > > > > > > On ATSM the PL1 limit is disabled at power up. The previous uapi assumed > > > that the PL1 limit is always enabled and therefore did not have a notion of > > > a disabled PL1 limit. This results in erroneous PL1 limit values when the > > > PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit > > > was previously shown as 0 which means a low PL1 limit whereas the limit > > > being disabled actually implies a high effective PL1 limit value. > > > > > > To get round this problem, the PL1 limit uapi is expanded to include a > > > special value 0 to designate a disabled PL1 limit. > > > > This patch is another attempt to show when the PL1 power limit is disabled > > and to disable it when it needs to. Previous abandoned attempts to do this > > are [1] and [2]. > > > > The preferred way to do this was [2] but that was NAK'd by hwmon folks (see > > [2]). That is why here we fall back on the approach in [1]. > > I still don't get it, but let's move on... > > > > > This patch is identical to [1] except that the value used to disable the > > PL1 limit has been changed to 0 (from -1 in [1]) as was suggested in [2] > > (both -1 and 0 seem ok for the purpose). > > > > > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8062 > > > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8060 > > > > The link between this patch and these pretty serious bugs might not be > > immediately clear so here's an explanation: > > > > * Because on ATSM the PL1 power limit is disabled on power up and there > > were no means to enable it, in 6fd3d8bf89fc we implemented the means to > > enable the limit when the PL1 hwmon entry (power1_max) was written to. > > > > * Now there is an IGT igt@i915_hwmon@hwmon_write which (a) reads orig value > > from all hwmon sysfs (b) does a bunch of random writes and finally (c) > > restores the orig value read. On ATSM since the orig value was 0, when > > the IGT restores the 0 value, the PL1 limit is now enabled with a value > > of 0. > > > > * PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to > > 100 MHz. This causes GuC FW load and several IGT's to start timing out > > and gives rise the above (and even more) bugs about GuC FW load timing > > out. > > I believe these 3 bullets are key information that deserves to be in > the commit message itself. Done in v2. > > With that there, > > Reviewed-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> Thanks. -- Ashutosh > > > > > > * After this patch, writing 0 would disable the PL1 limit instead of > > enabling it, avoiding the freq drop issue above, and resolving this Intel > > CI issue. > > > > Thanks. > > -- > > Ashutosh > > > > [1] https://patchwork.freedesktop.org/patch/522612/?series=113972&rev=1 > > [2] https://patchwork.freedesktop.org/patch/522652/?series=113984&rev=1