On Wed, Mar 29, 2023 at 10:50:09PM -0700, Dixit, Ashutosh wrote:
> On Tue, 28 Mar 2023 16:35:43 -0700, Ashutosh Dixit wrote:
> >
> > On ATSM the PL1 limit is disabled at power up. The previous uapi assumed
> > that the PL1 limit is always enabled and therefore did not have a notion of
> > a disabled PL1 limit. This results in erroneous PL1 limit values when the
> > PL1 limit is disabled. For example at power up, the disabled ATSM PL1 limit
> > was previously shown as 0 which means a low PL1 limit whereas the limit
> > being disabled actually implies a high effective PL1 limit value.
> >
> > To get round this problem, the PL1 limit uapi is expanded to include a
> > special value 0 to designate a disabled PL1 limit.
>
> This patch is another attempt to show when the PL1 power limit is disabled
> and to disable it when it needs to. Previous abandoned attempts to do this
> are [1] and [2].
>
> The preferred way to do this was [2] but that was NAK'd by hwmon folks (see
> [2]). That is why here we fall back on the approach in [1].

I still don't get it, but let's move on...

>
> This patch is identical to [1] except that the value used to disable the
> PL1 limit has been changed to 0 (from -1 in [1]) as was suggested in [2]
> (both -1 and 0 seem ok for the purpose).
> >
> > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
> > Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
>
> The link between this patch and these pretty serious bugs might not be
> immediately clear so here's an explanation:
>
> * Because on ATSM the PL1 power limit is disabled on power up and there
>   were no means to enable it, in 6fd3d8bf89fc we implemented the means to
>   enable the limit when the PL1 hwmon entry (power1_max) was written to.
>
> * Now there is an IGT igt@i915_hwmon@hwmon_write which (a) reads orig value
>   from all hwmon sysfs (b) does a bunch of random writes and finally (c)
>   restores the orig value read. On ATSM since the orig value was 0, when
>   the IGT restores the 0 value, the PL1 limit is now enabled with a value
>   of 0.
>
> * PL1 limit of 0 implies a low PL1 limit which causes GPU freq to fall to
>   100 MHz. This causes GuC FW load and several IGT's to start timing out
>   and gives rise the above (and even more) bugs about GuC FW load timing
>   out.

I believe these 3 bullets are key information that deserves to be in the
commit message itself.

With that there,

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>

>
> * After this patch, writing 0 would disable the PL1 limit instead of
>   enabling it, avoiding the freq drop issue above, and resolving this Intel
>   CI issue.
>
> Thanks.
> --
> Ashutosh
>
> [1] https://patchwork.freedesktop.org/patch/522612/?series=113972&rev=1
> [2] https://patchwork.freedesktop.org/patch/522652/?series=113984&rev=1
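
For anyone following the thread, the read / write / restore sequence from the quoted bullets can be modelled with a small toy sketch. This is not the i915 code: `Pl1Model`, its methods, and `igt_like_cycle` are hypothetical names invented for illustration, the intermediate write values are arbitrary, and the model assumes only what the quoted text states (the limit is disabled at power up and was shown as 0, any write used to enable it, and after the patch 0 means "disabled" on both read and write).

```python
from dataclasses import dataclass


@dataclass
class Pl1Model:
    """Toy stand-in for the PL1 limit state (hypothetical, not driver code)."""
    enabled: bool = False  # per the thread: PL1 is disabled at power up on ATSM
    limit: int = 0         # programmed limit value; 0 at power up in this model

    # Old behaviour as described in the thread.
    def read_old(self) -> int:
        # No notion of "disabled": the value is reported as-is.
        return self.limit

    def write_old(self, value: int) -> None:
        # Any write to power1_max enables the limit, including a write of 0.
        self.limit = value
        self.enabled = True

    # Behaviour after the patch as described in the thread.
    def read_new(self) -> int:
        # 0 is the special value that designates a disabled limit.
        return self.limit if self.enabled else 0

    def write_new(self, value: int) -> None:
        if value == 0:
            # Writing 0 disables the limit instead of enabling it at 0.
            self.enabled = False
            return
        self.limit = value
        self.enabled = True

    def throttled(self) -> bool:
        # An enabled limit of 0 is the "low PL1 limit" case from the thread.
        return self.enabled and self.limit == 0


def igt_like_cycle(read, write) -> None:
    """Mimic the test flow: save the original, write other values, restore."""
    orig = read()
    for value in (25_000_000, 50_000_000):  # arbitrary illustrative values
        write(value)
    write(orig)


if __name__ == "__main__":
    old = Pl1Model()
    igt_like_cycle(old.read_old, old.write_old)
    print("old uapi throttled:", old.throttled())

    new = Pl1Model()
    igt_like_cycle(new.read_new, new.write_new)
    print("new uapi throttled:", new.throttled())
```

In this model the old path ends the cycle with the limit enabled at 0, while the new path ends with the limit disabled again, which is the difference the last quoted bullet describes.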