Re: [PATCH] hwmon: (peci/dimmtemp) Do not provide fake thresholds data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 27, 2025 at 10:39:44AM -0800, Guenter Roeck wrote:
> On 1/27/25 10:30, Paul Fertser wrote:
> > Hi Guenter,
> > 
> > On Mon, Jan 27, 2025 at 09:29:39AM -0800, Guenter Roeck wrote:
> > > On 1/27/25 08:40, Winiarska, Iwona wrote:
> > > > On Thu, 2025-01-23 at 15:20 +0300, Paul Fertser wrote:
> > > > > When an Icelake or Sapphire Rapids CPU isn't providing the maximum and
> > > > > critical thresholds for particular DIMM the driver should return an
> > > > > error to the userspace instead of giving it stale (best case) or wrong
> > > > > (the structure contains all zeros after kzalloc() call) data.
> > > > > 
> > > > > The issue can be reproduced by binding the peci driver while the host is
> > > > > fully booted and idle, this makes PECI interaction unreliable enough.
> > > > > 
> > > > > Fixes: 73bc1b885dae ("hwmon: peci: Add dimmtemp driver")
> > > > > Fixes: 621995b6d795 ("hwmon: (peci/dimmtemp) Add Sapphire Rapids support")
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Signed-off-by: Paul Fertser <fercerpav@xxxxxxxxx>
> > > > 
> > > > Hi!
> > > > 
> > > > Thank you for the patch.
> > > > Did you have a chance to test it with OpenBMC dbus-sensors?
> > > > In general, the change looks okay to me, but since it modifies the behavior
> > > > (applications will need to handle this, and returning an error will happen more
> > > > often) we need to confirm that it does not cause any regressions for userspace.
> > > > 
> > > 
> > > I would also like to understand if the error is temporary or permanent.
> > > If it is permanent, the attributes should not be created in the first
> > > place. It does not make sense to have limit attributes which always report
> > > -ENODATA.
> > 
> > The error is temporary. The underlying reason is that when host CPUs
> > go to deep enough idle sleep state (probably C6) they stop responding
> > to PECI requests from BMC. Once something starts running the CPU
> > leaves C6 and starts responding and all the temperature data
> > (including the thresholds) becomes available again.
> > 
> 
> Thanks.
> 
> Next question: Is there evidence that the thresholds change while the CPU
> is in a deep sleep state (or, in other words, that they are indeed stale) ?
> Because if not it would be (much) better to only report -ENODATA if the
> thresholds are uninitialized, and it would be even better than that if the
> limits are read during initialization (and not updated at all) if they do
> not change dynamically.


[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux