On Thu, Mar 17, 2022 at 1:50 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > On 3/17/22 09:12, Brandon Wyman wrote: > > On Wed, Mar 16, 2022 at 3:14 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > >> > >> On 3/16/22 13:03, Brandon Wyman wrote: > >>> On Sun, Mar 13, 2022 at 11:36 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > >>>> > >>>> On 3/11/22 10:10, Brandon Wyman wrote: > >>>>> Add a clear_faults write-only debugfs entry for the ibm-cffps device > >>>>> driver. > >>>>> > >>>>> Certain IBM power supplies require clearing some latched faults in order > >>>>> to indicate that the fault has indeed been observed/noticed. > >>>>> > >>>> > >>>> That is insufficient, sorry. Please provide the affected power supplies as > >>>> well as the affected faults, and confirm that the problem still exists > >>>> in v5.17-rc6 or later kernels - or, more specifically, in any kernel which > >>>> includes commit 35f165f08950 ("hwmon: (pmbus) Clear pmbus fault/warning > >>>> bits after read"). > >>>> > >>>> Thanks, > >>>> Guenter > >>> > >>> Sorry for the delay in responding. I did some testing with commit > >>> 35f165f08950. I could not get that code to send the CLEAR_FAULTS > >>> command to the power supplies. > >>> > >>> I can update the commit message to be more specific about which power > >>> supplies need this CLEAR_FAULTS sent, and which faults. It is observed > >>> with the 1600W power supplies (2B1E model). The faults that latch are > >>> the VIN_UV and INPUT faults in the STATUS_WORD. The corresponding > >>> STATUS_INPUT fault bits are VIN_UV_FAULT and Unit is Off. > >>> > >> > >> The point is that the respective fault bits should be reset when the > >> corresponding alarm attributes are read. This isn't about executing > >> a CLEAR_FAULTS command, but about selectively resetting fault bits > >> while ensuring that faults are reported at least once. Executing > >> CLEAR_FAULTS is a big hammer. > >> > >> With the patch I pointed to in place, input (and other) faults should > >> be reset after the corresponding alarm attributes are read, assuming > >> that the condition no longer exists. If that does not happen, we should > >> fix the problem instead of deploying the big hammer. > >> > >> Thanks, > >> Guenter > > > > Okay, I see what you are pointing out there. I had been mostly looking > > at the "files" in the debugfs paths. Those do not end up running > > through that pmbus_get_boolean() function, so the individual fault > > clearing was not being attempted. The fault I was interested in > > appears to be associated with in1_lcrti_alarm. Reading that will give > > me a 1 if there is a VIN_UV fault, and then it sends 0x10 to > > STATUS_INPUT. That clears out VIN_UV, but the STATUS_INPUT command was > > returning 0x18. Nothing appears to handle clearing BIT(3), that 0x08 > > mask. > > > > Should there be some kind of define for BIT(3) over in pmbus.h? > > Something like PB_VOLTAGE_OFF? Somehow we need something using that in > > sbit of the attributes. I had a quick hack that just OR'ed BIT(3) with > > BIT(4) for that PB_VOLTAGE_UV_FAULT. That resulted in a clear of both > > bits in STATUS_INPUT, and the faults clearing in STATUS_WORD. > > > > It is not clear if there should be a separate alarm for that "Unit Off > > For Insufficient Input Voltage", or if the one for in1_lcrit_alarm > > could just be the two bits OR'ed into one mask. I can send a patch > > with a proposal on how to fix this one bit not getting cleared. > > > > We don't have a separate standard attribute. I think the best approach > would be to add a mask for bit 3 and or that mask for lcrit in > vin_limit_attrs with PB_VOLTAGE_UV_FAULT. I'd suggest to name the > define something like PB_VOLTAGE_VIN_OFF or PB_VOLTAGE_VIN_FAULT > to clarify that the bit applies to the input. Done. See: https://lore.kernel.org/linux-hwmon/20220317232123.2103592-1-bjwyman@xxxxxxxxx/T/#u > > Thanks, > Guenter