On Wed, Mar 16, 2022 at 3:14 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > > On 3/16/22 13:03, Brandon Wyman wrote: > > On Sun, Mar 13, 2022 at 11:36 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote: > >> > >> On 3/11/22 10:10, Brandon Wyman wrote: > >>> Add a clear_faults write-only debugfs entry for the ibm-cffps device > >>> driver. > >>> > >>> Certain IBM power supplies require clearing some latched faults in order > >>> to indicate that the fault has indeed been observed/noticed. > >>> > >> > >> That is insufficient, sorry. Please provide the affected power supplies as > >> well as the affected faults, and confirm that the problem still exists > >> in v5.17-rc6 or later kernels - or, more specifically, in any kernel which > >> includes commit 35f165f08950 ("hwmon: (pmbus) Clear pmbus fault/warning > >> bits after read"). > >> > >> Thanks, > >> Guenter > > > > Sorry for the delay in responding. I did some testing with commit > > 35f165f08950. I could not get that code to send the CLEAR_FAULTS > > command to the power supplies. > > > > I can update the commit message to be more specific about which power > > supplies need this CLEAR_FAULTS sent, and which faults. It is observed > > with the 1600W power supplies (2B1E model). The faults that latch are > > the VIN_UV and INPUT faults in the STATUS_WORD. The corresponding > > STATUS_INPUT fault bits are VIN_UV_FAULT and Unit is Off. > > > > The point is that the respective fault bits should be reset when the > corresponding alarm attributes are read. This isn't about executing > a CLEAR_FAULTS command, but about selectively resetting fault bits > while ensuring that faults are reported at least once. Executing > CLEAR_FAULTS is a big hammer. > > With the patch I pointed to in place, input (and other) faults should > be reset after the corresponding alarm attributes are read, assuming > that the condition no longer exists. If that does not happen, we should > fix the problem instead of deploying the big hammer. > > Thanks, > Guenter Okay, I see what you are pointing out there. I had been mostly looking at the "files" in the debugfs paths. Those do not end up running through that pmbus_get_boolean() function, so the individual fault clearing was not being attempted. The fault I was interested in appears to be associated with in1_lcrti_alarm. Reading that will give me a 1 if there is a VIN_UV fault, and then it sends 0x10 to STATUS_INPUT. That clears out VIN_UV, but the STATUS_INPUT command was returning 0x18. Nothing appears to handle clearing BIT(3), that 0x08 mask. Should there be some kind of define for BIT(3) over in pmbus.h? Something like PB_VOLTAGE_OFF? Somehow we need something using that in sbit of the attributes. I had a quick hack that just OR'ed BIT(3) with BIT(4) for that PB_VOLTAGE_UV_FAULT. That resulted in a clear of both bits in STATUS_INPUT, and the faults clearing in STATUS_WORD. It is not clear if there should be a separate alarm for that "Unit Off For Insufficient Input Voltage", or if the one for in1_lcrit_alarm could just be the two bits OR'ed into one mask. I can send a patch with a proposal on how to fix this one bit not getting cleared. > > >> > >>> Signed-off-by: Brandon Wyman <bjwyman@xxxxxxxxx> > >>> --- > >>> V1 -> V2: Explain why this change is needed > >>> > >>> drivers/hwmon/pmbus/ibm-cffps.c | 11 +++++++++++ > >>> 1 file changed, 11 insertions(+) > >>> > >>> diff --git a/drivers/hwmon/pmbus/ibm-cffps.c b/drivers/hwmon/pmbus/ibm-cffps.c > >>> index e3294a1a54bb..3f02dde02a4b 100644 > >>> --- a/drivers/hwmon/pmbus/ibm-cffps.c > >>> +++ b/drivers/hwmon/pmbus/ibm-cffps.c > >>> @@ -67,6 +67,7 @@ enum { > >>> CFFPS_DEBUGFS_CCIN, > >>> CFFPS_DEBUGFS_FW, > >>> CFFPS_DEBUGFS_ON_OFF_CONFIG, > >>> + CFFPS_DEBUGFS_CLEAR_FAULTS, > >>> CFFPS_DEBUGFS_NUM_ENTRIES > >>> }; > >>> > >>> @@ -274,6 +275,13 @@ static ssize_t ibm_cffps_debugfs_write(struct file *file, > >>> if (rc) > >>> return rc; > >>> > >>> + rc = 1; > >>> + break; > >>> + case CFFPS_DEBUGFS_CLEAR_FAULTS: > >>> + rc = i2c_smbus_write_byte(psu->client, PMBUS_CLEAR_FAULTS); > >>> + if (rc < 0) > >>> + return rc; > >>> + > >>> rc = 1; > >>> break; > >>> default: > >>> @@ -607,6 +615,9 @@ static int ibm_cffps_probe(struct i2c_client *client) > >>> debugfs_create_file("on_off_config", 0644, ibm_cffps_dir, > >>> &psu->debugfs_entries[CFFPS_DEBUGFS_ON_OFF_CONFIG], > >>> &ibm_cffps_fops); > >>> + debugfs_create_file("clear_faults", 0200, ibm_cffps_dir, > >>> + &psu->debugfs_entries[CFFPS_DEBUGFS_CLEAR_FAULTS], > >>> + &ibm_cffps_fops); > >>> > >>> return 0; > >>> } > >> >