On 8/03/21 3:27 pm, Chris Packham wrote:
>
> On 8/03/21 1:31 pm, Guenter Roeck wrote:
>> On 3/7/21 2:52 PM, Chris Packham wrote:
>>> Hi,
>>>
>>> I've got a system using a PowerPC T2080 SoC and among other things has
>>> an LM81 hwmon chip.
>>>
>>> Under a high CPU load we see errant readings from the LM81 as well as
>>> actual failures. It's the errant readings that cause the most concern
>>> since we can easily ignore the read errors in our monitoring application
>>> (although it would be better if they weren't there at all).
>>>
>>> I'm able to reproduce this with a test application[0] that artificially
>>> creates a high CPU load then by repeatedly checking for the all-1s
>>> values from the LM81 datasheet[1](page 17). The all-1s readings stick
>>> out as they are obviously higher than the voltage rails that are
>>> connected and disagree with measurements taken with a multimeter.
>>>
>>> Here's the output from my device
>>>
>>> [root@linuxbox ~]# cpuload 90&
>>> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>>> 3586
>>> 3586
>>> cat: read error: No such device or address
>>> cat: read error: No such device or address
>>> 3320
>>> 3320
>>> 3586
>>> 3586
>>> 6641
>>> 6641
>>> 4383
>>> 4383
>>>
>>> Fundamentally I think this is a problem with the fact that the LM81 is
>>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>>> emulate SMBus. I suspect the errant readings are when we don't get round
>>> to completing the read within the timeout specified by the SMBus
>>> specification. Depending on when that happens we either fail the
>>> transfer or interpret the result as all-1s.
>>>
>> That is quite unlikely. Many sensor chips are SMBus chips connected to
>> i2c busses. It is much more likely that there is a bug in the T2080
>> i2c driver, that the chip doesn't like the bulk read command issued
>> through regmap, that the chip has problems with the i2c bus speed, or
>> that the i2c bus is noisy.
> Perhaps something gets upset when interrupt processing is delayed
> because of CPU load. I don't see the problem when there isn't a CPU
> load so I think that eliminates board issues.
>> In this context, the "No such device or address" responses are very
>> suspicious. Those are reported by the i2c driver, not by the hwmon
>> driver, and suggest that the chip did not respond to a read request.
>> Maybe it helps to enable debugging to the i2c driver to see if it
>> reports anything useful. Even better might be to connect an i2c bus
>> analyzer to the i2c bus and check what is going on.
> That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll
> enable some debug and see what we get.

For the errant readings there was nothing abnormal reported by the
driver. For the "No such device or address" I saw
"mpc-i2c ffe119000.i2c: No RXAK" which matches up with the -ENXIO return.
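
For anyone who wants to separate the two failure modes while reproducing
this, here is a rough Python equivalent of the shell loop quoted above. It
is only a sketch: the function names (classify, poll_once) are made up for
illustration, and the set of "all-1s" millivolt values is simply copied
from the grep pattern, so it assumes the same rail scaling as on this
board. A read that fails with ENXIO is reported as a read error, while a
read that succeeds but returns one of those values is flagged as suspect.

```python
import errno
import glob
import os

# Millivolt values copied from the grep pattern in the quoted shell loop.
# Assumption: these are the all-1s readings for the inputs on this board.
ALL_ONES_MV = {3320, 4383, 6641, 15930, 3586}


def classify(raw):
    """Return 'suspect' if the reading is an all-1s value, else 'ok'."""
    return "suspect" if int(raw.strip()) in ALL_ONES_MV else "ok"


def poll_once(hwmon_dir):
    """Read every in*_input file under hwmon_dir once.

    Returns a list of (file name, status, value or None) tuples, where
    status is 'ok', 'suspect' or 'read_error'.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(hwmon_dir, "in*_input"))):
        name = os.path.basename(path)
        try:
            with open(path) as f:
                raw = f.read()
        except OSError as exc:
            # "No such device or address" is ENXIO; anything else is
            # unexpected here and is passed on to the caller.
            if exc.errno != errno.ENXIO:
                raise
            results.append((name, "read_error", None))
            continue
        results.append((name, classify(raw), int(raw.strip())))
    return results
```

Calling poll_once("/sys/class/hwmon/hwmon0") about once a second while the
CPU load is running and logging anything that is not 'ok' should give
roughly the same picture as the cat/grep loop, with the read errors and
the errant values counted separately.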