On 8/03/21 1:31 pm, Guenter Roeck wrote:
> On 3/7/21 2:52 PM, Chris Packham wrote:
>> Hi,
>>
>> I've got a system using a PowerPC T2080 SoC that, among other things, has
>> an LM81 hwmon chip.
>>
>> Under a high CPU load we see errant readings from the LM81 as well as
>> actual failures. It's the errant readings that cause the most concern,
>> since we can easily ignore the read errors in our monitoring application
>> (although it would be better if they weren't there at all).
>>
>> I'm able to reproduce this with a test application[0] that artificially
>> creates a high CPU load, and then by repeatedly checking for the all-1s
>> values from the LM81 datasheet[1] (page 17). The all-1s readings stick
>> out as they are obviously higher than the voltage rails that are
>> connected and disagree with measurements taken with a multimeter.
>>
>> Here's the output from my device:
>>
>> [root@linuxbox ~]# cpuload 90&
>> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input
>> | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>> 3586
>> 3586
>> cat: read error: No such device or address
>> cat: read error: No such device or address
>> 3320
>> 3320
>> 3586
>> 3586
>> 6641
>> 6641
>> 4383
>> 4383
>>
>> Fundamentally I think this is a problem with the fact that the LM81 is
>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>> emulate SMBus. I suspect the errant readings occur when we don't get round
>> to completing the read within the timeout specified by the SMBus
>> specification. Depending on when that happens we either fail the
>> transfer or interpret the result as all-1s.
>>
> That is quite unlikely. Many sensor chips are SMBus chips connected to
> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
> that the chip doesn't like the bulk read command issued through regmap, that
> the chip has problems with the i2c bus speed, or that the i2c bus is noisy.
Perhaps something gets upset when interrupt processing is delayed because
of the CPU load. I don't see the problem when there isn't a CPU load, so I
think that eliminates board issues.

> In this context, the "No such device or address" responses are very suspicious.
> Those are reported by the i2c driver, not by the hwmon driver, and suggest
> that the chip did not respond to a read request. Maybe it helps to enable
> debugging in the i2c driver to see if it reports anything useful. Even
> better might be to connect an i2c bus analyzer to the i2c bus and check
> what is going on.

That's from -ENXIO, which is used in only one place in i2c-mpc.c. I'll
enable some debug and see what we get.

>
> Guenter
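As a rough illustration of what the reproduction loop checks, here is a minimal C sketch. It assumes the five values grepped for above (3320, 4383, 6641, 15930, 3586) are the all-1s readings for this board's in*_input channels; the function names is_all_ones_reading(), parse_input() and read_input() are hypothetical and are not taken from the test application[0] or from any kernel interface. Read errors come back as -errno (for example -ENXIO, which is what "No such device or address" maps to), so the two symptoms discussed in this thread can be counted separately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the values grepped for in the reproduction loop are the
 * LM81 all-1s readings (in millivolts) for this board's inputs. */
static const long suspect_mv[] = { 3320, 4383, 6641, 15930, 3586 };

/* Hypothetical helper: true if a reading matches an all-1s value. */
bool is_all_ones_reading(long mv)
{
    for (size_t i = 0; i < sizeof(suspect_mv) / sizeof(suspect_mv[0]); i++)
        if (suspect_mv[i] == mv)
            return true;
    return false;
}

/* Hypothetical helper: parse the text of an in*_input attribute, e.g.
 * "3300\n". Returns 0 and stores the value, or -EINVAL if malformed. */
int parse_input(const char *text, long *mv)
{
    char *end;

    assert(text != NULL && mv != NULL);
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text || (*end != '\0' && *end != '\n'))
        return -EINVAL;
    *mv = v;
    return 0;
}

/* Hypothetical helper: read one hwmon attribute. Returns 0 on success or
 * -errno on failure; a failed sysfs read such as the one shown by cat
 * above is reported here as -ENXIO. */
int read_input(const char *path, long *mv)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -errno;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        /* Capture errno before fclose() can overwrite it. */
        int err = (ferror(f) && errno != 0) ? errno : EIO;
        fclose(f);
        return -err;
    }
    fclose(f);
    return parse_input(buf, mv);
}
```

A polling loop would call read_input() on each in*_input path once per iteration, count a negative return as a read error, and count a successful read for which is_all_ones_reading() is true as an errant reading.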