On 2017-10-12 13:35, Peter Rosin wrote: > Hi! > > I have encountered an "interesting" bug. It silently corrupts data > and is generally nasty... > > On an I2C bus, driven by the at91 driver and DMA (an Atmel > sama5d31 chip), I have an 256 byte eeprom (NXP SE97BTP). I'm using > Linux v4.13. > > The at24 driver for the eeprom detects that the I2C adapter is > capable of I2C transactions and selects that over SMBus. Reads > are done in 128 byte chunks. However, sometimes there is some > kind of event that disturbs the transactions such that the very > last bit och the very last byte (and the following NACK and STOP) > of such chunks are delayed for a long time (the latest incident > shows 85ms on the scope). That is too long for the eeprom which > is expecting SMBus and times out after about 30 ms. When the > eeprom times out, it just releases the data line so that it is > pulled up high. The I2C driver does not notice this, and when it > finally gets going, it reads a one for the last bit instead of > the expected zero. Since it is the last byte of the read, a NACK > is expected and since the eeprom has timed out the NACK is there. > And the STOP condition also looks normal (expected since it is > generated by the driver itself). So, the driver has not noticed > anything funny. But the data is corrupted. > > I can work around this by disabling the SMBus timeout in the eeprom > with: > > i2cset -f 0 0x18 0x22 0x8100 > > But that is done on a different I2C address (the eeprom is on > address 0x50), since the chip is a combined temperature sensor and > eeprom, and the SMBus timeout bit is of course in a temperature > sensor register. > > HOWEVER, I fail to see how this is limited to my case with this > eeprom. Any SMBus chip with a timeout will suffer the same fate. > The real bug is that this happens without the driver noticing it. > And why is there a 85ms delay in the middle of the last byte? > Sure, I can see why there might be a delay before finishing up > with a STOP condition or between bytes if there needs to be some > DMA setup at some interval, but after the 7th bit of a byte? > > For a lot of transactions on the I2C bus there is no delay before > the last bit. And most of the time there is no delay for the > eeprom reads either; the delay only occurs when it feels like it. > > This does not feel good at all. I added some traces to i2c-at91.c and, AFAIU, it's the call to at91_twi_read_data_dma_callback that sometimes arrives later than desired. Once the callback runs, the transfer completes swiftly. After reading the comments in that driver I suppose the HW holds on to the last data-bit until it knows whether to ACK or NACK in the following bit. But given this, it is questionable if this driver/HW combo can claim support for SMBus. But then again, I expect many things suffer from similar scheduling delays (presumably that's what's going on) so this is probably not special to i2c-at91.c... Since this is probably a very generic problem and I just happened to hit it for the eeprom, I wonder if it would be ok to add a workaround, as below, to the temperature sensor driver part of this chip? (with suitable comments, defines for the constants etc - setting the 0x0080 bit in reg 0x22 disables the SMBus timeout) Cheers, Peter diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c index 1bf22eff0b08..3e72bd8e06d1 100644 --- a/drivers/hwmon/jc42.c +++ b/drivers/hwmon/jc42.c @@ -416,6 +416,13 @@ static int jc42_detect(struct i2c_client *client, struct i2c_board_info *info) if ((cap & 0xff00) || (config & 0xf800)) return -ENODEV; + if (manid == NXP_MANID && (devid & SE97_DEVID_MASK) == SE97_DEVID) { + int smbus = i2c_smbus_read_word_swapped(client, 0x22); + if (smbus < 0) + return -ENODEV; + i2c_smbus_write_word_swapped(client, 0x22, smbus | 0x0080); + } + for (i = 0; i < ARRAY_SIZE(jc42_chips); i++) { struct jc42_chips *chip = &jc42_chips[i]; if (manid == chip->manid &&