On 2015-10-20 15:27, Ludovic Desroches wrote: > On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote: >> On 2015-10-19 10:51, Ludovic Desroches wrote: >>> Hi Peter, >>> >>> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote: >>>> On 2015-10-16 01:47, Peter Rosin wrote: >>>>> On 2015-10-14 07:43, Ludovic Desroches wrote: >>>>>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote: >>>>>>> On 2015-10-13 18:47, Cyrille Pitchen wrote: >>>>>>>> Le 13/10/2015 17:19, Peter Rosin a écrit : >>>>>>>>> On 2015-10-13 16:21, Ludovic Desroches wrote: >>> >>> [...] >>> >>>>> I have started to get this when I run with this patch: >>>>> >>>>> [ 73.310000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 198.200000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 509.880000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 615.750000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 617.750000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 1766.640000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 2035.380000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 2227.190000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> [ 2313.100000] at91_i2c f0014000.i2c: RXRDY still set! >>>>> >>>>> My USB serial dongle was hung which was why I didn't notice until just now. >>>>> >>>>> This is probably not when communication with the eeprom though, and certainly not >>>>> writing to it, but perhpaps when polling the temperature (using the jc42 driver). >>>>> I'll investigate further in the morning to see if I can pinpoint it. >>>> >>>> Yep, it's the jc42 accesses that triggers this (to the same chip as the >>>> eeprom, but a different block of transistors I suppose). >>>> >>>> Looking at the i2c bus, the accesses normally go like this: >>>> >>>> [0.000000] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>> >>>> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate >>>> limiter in the jc42 driver to get more frequent incidents). >>>> >>>> But when the diagnostic (RXRDY still set!) is output it continues >>>> with this: >>>> >>>> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>> >>>> Notice the ~5 ms delay (about the time it should take to output >>>> the diagnostic at 115200 baud) after the access to register 0x05 >>>> at 0.399755. >>>> >>>> This is the only thing that I can observe on the bus, so it appears >>>> to be harmless. >>>> >>>> It appears that the i2c access at 0.399755 finds the TWI >>>> registers in an odd state, but nothing from the access at >>>> 0.198520 appears to have gone wrong. Is this a race? Anyway, >>>> the diagnostic is pretty frequent and annoying. printk_once? >>> >>> I'll try to reproduce it on my side. The only issue you have is having >>> the message about RXRDY? I mean no issue with i2c transfers? >> >> Exactly, the only issue is the message, the bus looks good and transfers >> work as they should AFAICT. >>> It is not the possible bug we had in mind, this bug will prevent the >>> master device to release the i2c bus. It will stop the transfer but >>> without sending a stop on the bus. >> >> Agreed, this is not about the extra frame caused by the spurious write >> to the THR register. This is something else. >> >> One suspicion is that the driver gets an unexpected irq from its own >> NACK (the one that it puts out to end the read) and races with the >> expected interrupt at TXCOMP? >> > > I have discussed with the designer of this IP. So the NACK flag is only > used for 'real' nack not protocol ones. > > Concerning the read issue, it have attached a patch (apply it on top of > Cyrille's patch). Could you have a try? I have reproduced your issue only one > time so it's hard for me to tell if it works well or not. Since you removed the dev_err call, I don't really know if the underlying issue is solved or not (if there ever was any). Reading RHR in the if at the top of at91_twi_read_next_byte() never happens for me (I added a witness and left it running overnight), so that seems like an unrelated change. Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know? But the "problem" is gone of course, no flood of messages, but I can't tell if this is just papering over or a fix. Cheers, Peter -- To unsubscribe from this list: send the line "unsubscribe linux-i2c" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html