On 2015-10-21 09:21, Peter Rosin wrote: > On 2015-10-20 15:27, Ludovic Desroches wrote: >> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote: >>> On 2015-10-19 10:51, Ludovic Desroches wrote: >>>> Hi Peter, >>>> >>>> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote: >>>>> On 2015-10-16 01:47, Peter Rosin wrote: >>>>>> On 2015-10-14 07:43, Ludovic Desroches wrote: >>>>>>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote: >>>>>>>> On 2015-10-13 18:47, Cyrille Pitchen wrote: >>>>>>>>> Le 13/10/2015 17:19, Peter Rosin a écrit : >>>>>>>>>> On 2015-10-13 16:21, Ludovic Desroches wrote: >>>> >>>> [...] >>>> >>>>>> I have started to get this when I run with this patch: >>>>>> >>>>>> [ 73.310000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 198.200000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 509.880000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 615.750000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 617.750000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 1766.640000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 2035.380000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 2227.190000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> [ 2313.100000] at91_i2c f0014000.i2c: RXRDY still set! >>>>>> >>>>>> My USB serial dongle was hung which was why I didn't notice until just now. >>>>>> >>>>>> This is probably not when communication with the eeprom though, and certainly not >>>>>> writing to it, but perhpaps when polling the temperature (using the jc42 driver). >>>>>> I'll investigate further in the morning to see if I can pinpoint it. >>>>> >>>>> Yep, it's the jc42 accesses that triggers this (to the same chip as the >>>>> eeprom, but a different block of transistors I suppose). >>>>> >>>>> Looking at the i2c bus, the accesses normally go like this: >>>>> >>>>> [0.000000] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>>> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>>> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>>> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>>> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>>> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>>> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>>> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>>> >>>>> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate >>>>> limiter in the jc42 driver to get more frequent incidents). >>>>> >>>>> But when the diagnostic (RXRDY still set!) is output it continues >>>>> with this: >>>>> >>>>> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P >>>>> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P >>>>> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P >>>>> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P >>>>> >>>>> Notice the ~5 ms delay (about the time it should take to output >>>>> the diagnostic at 115200 baud) after the access to register 0x05 >>>>> at 0.399755. >>>>> >>>>> This is the only thing that I can observe on the bus, so it appears >>>>> to be harmless. >>>>> >>>>> It appears that the i2c access at 0.399755 finds the TWI >>>>> registers in an odd state, but nothing from the access at >>>>> 0.198520 appears to have gone wrong. Is this a race? Anyway, >>>>> the diagnostic is pretty frequent and annoying. printk_once? >>>> >>>> I'll try to reproduce it on my side. The only issue you have is having >>>> the message about RXRDY? I mean no issue with i2c transfers? >>> >>> Exactly, the only issue is the message, the bus looks good and transfers >>> work as they should AFAICT. >>>> It is not the possible bug we had in mind, this bug will prevent the >>>> master device to release the i2c bus. It will stop the transfer but >>>> without sending a stop on the bus. >>> >>> Agreed, this is not about the extra frame caused by the spurious write >>> to the THR register. This is something else. >>> >>> One suspicion is that the driver gets an unexpected irq from its own >>> NACK (the one that it puts out to end the read) and races with the >>> expected interrupt at TXCOMP? >>> >> >> I have discussed with the designer of this IP. So the NACK flag is only >> used for 'real' nack not protocol ones. >> >> Concerning the read issue, it have attached a patch (apply it on top of >> Cyrille's patch). Could you have a try? I have reproduced your issue only one >> time so it's hard for me to tell if it works well or not. > > Since you removed the dev_err call, I don't really know if the underlying > issue is solved or not (if there ever was any). > > Reading RHR in the if at the top of at91_twi_read_next_byte() never happens > for me (I added a witness and left it running overnight), so that > seems like an unrelated change. Ahhh, now I see. Why is it that I have to hit send before I start to realize things? You have moved the RHR read from at91_do_twi_transfer()... > Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know? ...and there is no else before the TXCOMP|NACK, so I added a witness inside the if (...RXRDY) that triggers if TXCOM|NACK is also set. That witness fires at about the same rate as the "RXRDY still set" message did before. For thrills I readded the if (sr & AT91_TWI_RXRDY) check in at91_do_twi_transfer() but that test is no longer firing. So, yes, this is a good fix... > But the "problem" is gone of course, no flood of messages, but I can't tell > if this is just papering over or a fix. ...and you can scratch that comment. Cheers, Peter -- To unsubscribe from this list: send the line "unsubscribe linux-i2c" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html