Re: [PATCH] i2c: at91: fix write transfers by clearing pending interrupt first

Peter Rosin <peda@xxxxxxxxxxxxxx> · Wed, 21 Oct 2015 09:21:42 +0200

On 2015-10-20 15:27, Ludovic Desroches wrote:
> On Mon, Oct 19, 2015 at 12:49:03PM +0200, Peter Rosin wrote:
>> On 2015-10-19 10:51, Ludovic Desroches wrote:
>>> Hi Peter,
>>>
>>> On Fri, Oct 16, 2015 at 11:08:42AM +0200, Peter Rosin wrote:
>>>> On 2015-10-16 01:47, Peter Rosin wrote:
>>>>> On 2015-10-14 07:43, Ludovic Desroches wrote:
>>>>>> On Tue, Oct 13, 2015 at 08:01:34PM +0200, Peter Rosin wrote:
>>>>>>> On 2015-10-13 18:47, Cyrille Pitchen wrote:
>>>>>>>> Le 13/10/2015 17:19, Peter Rosin a écrit :
>>>>>>>>> On 2015-10-13 16:21, Ludovic Desroches wrote:
>>>
>>> [...]
>>>
>>>>> I have started to get this when I run with this patch:
>>>>>
>>>>> [   73.310000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [  198.200000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [  509.880000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [  615.750000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [  617.750000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [ 1766.640000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [ 2035.380000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [ 2227.190000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>> [ 2313.100000] at91_i2c f0014000.i2c: RXRDY still set!
>>>>>
>>>>> My USB serial dongle was hung which was why I didn't notice until just now.
>>>>>
>>>>> This is probably not when communication with the eeprom though, and certainly not
>>>>> writing to it, but perhpaps when polling the temperature (using the jc42 driver).
>>>>> I'll investigate further in the morning to see if I can pinpoint it.
>>>>
>>>> Yep, it's the jc42 accesses that triggers this (to the same chip as the
>>>> eeprom, but a different block of transistors I suppose).
>>>>
>>>> Looking at the i2c bus, the accesses normally go like this:
>>>>
>>>> [0.000000] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>> [0.000521] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>> [0.001024] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>> [0.001524] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>> [0.196991] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>> [0.197514] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>> [0.198019] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>> [0.198520] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>>
>>>> I.e. chunks of four transfers every ~200 ms (I removed the 1Hz rate
>>>> limiter in the jc42 driver to get more frequent incidents).
>>>>
>>>> But when the diagnostic (RXRDY still set!) is output it continues
>>>> with this:
>>>>
>>>> [0.399755] S 0x18 W 0x05 S 0x18 R 0xc1 0xbe NACK P
>>>> [0.404998] S 0x18 W 0x04 S 0x18 R 0x00 0x00 NACK P
>>>> [0.405508] S 0x18 W 0x03 S 0x18 R 0x00 0x00 NACK P
>>>> [0.406008] S 0x18 W 0x02 S 0x18 R 0x00 0x00 NACK P
>>>>
>>>> Notice the ~5 ms delay (about the time it should take to output
>>>> the diagnostic at 115200 baud) after the access to register 0x05
>>>> at 0.399755.
>>>>
>>>> This is the only thing that I can observe on the bus, so it appears
>>>> to be harmless.
>>>>
>>>> It appears that the i2c access at 0.399755 finds the TWI
>>>> registers in an odd state, but nothing from the access at
>>>> 0.198520 appears to have gone wrong. Is this a race? Anyway,
>>>> the diagnostic is pretty frequent and annoying. printk_once?
>>>
>>> I'll try to reproduce it on my side. The only issue you have is having
>>> the message about RXRDY? I mean no issue with i2c transfers?
>>
>> Exactly, the only issue is the message, the bus looks good and transfers
>> work as they should AFAICT.
>>> It is not the possible bug we had in mind, this bug will prevent the
>>> master device to release the i2c bus. It will stop the transfer but
>>> without sending a stop on the bus.
>>
>> Agreed, this is not about the extra frame caused by the spurious write
>> to the THR register. This is something else.
>>
>> One suspicion is that the driver gets an unexpected irq from its own
>> NACK (the one that it puts out to end the read) and races with the
>> expected interrupt at TXCOMP?
>>
> 
> I have discussed with the designer of this IP. So the NACK flag is only
> used for 'real' nack not protocol ones.
> 
> Concerning the read issue, it have attached a patch (apply it on top of
> Cyrille's patch). Could you have a try? I have reproduced your issue only one
> time so it's hard for me to tell if it works well or not.

Since you removed the dev_err call, I don't really know if the underlying
issue is solved or not (if there ever was any).

Reading RHR in the if at the top of at91_twi_read_next_byte() never happens
for me (I added a witness and left it running overnight), so that
seems like an unrelated change.

Checking RXRDY before TXCOMP|NACK seems about equal, but what do I know?

But the "problem" is gone of course, no flood of messages, but I can't tell
if this is just papering over or a fix.

Cheers,
Peter
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html