Re: [PATCH] Fix Bug for cadence i2c interrupt overrunning buffer

Andrew Worsley <amworsley@xxxxxxxxx> · Thu, 18 Jan 2018 13:20:45 +1100

On 8 December 2017 at 09:15, Andrew Worsley <amworsley@xxxxxxxxx> wrote:
> Thanks. I regularly see the warning my patch prints out  on boot up as then
> there is a big pull of random data from the TPM.

   Just an update on this bug - I did a i2c capture with pulseview on
a boot that included this issue.

> On 8 Dec 2017 12:30 am, "Harini Katakam" <harinikatakamlinux@xxxxxxxxx>
> wrote:
>>
>> Hi Wolfram and Andrew,
>>
>> On Thu, Dec 7, 2017 at 4:19 PM, Wolfram Sang <wsa@xxxxxxxxxxxxx> wrote:
>> >
>> >> > I attach a patch of the changes to the i2c driver showing my debug in
>> >> > this driver.
>> >> > As I said I am happy to send more verbose bug output - I have about
>> >> > 36
>> >> > odd runs with 9 occurences of the bug.
>> >> > Once it happen twice in the one run
>> >> >
>> >> > I include the crashing line from the runs where the bug occured (I
>> >> > added more diagnostics as the runs progressed):
>> >>
>> >> I'll check with out HW team as well and check your flow to
>> >> see if we can reproduce it; will also check if the existing errata
>> >> has larger scope.
>> >> I'll let you know if we require further assistance with debug
>> >> next week.
>> >
>> > Any news here?
>>
>> Sorry for the delay.
>> I've checked the known errata and confirmed that it wont
>> affect this use case. And we haven't been able to hit this
>> issue on our HW (with eeprom slave devices).
>> I will add my colleague looking into this further to debug.
>>
>> Regards,
>> Harini

I suspect a well behaved i2c slave may never show the issue. But the
i2c cadence doesn't handle bad slaves safely
so I suggest that length check is needed to avoid buffer kernel
crashes due to mis-behaving i2c hardware.

I got a board that this issue was *very* bad to the point that the
TPM_RANDOM requests repeatedly returned the same data and so were
failing the sanity tests. So I captured the i2c transactions using
pulseview and very bad behaviour from the TPM. The problem disappears
at 100KHz and very reliably occurs at 400k on this individual board so
I believe this TPM is failing to keep up with the i2c clock and
screwying everything up.

What I am concerned about is that the i2c-cadence driver is almost
always carrying on and returning data with out errors.

In particular I see repeated starts and long periods where the data
line is held low and NO Bus Arbitration or Time out errors. In
particular I can get repeated TPM_RANDOM requests to the TPM returning
the exact same data until another i2c operation to a different chip
(RTC) which works fine at this 400kHz.

So to summarize most (all?) other boards work at this 400kHz speed so
I definitely have a marginal TPM chip on this board. But why doesn't
the i2c-cadence HW pick up these faults? The i2c transactions decoded
by pulseview show a
a real mess but i2c-cadence is bravely pulling data out of it.

I will attempt to append a few images of i2c traces - to illustrate
the mess that happens but that the i2c cadence hardware is very rarely
reporting any problems - but returns invalid data. In about 15+ boots
I got one Bus arbitration.

After looking at the i2c traces it is not surprising that the cadence
i2c hardware can become confused and return way more
data than was requested. As the buffer overrun can cause a crash in a
different routine (after the i2c driver has finished) I think it is
prudent to keep some sort of check on the data size returned.

Andrew

read-overview image is the large TPM read I suspect where things go
wrong showing lots of i2c bursts on one transactions with the
clock+data held low between them and actually ends with the data held
low. It is not fixed until *much* later way past the end of that
image.

The read-start image is the very start of this long transaction
showing the read to TPM at address 0x29 - but the pulseview decode is
correctly reporting repeated starts although it looks like the master
isn't detecting it - there are many many occurrences of these sorts of
problems
Attachment:
tpm-random-read-overview.png

Description: PNG image
Attachment:
tpm-random-read-start.png

Description: PNG image