Re: i2c-ocores timeout bug

Andrew Worsley <amworsley@xxxxxxxxx> · Mon, 4 Jul 2011 09:45:20 +1000

On 4 July 2011 06:31, Peter Korsgaard <jacmet@xxxxxxxxxx> wrote:
>>>>>> "Jean" == Jean Delvare <khali@xxxxxxxxxxxx> writes:
>
> Hi,
>
>  >> Basically the i2c-ocores xfer function, kicks off an interrupt driven
>  >> array of i2c_msg transfers and waits 1second for it to complete and
>  >> just
>  >> returns -ETIMEDOUT if it times out. The interrupt driven transfer is
>  >> *not* stopped on a time out and continues referencing the i2c_msg
>  >> structure. In my
>  >> case an i2c read the interrupt handler keeps adding bytes to the
>  >> structure. Unfortunately after the time out the structure was released
>  >> and it's length field was changed to a large number and the interrupt
>  >> driver kept happily writing bytes way past the end of the original
>  >> buffer until the kernel crashed or locked up!
>  >>
>  >> Below is a fix that works for me in 2.6.32 which removes the i2c_msg
>  >> buffer from the interrupt handler and changes the handler to check for
>  >> it's removal.
>
> There's been some changes to the ocores drivers since then (mainly
> device tree support) - It would be good with a patch that can be applied
> to head.

Unfortunately the HW I found this bug in runs 2.6.32 and I suspect
it's likely to require a
fair bit of work to port to head and because of the device tree
support I suspect it might be hard to back port the
current driver. But perhaps some one can suggest a simple way?

>  >>
>  >> diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c
>  >> index 8dee246..21e57a0 100644
>  >> --- a/drivers/i2c/busses/i2c-ocores.c
>  >> +++ b/drivers/i2c/busses/i2c-ocores.c
>  >> @@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c)
>  >> wake_up(&i2c->wait);
>  >> return;
>  >> }
>  >> +       if (msg == 0) {
>  >> +       /* Caller has with drawn the request, buffer no longer available */
>  >> +           i2c->state = STATE_ERROR;
>  >> +           oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP);
>  >> +           return;
>  >> +       }
>
> I would prefer to send a stop here. You don't know in what context this
> will get called (spurious inteerupts?). I also don't like adding extra
> meaning i2c->msg when we have a perfectly fine state variable.

I was worried about spurious interrupts as well well but never saw any
in practice,
probably just because the interrupt is always valid and just detects
the termination and cleans up.
The thinking behind clearing the buffer pointer is that this dangling
pointer is the big danger and immeadiately
removing it, by making it invalid, guarantees that if the interrupt
checks it will never be writing to released memory
which leads to the corruption and panic() calls et cetera.

> How about you instead set state to STATE_ERROR in ocores_xfer on
> timeouts, and then directly send the stop condition (so not in the
> isr). Otherwise you don't have any guarantee that the next transfer
> isn't setup by the time the interrupt fires.

That occurred to me but I don't know enough about the HW to know what
happens if you write another request on top of an
already running request. I am happy to try different variations but
perhaps some one
has a more informed view of what we can assume about the HWs behaviour.

There is also the race condition where the interrupt routine
ocores_process() increments the i2c-msg (line 146) and the zero-ing of
i2c->msg  in the time out code. But that would be much rarer than the
crash in the original code.

Is there a way to prevent interrupts on any CPU so we can set the
i2c->state (and possibly clear the i2c->msg) safely
or can we use local_irq_save()/local_irq_restore() with out
introducing a spinlock or something to prevent other CPUs from reading
and changing the i2c->state/ i2c->msg variable via an interrupt?

>
>  >>
>  >> /* error? */
>  >> if (stat & OCI2C_STAT_ARBLOST) {
>  >> @@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap,
>  >> struct i2c_msg *msgs, int num)
>  >> if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) ||
>  >> (i2c->state == STATE_DONE), HZ))
>  >> return (i2c->state == STATE_DONE) ? num : -EIO;
>  >> -       else
>  >> +       else {
>  >> +               printk(KERN_NOTICE
>
> WARNING seems more suitable here.
>
>  >> +               "ocores_xfer() i2c transfer to %d-%#x timed out\n",
>  >> +                       i2c_adapter_id(adap), msgs->addr);
>  >> +               i2c->msg = 0; /* remove the caller's request which
>  >> will be re-used */
>  >> return -ETIMEDOUT;
>  >> +       }
>  >> }
>
> --
> Bye, Peter Korsgaard
>

Sure.

If people approve I can submit a patch which uses the state but
introduces a spinlock to guard access to the i2c->msg access. Looking
at the driver some more I think it's unsafe to have the interrupt and
the user space playing with the same data with out a lock
as much as I would like to avoid the extra lock / code.

Either we can split the data into process context only modified and
interrupt only modified areas or a lock is introduced. The original
patch was the smallest way to avoid the issue I could see but isn't
perfect, just reduces the chance of a corruption to a very small but
non-zero level.

Perhaps just a spinlock on changes to i2c->msg or i2c->state?

Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-i2c" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html