On 4 July 2011 06:31, Peter Korsgaard <jacmet@xxxxxxxxxx> wrote: >>>>>> "Jean" == Jean Delvare <khali@xxxxxxxxxxxx> writes: > > Hi, > > >> Basically the i2c-ocores xfer function, kicks off an interrupt driven > >> array of i2c_msg transfers and waits 1second for it to complete and > >> just > >> returns -ETIMEDOUT if it times out. The interrupt driven transfer is > >> *not* stopped on a time out and continues referencing the i2c_msg > >> structure. In my > >> case an i2c read the interrupt handler keeps adding bytes to the > >> structure. Unfortunately after the time out the structure was released > >> and it's length field was changed to a large number and the interrupt > >> driver kept happily writing bytes way past the end of the original > >> buffer until the kernel crashed or locked up! > >> > >> Below is a fix that works for me in 2.6.32 which removes the i2c_msg > >> buffer from the interrupt handler and changes the handler to check for > >> it's removal. > > There's been some changes to the ocores drivers since then (mainly > device tree support) - It would be good with a patch that can be applied > to head. Unfortunately the HW I found this bug in runs 2.6.32 and I suspect it's likely to require a fair bit of work to port to head and because of the device tree support I suspect it might be hard to back port the current driver. But perhaps some one can suggest a simple way? > >> > >> diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c > >> index 8dee246..21e57a0 100644 > >> --- a/drivers/i2c/busses/i2c-ocores.c > >> +++ b/drivers/i2c/busses/i2c-ocores.c > >> @@ -137,6 +137,12 @@ static void ocores_process(struct ocores_i2c *i2c) > >> wake_up(&i2c->wait); > >> return; > >> } > >> + if (msg == 0) { > >> + /* Caller has with drawn the request, buffer no longer available */ > >> + i2c->state = STATE_ERROR; > >> + oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_STOP); > >> + return; > >> + } > > I would prefer to send a stop here. You don't know in what context this > will get called (spurious inteerupts?). I also don't like adding extra > meaning i2c->msg when we have a perfectly fine state variable. I was worried about spurious interrupts as well well but never saw any in practice, probably just because the interrupt is always valid and just detects the termination and cleans up. The thinking behind clearing the buffer pointer is that this dangling pointer is the big danger and immeadiately removing it, by making it invalid, guarantees that if the interrupt checks it will never be writing to released memory which leads to the corruption and panic() calls et cetera. > How about you instead set state to STATE_ERROR in ocores_xfer on > timeouts, and then directly send the stop condition (so not in the > isr). Otherwise you don't have any guarantee that the next transfer > isn't setup by the time the interrupt fires. That occurred to me but I don't know enough about the HW to know what happens if you write another request on top of an already running request. I am happy to try different variations but perhaps some one has a more informed view of what we can assume about the HWs behaviour. There is also the race condition where the interrupt routine ocores_process() increments the i2c-msg (line 146) and the zero-ing of i2c->msg in the time out code. But that would be much rarer than the crash in the original code. Is there a way to prevent interrupts on any CPU so we can set the i2c->state (and possibly clear the i2c->msg) safely or can we use local_irq_save()/local_irq_restore() with out introducing a spinlock or something to prevent other CPUs from reading and changing the i2c->state/ i2c->msg variable via an interrupt? > > >> > >> /* error? */ > >> if (stat & OCI2C_STAT_ARBLOST) { > >> @@ -223,8 +229,13 @@ static int ocores_xfer(struct i2c_adapter *adap, > >> struct i2c_msg *msgs, int num) > >> if (wait_event_timeout(i2c->wait, (i2c->state == STATE_ERROR) || > >> (i2c->state == STATE_DONE), HZ)) > >> return (i2c->state == STATE_DONE) ? num : -EIO; > >> - else > >> + else { > >> + printk(KERN_NOTICE > > WARNING seems more suitable here. > > >> + "ocores_xfer() i2c transfer to %d-%#x timed out\n", > >> + i2c_adapter_id(adap), msgs->addr); > >> + i2c->msg = 0; /* remove the caller's request which > >> will be re-used */ > >> return -ETIMEDOUT; > >> + } > >> } > > -- > Bye, Peter Korsgaard > Sure. If people approve I can submit a patch which uses the state but introduces a spinlock to guard access to the i2c->msg access. Looking at the driver some more I think it's unsafe to have the interrupt and the user space playing with the same data with out a lock as much as I would like to avoid the extra lock / code. Either we can split the data into process context only modified and interrupt only modified areas or a lock is introduced. The original patch was the smallest way to avoid the issue I could see but isn't perfect, just reduces the chance of a corruption to a very small but non-zero level. Perhaps just a spinlock on changes to i2c->msg or i2c->state? Andrew -- To unsubscribe from this list: send the line "unsubscribe linux-i2c" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html