On Mon, 2025-02-03 at 14:29 -0600, Eddie James wrote: > > On 2/2/25 21:31, Andrew Jeffery wrote: > > On Fri, 2025-01-31 at 16:29 -0600, Eddie James wrote: > > > For multimaster case, it's conceivable that an interrupt comes > > > in after the transfer times out and attempts to use bus messages > > > that have already been freed by the i2c core. > > This description seems a little vague. Did you hit this case in > > practice? > > > Yes. I no longer have the back trace but it's a null pointer access in > the interrupt handler. We had a certain system that would hit this under > certain conditions and this patch fixed it. > > > I can update the commit message with some more detail. Thanks. > > > > > > > Signed-off-by: Eddie James <eajames@xxxxxxxxxxxxx> > > > --- > > > drivers/i2c/busses/i2c-aspeed.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/drivers/i2c/busses/i2c-aspeed.c > > > b/drivers/i2c/busses/i2c-aspeed.c > > > index 1550d3d552aed..e344dcc2233fe 100644 > > > --- a/drivers/i2c/busses/i2c-aspeed.c > > > +++ b/drivers/i2c/busses/i2c-aspeed.c > > > @@ -731,6 +731,7 @@ static int aspeed_i2c_master_xfer(struct > > > i2c_adapter *adap, > > > * master command. > > > */ > > > spin_lock_irqsave(&bus->lock, flags); > > > + bus->msgs = NULL; > > It feels like there's buggy code elsewhere in the driver that this is > > protecting (broken state machine)? I've had a look at the > > aspeed_i2c_master_irq() implementation and can't see what though, as we > > take an early-exit (before indexing into bus->msgs) if bus- > > > master_state is INACTIVE or PENDING. > > Can you be a bit more specific about where the issue lies? > > > I'm sure the state machine isn't perfect, yea. > Right, so I think that's what should be fixed; the explicit states define possible invariants in the implementation. We shouldn't need to test `msgs` to know its value (whether its value is correct should be defined by the current state). > The bad access can happen > like this: if a transfer times out while waiting for an interrupt, the > aspeed_i2c_master_xfer function will either reset the engine or recover > the bus, and exit ETIMEDOUT. Resetting the engine will turn off > interrupts, so we're safe. But recovering the bus doesn't turn off > interrupts, so after the function exits ETIMEDOUT, I assume what happens > is we get the interrupt for the previous transfer and try and access the > messages pointer, which the i2c core has already freed. So what immediately concerns me is there's no RECOVER state in `enum aspeed_i2c_master_state` or the rest of the implementation. We do have the PENDING state, which we don't have in hardware, so there's no reason for RECOVER to be missing, especially since we have the RECOVER state in hardware (I2CD14[22:19] = 0b0011). What do you think of adding that, and testing for it in the interrupt handler? Andrew