On Fri, 14 Jan 2022 at 14:01, Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx> wrote: > > Hi Joel, > > > 在 2022/1/11 下午6:51, Joel Stanley 写道: > > On Tue, 11 Jan 2022 at 07:52, Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx> wrote: > >> Hi all, > >> > >> Any comments? > >> > >> Thanks, > >> > >> Heyi > >> > >> 在 2022/1/9 下午9:26, Heyi Guo 写道: > >>> The memory will be freed by the caller if transfer timeout occurs, > >>> then it would trigger kernel panic if the peer device responds with > >>> something after timeout and triggers the interrupt handler of aspeed > >>> i2c driver. > >>> > >>> Set the msgs pointer to NULL to avoid invalid memory reference after > >>> timeout to fix this potential kernel panic. > > Thanks for the patch. How did you discover this issue? Do you have a > > test I can run to reproduce the crash? > > We are using one i2c channel to communicate with another MCU by > implementing user space SSIF protocol, and the MCU may not respond in > time if it is busy. If it responds after timeout occurs, it will trigger > below kernel panic: > Thanks for the details. It looks like you've done some testing of this, which is good. > After applying this patch, we'll get below warning instead: > > "bus in unknown state. irq_status: 0x%x\n" Given we get to here in the irq handler, we've done these two tests: - aspeed_i2c_is_irq_error() - the state is not INACTIVE or PENDING but there's no buffer ready for us to use. So what has triggered the IRQ in this case? Do you have a record of the irq status bits? I am wondering if the driver should know that the transaction has timed out, instead of detecting this unknown state. > > Can you provide a Fixes tag? > > I think the bug was introduced by the first commit of this file :( > > f327c686d3ba44eda79a2d9e02a6a242e0b75787 > > > > > > Do other i2c master drivers do this? I took a quick look at the meson > > driver and it doesn't appear to clear it's pointer to msgs. > > It is hard to say. It seems other drivers have some recover scheme like > aborting the transfer, or loop each messages in process context and > don't do much in IRQ handler, which may disable interrupts or not retain > the buffer pointer before returning timeout. I think your change is okay to go in as it fixes the crash, but first I want to work out if there's some missing handling of a timeout condition that we should add as well. > > Thanks, > > Heyi > > > > > >>> Signed-off-by: Heyi Guo <guoheyi@xxxxxxxxxxxxxxxxx> > >>> > >>> ------- > >>> > >>> Cc: Brendan Higgins <brendanhiggins@xxxxxxxxxx> > >>> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> > >>> Cc: Joel Stanley <joel@xxxxxxxxx> > >>> Cc: Andrew Jeffery <andrew@xxxxxxxx> > >>> Cc: Philipp Zabel <p.zabel@xxxxxxxxxxxxxx> > >>> Cc: linux-i2c@xxxxxxxxxxxxxxx > >>> Cc: openbmc@xxxxxxxxxxxxxxxx > >>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx > >>> Cc: linux-aspeed@xxxxxxxxxxxxxxxx > >>> --- > >>> drivers/i2c/busses/i2c-aspeed.c | 5 +++++ > >>> 1 file changed, 5 insertions(+) > >>> > >>> diff --git a/drivers/i2c/busses/i2c-aspeed.c b/drivers/i2c/busses/i2c-aspeed.c > >>> index 67e8b97c0c950..3ab0396168680 100644 > >>> --- a/drivers/i2c/busses/i2c-aspeed.c > >>> +++ b/drivers/i2c/busses/i2c-aspeed.c > >>> @@ -708,6 +708,11 @@ static int aspeed_i2c_master_xfer(struct i2c_adapter *adap, > >>> spin_lock_irqsave(&bus->lock, flags); > >>> if (bus->master_state == ASPEED_I2C_MASTER_PENDING) > >>> bus->master_state = ASPEED_I2C_MASTER_INACTIVE; > >>> + /* > >>> + * All the buffers may be freed after returning to caller, so > >>> + * set msgs to NULL to avoid memory reference after freeing. > >>> + */ > >>> + bus->msgs = NULL; > >>> spin_unlock_irqrestore(&bus->lock, flags); > >>> > >>> return -ETIMEDOUT;