On Fri, 21 May 2021 15:48:20 +0200, Jean Delvare wrote: > Looking at the ICH9 datasheet, I see the following description for the > KILL bit (which is what we try to use to reset the SMBus controller): > > "Kills the current host transaction taking place, sets the FAILED > status bit, and asserts the interrupt (or SMI#)." > > At the time the recovery code was written, i2c-i801 was a polling-only > driver, interrupts were not supported, so asserting the interrupt had > no effect. Now that the driver does support interrupts, this would call > i801_isr(), right? I modified the i2c-i801 driver to simulate a timeout once in a while, and I can confirm that i801_isr() gets called when the KILL bit is set. > So my theory is that our attempt to kill a timed-out byte-by-byte block > transaction triggers an interrupt, which calls in i801_isr() with the > SMBHSTSTS_BYTE_DONE bit set. This in turn causes i801_isr_byte_done() > to be called while we are absolutely not ready nor even supposed to > process the next data byte. I can also report that I'm reproducing the bug reported by syzbot when this happens, by running decode-dimms with my modified i2c-i801 driver. I have 4 SPD EEPROMs instantiated for my DIMMs, so decode-dimms triggers a lot of calls to i2c_smbus_read_i2c_block_data(). > I guess we should clear SMBHSTSTS_BYTE_DONE before issuing a > SMBHSTCNT_KILL. Alternatively we could add a check at the beginning of > i801_isr() to bail out immediately if SMBHSTCNT_KILL is set. While > possibly more robust, this approach has the drawback of increasing the > processing time of all interrupts, even standard/legitimate ones. So > maybe just clearing SMBHSTSTS_BYTE_DONE is more reasonable. Something > like: > > --- linux-5.11.orig/drivers/i2c/busses/i2c-i801.c > +++ linux-5.11/drivers/i2c/busses/i2c-i801.c > @@ -393,6 +393,8 @@ static int i801_check_post(struct i801_p > dev_err(&priv->pci_dev->dev, "Transaction timeout\n"); > /* try to stop the current command */ > dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n"); > + /* Clear BYTE_DONE so as to not confuse i801_isr() */ > + outb_p(SMBHSTSTS_BYTE_DONE, SMBHSTSTS(priv)); > outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL, > SMBHSTCNT(priv)); > usleep_range(1000, 2000); I ended up with a different approach. Clearing BYTE_DONE would possibly avoid the problem immediately at hand, however the interrupt would still be triggered, even for other (not byte-by-byte) transactions, causing us to fiddle with the status register. I don't think it's needed, and it could have unintended side effects. So instead I tried the alternative approach of checking the KILL bit in i801_isr() and returning immediately if it is set. While it does work, I noticed that i801_isr() is in fact called in a loop for the whole duration of the usleep_range(1000, 2000) before we clear the KILL bit. While better than an out-of-bounds memory access, an interrupt flood is still not ideal. > I must say I wonder why SMBHSTCNT_KILL generates an interrupt in the > first place, I can't see who would need this. Which made me think... The interrupt is being generated because we *ask* for it. It doesn't have to be generated if we aren't interested in it. So the fix would be very simple: don't bother preserving the other bits in HST_CNT, they will be set properly again for the next transaction anyway, only set the KILL bit to 1. In particular don't preserve the INTREN bit, so no interrupt will be generated. @@ -395,11 +401,9 @@ static int i801_check_post(struct i801_p dev_err(&priv->pci_dev->dev, "Transaction timeout\n"); /* try to stop the current command */ dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n"); - outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL, - SMBHSTCNT(priv)); + outb_p(SMBHSTCNT_KILL, SMBHSTCNT(priv)); usleep_range(1000, 2000); - outb_p(inb_p(SMBHSTCNT(priv)) & (~SMBHSTCNT_KILL), - SMBHSTCNT(priv)); + outb_p(0, SMBHSTCNT(priv)); /* Check if it worked */ status = inb_p(SMBHSTSTS(priv)); This works for me. If there are no objections I'll post a proper patch. -- Jean Delvare SUSE L3 Support