On Mon, May 24, 2021 at 11:16:52AM +0200, Jean Delvare wrote: > On Fri, 21 May 2021 15:48:20 +0200, Jean Delvare wrote: > > Looking at the ICH9 datasheet, I see the following description for the > > KILL bit (which is what we try to use to reset the SMBus controller): > > > > "Kills the current host transaction taking place, sets the FAILED > > status bit, and asserts the interrupt (or SMI#)." > > > > At the time the recovery code was written, i2c-i801 was a polling-only > > driver, interrupts were not supported, so asserting the interrupt had > > no effect. Now that the driver does support interrupts, this would call > > i801_isr(), right? > > I modified the i2c-i801 driver to simulate a timeout once in a while, > and I can confirm that i801_isr() gets called when the KILL bit is set. > > > So my theory is that our attempt to kill a timed-out byte-by-byte block > > transaction triggers an interrupt, which calls in i801_isr() with the > > SMBHSTSTS_BYTE_DONE bit set. This in turn causes i801_isr_byte_done() > > to be called while we are absolutely not ready nor even supposed to > > process the next data byte. > > I can also report that I'm reproducing the bug reported by syzbot when > this happens, by running decode-dimms with my modified i2c-i801 driver. > I have 4 SPD EEPROMs instantiated for my DIMMs, so decode-dimms > triggers a lot of calls to i2c_smbus_read_i2c_block_data(). > > > I guess we should clear SMBHSTSTS_BYTE_DONE before issuing a > > SMBHSTCNT_KILL. Alternatively we could add a check at the beginning of > > i801_isr() to bail out immediately if SMBHSTCNT_KILL is set. While > > possibly more robust, this approach has the drawback of increasing the > > processing time of all interrupts, even standard/legitimate ones. So > > maybe just clearing SMBHSTSTS_BYTE_DONE is more reasonable. Something > > like: > > > > --- linux-5.11.orig/drivers/i2c/busses/i2c-i801.c > > +++ linux-5.11/drivers/i2c/busses/i2c-i801.c > > @@ -393,6 +393,8 @@ static int i801_check_post(struct i801_p > > dev_err(&priv->pci_dev->dev, "Transaction timeout\n"); > > /* try to stop the current command */ > > dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n"); > > + /* Clear BYTE_DONE so as to not confuse i801_isr() */ > > + outb_p(SMBHSTSTS_BYTE_DONE, SMBHSTSTS(priv)); > > outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL, > > SMBHSTCNT(priv)); > > usleep_range(1000, 2000); > > I ended up with a different approach. Clearing BYTE_DONE would possibly > avoid the problem immediately at hand, however the interrupt would > still be triggered, even for other (not byte-by-byte) transactions, > causing us to fiddle with the status register. I don't think it's > needed, and it could have unintended side effects. > > So instead I tried the alternative approach of checking the KILL bit in > i801_isr() and returning immediately if it is set. While it does work, > I noticed that i801_isr() is in fact called in a loop for the whole > duration of the usleep_range(1000, 2000) before we clear the KILL bit. > While better than an out-of-bounds memory access, an interrupt flood is > still not ideal. > > > I must say I wonder why SMBHSTCNT_KILL generates an interrupt in the > > first place, I can't see who would need this. > > Which made me think... The interrupt is being generated because we > *ask* for it. It doesn't have to be generated if we aren't interested > in it. So the fix would be very simple: don't bother preserving the > other bits in HST_CNT, they will be set properly again for the next > transaction anyway, only set the KILL bit to 1. In particular don't > preserve the INTREN bit, so no interrupt will be generated. > > @@ -395,11 +401,9 @@ static int i801_check_post(struct i801_p > dev_err(&priv->pci_dev->dev, "Transaction timeout\n"); > /* try to stop the current command */ > dev_dbg(&priv->pci_dev->dev, "Terminating the current operation\n"); > - outb_p(inb_p(SMBHSTCNT(priv)) | SMBHSTCNT_KILL, > - SMBHSTCNT(priv)); > + outb_p(SMBHSTCNT_KILL, SMBHSTCNT(priv)); > usleep_range(1000, 2000); > - outb_p(inb_p(SMBHSTCNT(priv)) & (~SMBHSTCNT_KILL), > - SMBHSTCNT(priv)); > + outb_p(0, SMBHSTCNT(priv)); > > /* Check if it worked */ > status = inb_p(SMBHSTSTS(priv)); > > This works for me. If there are no objections I'll post a proper patch. No objection from me, thanks for conducting a research and developing the fix! Feel free to add mine Acked-by: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> to whatever solution you consider the best. -- With Best Regards, Andy Shevchenko