Hi Robert, On Tue, Jun 06, 2023 at 07:40:15PM +0000, Robert Hancock wrote: > On Tue, 2023-06-06 at 21:24 +0200, Andi Shyti wrote: > > Hi Robert, > > > > On Tue, Jun 06, 2023 at 12:25:58PM -0600, Robert Hancock wrote: > > > In xiic_process, it is possible that error events such as > > > arbitration > > > lost or TX error can be raised in conjunction with other interrupt > > > flags > > > such as TX FIFO empty or bus not busy. Error events result in the > > > controller being reset and the error returned to the calling > > > request, > > > but the function could potentially try to keep handling the other > > > events, such as by writing more messages into the TX FIFO. Since > > > the > > > transaction has already failed, this is not helpful and will just > > > cause > > > issues. > > > > what kind of issues? > > > > The one I ran into is what I alluded to further down, where that > WARN_ON was triggering repeatedly, which ended up flooding the kernel > log and causing the device's watchdog timer to timeout. I'm not sure > what other forms of havoc might ensue, other than "nothing good".. > > > > This problem has been present ever since: > > > > > > commit 7f9906bd7f72 ("i2c: xiic: Service all interrupts in isr") > > > > > > which allowed non-error events to be handled after errors, but > > > became > > > more obvious after: > > > > > > commit 743e227a8959 ("i2c: xiic: Defer xiic_wakeup() and > > > __xiic_start_xfer() in xiic_process()") > > > > > > which reworked the code to add a WARN_ON which triggers if both the > > > xfer_more and wakeup_req flags were set, since this combination is > > > not supposed to happen, but was occurring in this scenario. > > > > > > Skip further interrupt handling after error flags are detected to > > > avoid > > > this problem. > > > > > > Fixes: 7f9906bd7f72 ("i2c: xiic: Service all interrupts in isr") > > > Signed-off-by: Robert Hancock <robert.hancock@xxxxxxxxxx> > > > > please add: > > > > Cc: <stable@xxxxxxxxxxxxxxx> # v4.3+ > > > > Can add for a v2 - although with the Fixes tag it would most likely be > picked up for stable anyway.. It's just a courtesy to stable maintainers :) > > > --- > > > drivers/i2c/busses/i2c-xiic.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/drivers/i2c/busses/i2c-xiic.c > > > b/drivers/i2c/busses/i2c-xiic.c > > > index 8a3d9817cb41..ee6edc963dea 100644 > > > --- a/drivers/i2c/busses/i2c-xiic.c > > > +++ b/drivers/i2c/busses/i2c-xiic.c > > > @@ -721,6 +721,8 @@ static irqreturn_t xiic_process(int irq, void > > > *dev_id) > > > wakeup_req = 1; > > > wakeup_code = STATE_ERROR; > > > } > > > + /* don't try to handle other events */ > > > + goto out; > > > > why don't we have goto's after every irq evaluation but only > > here? Do the issues you mentioned happen olny in this particular > > error case? > > > > As far as I can tell, yes. For example you could legitimately have > XIIC_INTR_TX_EMPTY_MASK and XIIC_INTR_BNB_MASK both being handled. The > error case is special as we end up resetting the controller, so it > doesn't make sense to try to continue with the rest of the transaction > while also completing it with an error. I think the patch is correct and I will ack it: Acked-by: Andi Shyti <andi.shyti@xxxxxxxxxx> I think, though, that this needs a proper fix and testing, in order to cover all the possible combinations. The scenario you highlighted is indeed one, but not only, potential situation that could arise. Can I just ask you to write a bit more in the comment to highlight the possible failure? Thanks a lot, Andi > > Thanks, > > Andi > > > > > } > > > if (pend & XIIC_INTR_RX_FULL_MASK) { > > > /* Receive register/FIFO is full */ > > > -- > > > 2.40.1 > > > > > -- > Robert Hancock <robert.hancock@xxxxxxxxxx>