Re: [PATCH 2/2] i2c: omap: fix "Too much work in one IRQ" irq handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

(please, never top-post)

On Sat, Nov 15, 2014 at 05:37:41AM +0300, Alexander Kochetkov wrote:
> Hi Felipe,
> 
> Initially you made the change (66b9298878742f08cb6e79b7c7d5632d782fd1e1):
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/i2c/busses/i2c-omap.c?id=66b9298878742f08cb6e79b7c7d5632d782fd1e1
> 
>  		dev_dbg(dev->dev, "IRQ (ISR = 0x%04x)\n", stat);
> 
>  		if (count++ == 100) {
>  			dev_warn(dev->dev, "Too much work in one IRQ\n");
> -			break;
> +			omap_i2c_complete_cmd(dev, err);
> +			return IRQ_HANDLED;
>  		}
> 
> 
> Than that change transformed into (4a7ec4eda58269a507501f240955d99312fdfd5f):
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/i2c/busses/i2c-omap.c?id=4a7ec4eda58269a507501f240955d99312fdfd5f
> 
> @@ -853,24 +853,21 @@ omap_i2c_isr(int this_irq, void *dev_id)
>  		dev_dbg(dev->dev, "IRQ (ISR = 0x%04x)\n", stat);
>  		if (count++ == 100) {
>  			dev_warn(dev->dev, "Too much work in one IRQ\n");
> -			omap_i2c_complete_cmd(dev, err);
> -			return IRQ_HANDLED;
> +			goto out;
>  		}
> 
> 
> +out:
> +	omap_i2c_complete_cmd(dev, err);
>  	return IRQ_HANDLED;

look how in both of these commits, omap_i2c_complete_cmd() is called as
it was before they existed. So, at a minimum, you're blaming the wrong
commit.

The commit which changed that was commit 0bdfe0c (i2c: omap: sanitize
exit path) and note how the behavior is still the same because when we
reach our counter, we will 'break' instead of 'goto out'. So, in all of
this, omap_i2c_complete_cmd() is still called just as it was before all
three commits referenced in this thread existed.

If there is a bug (which I'm not yet convinced there is), it already
existed way back.

> As a result we have in master:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/i2c/busses/i2c-omap.c
> 
> 	do {
> 		bits = omap_i2c_read_reg(dev, OMAP_I2C_IE_REG);
> 		stat = omap_i2c_read_reg(dev, OMAP_I2C_STAT_REG);
> 		stat &= bits;
> 		...
> 
> 		dev_dbg(dev->dev, "IRQ (ISR = 0x%04x)\n", stat);
> 		if (count++ == 100) {
> 			dev_warn(dev->dev, "Too much work in one IRQ\n");
> 			break;
> 		}
>                 ...
> 
> 	} while (stat);
> 
> 	omap_i2c_complete_cmd(dev, err);
> 
> out:
> 	spin_unlock_irqrestore(&dev->lock, flags);

how can this be a result of the two commits you pointed above ? In which
world can the two commits you referenced result in this placement for
the 'out' label ?

'out' was moved by commit 0bdfe0c (i2c: omap: sanitize exit path), but
that still guaranteed that omap_i2c_complete_cmd() was going to be
called just as before. The logic hasn't changed, actually.

> While original code was:
> 
> {
> 	bits = omap_i2c_read_reg(dev, OMAP_I2C_IE_REG);
> 	while ((stat = (omap_i2c_read_reg(dev, OMAP_I2C_STAT_REG))) & bits) {
> 		dev_dbg(dev->dev, "IRQ (ISR = 0x%04x)\n", stat);
> 		if (count++ == 100) {
> 			dev_warn(dev->dev, "Too much work in one IRQ\n");
> 			break;
> 		}
> 
> 		err = 0;
> complete:
>                 ...
> 
> 		if (stat & (OMAP_I2C_STAT_ARDY | OMAP_I2C_STAT_NACK |
> 					OMAP_I2C_STAT_AL)) {
> 			...
> 			omap_i2c_complete_cmd(dev, err);
> 			return IRQ_HANDLED;
> 		}
> 		...
> 	}
> 	return count ? IRQ_HANDLED : IRQ_NONE;
> }
> 
> 
> > how ? This is an interesting bug which deserves further explanation.
> 
> Look at the loops above, and at the omap_i2c_complete_cmd:
> 
> static inline void
> omap_i2c_complete_cmd(struct omap_i2c_dev *dev, u16 err)
> {
> 	dev->cmd_err |= err;
> 	complete(&dev->cmd_complete);
> }
> 
> You can see, loop will be aborted if counter reached 100. Final state
> of transfer depends on values stored in the 'err' and 'dev->cmd_err'.
> If 'err' and 'dev->cmd_err' are zero, than transfer would be aborted
> with status 0.

right, this is the same as it was before. If count reaches 100 we will
omap_i2c_complete_cmd().

> > quite frankly, this looks *very* wrong. It creates the possibility
> > for us never completing a command which would cause several
> > timeouts.
> 
> Pre 66b9298878742f08cb6e79b7c7d5632d782fd1e1 omap-i2c.c version didn't
> cause deadlocks.

which deadlock are you talking about ? How do you trigger it ? Where are
the lockup debug traces ?

> > How have you tested this and how have you figured this was the actual
> > bug ? Based on commit log not matching the patch body (which 'original
> > logic' are you talking about ?), I have to NAK this patch.
> 
> Well, I tried to debug I2C ISR hang issue (thats another question I
> want to discuss later) using output to console. I places few dev_warn

if you found a bug with the ISR, why discuss it later ? How can I
understand if you found a real bug without proper logs ?

> (not dev_dbg) messages into ISR (like got NACK, got ARDY and so on)
> and got "Too much work  in one IRQ" with incorrect booting.

what do you mean by incorrect booting ?

regards

-- 
balbi

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux GPIO]     [Linux SPI]     [Linux Hardward Monitoring]     [LM Sensors]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux