On 19.09.2017 12:50, Vignesh R wrote: > > > On Monday 18 September 2017 05:31 PM, Claudio Foellmi wrote: >> On 18.09.2017 07:24, Vignesh R wrote: >>> >>> >>> On Saturday 16 September 2017 05:01 AM, Strashko, Grygorii wrote: >>>> >>>> >>>> On 09/14/2017 10:39 AM, Claudio Foellmi wrote: >>>>> A very conservative check for bus activity (to prevent interference >>>>> in multimaster setups) prevented the bus recovery methods from being >>>>> triggered in the case that SDA or SCL was stuck low. >>>>> This defeats the purpose of the recovery mechanism, which was introduced >>>>> for exactly this situation (a slave device keeping SDA pulled down). >>>>> >>>>> Note that bus lockups can persist across reboots. The only other options >>>>> are to reset or power cycle the offending slave device, and many i2c >>>>> slaves do not even have a reset pin. >>>>> >>>>> If we see that one of the lines is low for the entire timeout duration, >>>>> we can actually be sure that there is no other master driving the bus. >>>>> It is therefore save for us to attempt a bus recovery. >>>>> >>>> >>>> Reviewed-by: Grygorii Strashko <grygorii.strashko@xxxxxx> >>>> >>>>> Signed-off-by: Claudio Foellmi <claudio.foellmi@xxxxxxxx> >>>>> --- >>>>> Caveat: It turns out I don't have the hardware to fully test the >>>>> recovery mechanism. My faulty i2c slave device actually pulls down SCL, >>>>> not SDA (so the recovery will not succeed in my case). >>> >>> Maybe, you could detect SCL stuck low case by reading status of SCL line >>> from OMAP_I2C_SYSTEST_REG and then call IP reset (there is nothing much >>> that can be done) instead of bus recovery. >>> >> >> I plan on posting a related patch soon, that will print better messages >> if the generic recovery fails. If SCL is stuck low, I think the best we >> can do is make the problem visible in the kernel log. >> >>>>> But by directly connecting SDA to ground, I could at least make sure >>>>> the recovery function gets called after applying this patch. >>>>> >>> >>> I had seen flood of XRDY & RRDY interrupts as soon as TMODE is set to >>> 0x3 as part of omap_i2c_prepare_recovery() leading to unusable system. >>> Did you observe this behavior on your system? Could you mention the >>> platform on which this experiment done? >>> >> >> So attempting bus recovery is dangerous on some platforms? >> I did not notice any obvious problems (assuming an 'unusable system' >> would be hard to miss), but then again I only have one target to test on. >> I'm working with a TI AM3352, the slave is a NXP NT3H2211 on i2c-1. >> > > I hit a situation where when communicating with a faulty i2c device, the > last transaction on the bus does not end with proper STOP condition on > the i2c bus. Since, STOP condition was not detected by IP, Bus Busy will > remain set even though both SCL and SDA are high. Thus, > omap_i2c_wait_for_bb() function would end up calling bus recovery. And > as soon as TMODE is set to 0x3 and ST_EN to 0x1, there is a flood of > XRDY & RRDY interrupts. > If I understand the code correctly, the problem is that omap_i2c_wait_for_bb_valid does not quite do what the name would suggest: If SDA, SCL and BB are all 1 for a while, it correctly detects that the bus is actually free, but then just sets bb_valid=1 and returns. So now bb_valid is set, even though the value of BB is wrong. Later, omap_i2c_wait_for_bb runs into a timeout because it trusts that BB flag, and triggers the recovery. It would be nice if we could manually set BB to 0 in that case. But from what I've read in the manual, the only way to reset BB is to reset the device. > This spurious irqs can be reproduced easily by setting TMODE to 0x3 and > ST_EN to 0x1 in OMAP_I2C_SYSTEST_REG when both SCL and SDA are high (bus > is idle) even on AM335x. > > So, if you are not seeing irq flood when SCL/SDA is stuck low, then > maybe its safe to enter TMODE 0x3 in such cases. Could you modify the > patch to test whether or not SDA is stuck low before initiating bus > recovery? > What happens if we try to transmit while BB is still 1? If the transmission succeeds, adding such a check in omap_i2c_wait_for_bb should be enough to deal with your case (as the message we then send will contain the stop condition to correct BB). Btw, it also looks like omap_i2c_wait_for_bb currently assumes that if recovery succeeds, the bus will then also be free. I'm not sure if that is actually correct. >>>>> drivers/i2c/busses/i2c-omap.c | 7 ++++++- >>>>> 1 file changed, 6 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c >>>>> index 1ebb5e9..4b25fd1 100644 >>>>> --- a/drivers/i2c/busses/i2c-omap.c >>>>> +++ b/drivers/i2c/busses/i2c-omap.c >>>>> @@ -563,8 +563,13 @@ static int omap_i2c_wait_for_bb_valid(struct omap_i2c_dev *omap) >>>>> } >>>>> >>>>> if (time_after(jiffies, timeout)) { >>>>> + /* >>>>> + * SDA or SCL were low for the entire timeout without >>>>> + * any activity detected. Most likely, a slave is >>>>> + * locking up the bus with no master driving the clock. >>>>> + */ >>>>> dev_warn(omap->dev, "timeout waiting for bus ready\n"); >>>>> - return -ETIMEDOUT; >>>>> + return i2c_recover_bus(&omap->adapter); >>>>> } >>>>> >>>>> msleep(1); >>>>> >>>> >>> >> > -- Regards Claudio