On Sun, May 20, 2012 at 11:07:46AM -0700, Daniel Kurtz wrote: > On Sun, May 20, 2012 at 8:19 AM, Daniel Vetter <daniel at ffwll.ch> wrote: > > > > On Sat, May 19, 2012 at 10:10:12PM +0200, Daniel Vetter wrote: > > > ... too much risk for flaky edid transfers. > > > > > > This regression has been introduced in > > > > > > commit e646d5773572bf52017983d758bdf05777dc5600 > > > Author: Daniel Kurtz <djkurtz at chromium.org> > > > Date: ? Fri Mar 30 19:46:38 2012 +0800 > > > > > > ? ? drm/i915/intel_i2c: always wait for IDLE before clearing NAK > > > > > > This patch keeps the improved NAK handling on the hw side, but reverts > > > the change to return -ENXIO in case the gmbus controller reports a > > > NAK. > > > > > > Cc: Daniel Kurtz <djkurtz at chromium.org> > > > > Hi Daniel, > > > > Can you please take a look at this one and smash your r-b onto it if you > > agree? > > > > Thanks, Daniel > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49518 > > > Reported-and-Tested-by: Julian Simioni <julian.simioni at gmail.com> > > > Signed-Off-by: Daniel Vetter <daniel.vetter at ffwll.ch> > > > --- > > > ?drivers/gpu/drm/i915/intel_i2c.c | ? ?7 ++++--- > > > ?1 file changed, 4 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c > > > index e04255e..0588d8e 100644 > > > --- a/drivers/gpu/drm/i915/intel_i2c.c > > > +++ b/drivers/gpu/drm/i915/intel_i2c.c > > > @@ -418,10 +418,11 @@ clear_err: > > > ? ? ? ?* If no ACK is received during the address phase of a transaction, > > > ? ? ? ?* the adapter must report -ENXIO. > > > ? ? ? ?* It is not clear what to return if no ACK is received at other times. > > > - ? ? ?* So, we always return -ENXIO in all NAK cases, to ensure we send > > > - ? ? ?* it at least during the one case that is specified. > > > + ? ? ?* > > > + ? ? ?* Unfortunately we can't afford false positives in returning -ENXIO, > > > + ? ? ?* hence never return -ENXIO. > > > ? ? ? ?*/ > > > - ? ? ret = -ENXIO; > > > + ? ? ret = i; > > > The bugzilla report shows that in the old case, the gmbus failure was: > [ 2.528812] vgaarb: device changed decodes: > PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem > [ 2.540753] [drm] GMBUS [i915 gmbus panel] timed out waiting for idle > [ 2.580813] [drm:intel_panel_get_backlight], get backlight PWM = 0 > > And now the failure is: > [ 2.523015] vgaarb: device changed decodes: > PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem > [ 2.534770] [drm] GMBUS [i915 gmbus panel] timed out after NAK > [ 2.534797] [drm:gmbus_xfer], GMBUS [i915 gmbus panel] NAK for > addr: 0050 w(1) > [ 2.534883] [drm:intel_panel_get_backlight], get backlight PWM = 0 > > From these logs, it looks like in both cases, there is an i2c > communication error between host and panel. The difference is that: > * in the old case, the return value was 0, which triggered a silent > retry (the 40ms gap between GMBUS and PWM messages) > * but, now that the return value is -ENXIO, the caller does not retry. > > Actually, in the 'new' case, there are two errors happening: a NAK and > then a timeout after the NAK, while waiting for the controller to > clear the ACTIVE bit. I'm not actually sure what causes "timeout > after NAK", but I think it means the controller is waiting for entire > transaction to complete (perhaps the STOP bit after NAK?). > > Since this reported issue is happening in this double error path, I'd > rather a patch that fixes it without disabling the more generic NAK > path. Maybe something like this: Hm, good suggestion. This way, if we get a NAK but no issues later on, we'll still fail faster if gmbus detected that no device is responding. Which should speed up boot a bit. I'll run this past the bug reporter. Thanks, Daniel > > clear_err: > /* > * Wait for bus to IDLE before clearing NAK. > * If we clear the NAK while bus is still active, then it will stay > * active and the next transaction may fail. > */ > + ret = -ENXIO; > if (wait_for((I915_READ(GMBUS2 + reg_offset) & GMBUS_ACTIVE) == 0, > - 10)) > + 10)) { > DRM_DEBUG_KMS("GMBUS [%s] timed out after NAK\n", > adapter->name); > + ret = -ETIMEDOUT; // Or 0 ? > + } > > /* Toggle the Software Clear Interrupt bit. This has the effect > * of resetting the GMBUS controller and so clearing the > * BUS_ERROR raised by the slave's NAK. > */ > I915_WRITE(GMBUS1 + reg_offset, GMBUS_SW_CLR_INT); > I915_WRITE(GMBUS1 + reg_offset, 0); > I915_WRITE(GMBUS0 + reg_offset, 0); > > DRM_DEBUG_KMS("GMBUS [%s] NAK for addr: %04x %c(%d)\n", > adapter->name, msgs[i].addr, > (msgs[i].flags & I2C_M_RD) ? 'r' : 'w', msgs[i].len); > > /* > * If no ACK is received during the address phase of a transaction, > * the adapter must report -ENXIO. > * It is not clear what to return if no ACK is received at other times. > - * So, we always return -ENXIO in all NAK cases, to ensure we send > - * it at least during the one case that is specified. > + * So, return -ENXIO for NAK after any byte, unless there was a timeout > + * while waiting for IDLE after NAK. > */ > - ret = -ENXIO; > goto out; > > -Daniel > > > > > > ? ? ? goto out; > > > > > > ?timeout: > > > -- > > > 1.7.10 > > > > > > > -- > > Daniel Vetter > > Mail: daniel at ffwll.ch > > Mobile: +41 (0)79 365 57 48 -- Daniel Vetter Mail: daniel at ffwll.ch Mobile: +41 (0)79 365 57 48