On Thu, 26 Jul 2012 16:24:50 +0200, Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> ... by adding seemingly redundant posting reads.
>
> This little dragon lair exploded the first time around when we
> refactored the code a bit to use the common wait_for_atomic_us in
> "drm/i915: Group the GT routines together in both code and vtable",
> which caused QA to file fdo bug #51738.
>
> Chris Wilson entertained a few approaches to fixing #51738: Replacing
> the udelay(1) with the previously-used udelay(10) (or any other
> "sufficiently larger" delay), adding a posting read, or ditching the
> delay completely and using cpu_relax. We went with the cpu_relax and
> "915: Workaround hang with BSD and forcewake on SandyBridge". Which
> blew up in fdo bug #52424, but adding the posting read while still
> using cpu_relax seems to also fix that, so it looks like the
> posting read is the important ingredient to fix these rc6 related
> hangs on snb.
>
> Popular theories as to why this is like it is include:
> - A herd of pink elephants got royally angered somehow.
>
> - The gpu has internally different functional units and judging by the
>   register offsets, the forcewake request register and the forcewake
>   ack registers are _not_ in the same functional unit (or at least
>   aren't reached through the same routes). Hence the posting read
>   syncs up with the wrong block and gets the entire gpu confused.
>
> - ...
>
> As a minimal ducttape fix for 3.6, let's just put these posting reads
> into place again. We can try fancier approaches (like adding back the
> cpu_relax instead of the udelay) in -next.
>
> This (re-)fixes a regression introduced in
>
> commit 990bbdadabaa51828e475eda86ee5720a4910cc3
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Mon Jul 2 11:51:02 2012 -0300
>
>     drm/i915: Group the GT routines together in both code and vtable
>
> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=52424
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51738
> Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>

No change on IVB, fixes the dummy_reloc_loop hang on SNB.
Tested-by: Chris Wilson <chris at chris-wilson.co.uk>

udelay() is winning the award for most popular function. Almost; it got
pipped at the last second by read_hpet() on earlier chipsets.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
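
For readers following along without the patch at hand, here is a toy user-space model of the sequence discussed above: write the forcewake request, do a posting read, then poll for the ack. It is only a sketch and not the i915 code. The register names, the two-"block" layout, and the rule that a read only flushes posted writes in its own block are assumptions made up for illustration, loosely following the second theory in the quoted message; the real cause of the SNB hangs is not established in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registers for this simulation only (not real i915 offsets). */
enum { REG_FORCEWAKE_REQ, REG_SCRATCH, REG_FORCEWAKE_ACK, REG_COUNT };

/* Assumed layout: request and scratch share block 0, ack lives in block 1. */
static int reg_block(unsigned reg)
{
    return reg == REG_FORCEWAKE_ACK ? 1 : 0;
}

struct fake_gpu {
    uint32_t regs[REG_COUNT]; /* values the simulated device has seen */
    bool     write_pending;   /* one posted write still in flight */
    unsigned pending_reg;
    uint32_t pending_val;
};

/* Deliver the posted write; the toy device acks a wake request at once. */
static void fake_flush(struct fake_gpu *gpu)
{
    if (!gpu->write_pending)
        return;
    gpu->regs[gpu->pending_reg] = gpu->pending_val;
    gpu->write_pending = false;
    if (gpu->pending_reg == REG_FORCEWAKE_REQ)
        gpu->regs[REG_FORCEWAKE_ACK] = gpu->pending_val & 1;
}

/* Posted write: returns before the device has necessarily seen the value. */
static void fake_write(struct fake_gpu *gpu, unsigned reg, uint32_t val)
{
    assert(reg < REG_COUNT);
    fake_flush(gpu); /* keep writes ordered: at most one in flight */
    gpu->write_pending = true;
    gpu->pending_reg = reg;
    gpu->pending_val = val;
}

/* Read: flushes a posted write only if it targets the same block (assumed). */
static uint32_t fake_read(struct fake_gpu *gpu, unsigned reg)
{
    assert(reg < REG_COUNT);
    if (gpu->write_pending && reg_block(gpu->pending_reg) == reg_block(reg))
        fake_flush(gpu);
    return gpu->regs[reg];
}

/* Request forcewake and poll for the ack; true if the ack showed up. */
static bool forcewake_get(struct fake_gpu *gpu, bool posting_read,
                          unsigned max_polls)
{
    fake_write(gpu, REG_FORCEWAKE_REQ, 1);
    if (posting_read)
        (void)fake_read(gpu, REG_SCRATCH); /* push the request out */

    for (unsigned i = 0; i < max_polls; i++) {
        if (fake_read(gpu, REG_FORCEWAKE_ACK) & 1)
            return true;
        /* a real driver would udelay() or cpu_relax() between polls */
    }
    return false; /* timed out */
}
```

In this model, calling forcewake_get() on a zero-initialised struct fake_gpu with posting_read set succeeds on the first poll, while the same call without the posting read times out, because polling the ack register never flushes the request sitting in the other block.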