On Thu, 10 Jan 2013 18:03:00 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote: > This partially reverts > > commit 6c085a728cf000ac1865d66f8c9b52935558b328 > Author: Chris Wilson <chris at chris-wilson.co.uk> > Date: Mon Aug 20 11:40:46 2012 +0200 > > drm/i915: Track unbound pages > > Closer inspection of that patch revealed a bunch of unrelated changes > in the shrinker: > - The shrinker count is now in pages instead of objects. > - For counting the shrinkable objects the old code only looked at the > inactive list, the new code looks at all bounds objects (including > pinned ones). That is obviously in addition to the new unbound list. > - The shrinker cound is no longer scaled with > sysctl_vfs_cache_pressure. Note though that with the default tuning > value of vfs_cache_pressue = 100 this doesn't affect the shrinker > behaviour. > - When actually shrinking objects, the old code first dropped > purgeable objects, then normal (inactive) objects. Only then did it, > in a last-ditch effort idle the gpu and evict everything. The new > code omits the intermediate step of evicting normal inactive > objects. > > Safe for the first change, which seems benign, and the shrinker count > scaling, which is a bit a different story, the endresult of all these > changes is that the shrinker is _much_ more likely to fall back to the > last-ditch resort of idling the gpu and evicting everything. The old > code could only do that if something else evicted lots of objects > meanwhile (since without any other changes the nr_to_scan will be > smaller than the object count). > > Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only > dentry/inode object caches should scale their shrinker counts with > vfs_cache_pressure. Originally I've had that change reverted, too. But > Chris Wilson insisted that it's too bogus and shouldn't again see the > light of day. > > Hence revert all these other changes and restore the old shrinker > behaviour, with the minor adjustment that we now first scan the > unbound list, then the inactive list for each object category > (purgeable or normal). > > A similar patch has been tested by a few people affected by the gen4/5 > hangs which started to appear in 3.7, which some people bisected to > the "drm/i915: Track unbound pages" commit. But just disabling the > unbound logic alone didn't change things at all. > > Note that this patch doesn't fix the referenced bugs, it only hides > the underlying bug(s) well enough to restore pre-3.7 behaviour. The > key to achieve that is to massively reduce the likelyhood of going > into a full gpu stall and evicting everything. > > v2: Reword commit message a bit, taking Chris Wilson's comment into > account. > > v3: On Chris Wilson's insistency, do not reinstate the rather bogus > vfs_cache_pressure change. > > Tested-by: Greg KH <gregkh at linuxfoundation.org> > Tested-by: Dave Kleikamp <dave.kleikamp at oracle.com> > References: https://bugs.freedesktop.org/show_bug.cgi?id=55984 > References: https://bugs.freedesktop.org/show_bug.cgi?id=57122 > References: https://bugs.freedesktop.org/show_bug.cgi?id=56916 > References: https://bugs.freedesktop.org/show_bug.cgi?id=57136 > Cc: Chris Wilson <chris at chris-wilson.co.uk> > Cc: stable at vger.kernel.org > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch> Acked-by: Chris Wilson <chris at chris-wilson.co.uk> I just hope the clue bat descends soonest before we find another way of triggering the spurious hangs. -Chris -- Chris Wilson, Intel Open Source Technology Centre