On Thu, Jan 10, 2013 at 04:58:30PM +0000, Chris Wilson wrote: > On Thu, 10 Jan 2013 18:03:00 +0100, Daniel Vetter <daniel.vetter at ffwll.ch> wrote: > > This partially reverts > > > > commit 6c085a728cf000ac1865d66f8c9b52935558b328 > > Author: Chris Wilson <chris at chris-wilson.co.uk> > > Date: Mon Aug 20 11:40:46 2012 +0200 > > > > drm/i915: Track unbound pages > > > > Closer inspection of that patch revealed a bunch of unrelated changes > > in the shrinker: > > - The shrinker count is now in pages instead of objects. > > - For counting the shrinkable objects the old code only looked at the > > inactive list, the new code looks at all bounds objects (including > > pinned ones). That is obviously in addition to the new unbound list. > > - The shrinker cound is no longer scaled with > > sysctl_vfs_cache_pressure. Note though that with the default tuning > > value of vfs_cache_pressue = 100 this doesn't affect the shrinker > > behaviour. > > - When actually shrinking objects, the old code first dropped > > purgeable objects, then normal (inactive) objects. Only then did it, > > in a last-ditch effort idle the gpu and evict everything. The new > > code omits the intermediate step of evicting normal inactive > > objects. > > > > Safe for the first change, which seems benign, and the shrinker count > > scaling, which is a bit a different story, the endresult of all these > > changes is that the shrinker is _much_ more likely to fall back to the > > last-ditch resort of idling the gpu and evicting everything. The old > > code could only do that if something else evicted lots of objects > > meanwhile (since without any other changes the nr_to_scan will be > > smaller than the object count). > > > > Reverting the vfs_cache_pressure behaviour itself is a bit bogus: Only > > dentry/inode object caches should scale their shrinker counts with > > vfs_cache_pressure. Originally I've had that change reverted, too. But > > Chris Wilson insisted that it's too bogus and shouldn't again see the > > light of day. > > > > Hence revert all these other changes and restore the old shrinker > > behaviour, with the minor adjustment that we now first scan the > > unbound list, then the inactive list for each object category > > (purgeable or normal). > > > > A similar patch has been tested by a few people affected by the gen4/5 > > hangs which started to appear in 3.7, which some people bisected to > > the "drm/i915: Track unbound pages" commit. But just disabling the > > unbound logic alone didn't change things at all. > > > > Note that this patch doesn't fix the referenced bugs, it only hides > > the underlying bug(s) well enough to restore pre-3.7 behaviour. The > > key to achieve that is to massively reduce the likelyhood of going > > into a full gpu stall and evicting everything. > > > > v2: Reword commit message a bit, taking Chris Wilson's comment into > > account. > > > > v3: On Chris Wilson's insistency, do not reinstate the rather bogus > > vfs_cache_pressure change. > > > > Tested-by: Greg KH <gregkh at linuxfoundation.org> > > Tested-by: Dave Kleikamp <dave.kleikamp at oracle.com> > > References: https://bugs.freedesktop.org/show_bug.cgi?id=55984 > > References: https://bugs.freedesktop.org/show_bug.cgi?id=57122 > > References: https://bugs.freedesktop.org/show_bug.cgi?id=56916 > > References: https://bugs.freedesktop.org/show_bug.cgi?id=57136 > > Cc: Chris Wilson <chris at chris-wilson.co.uk> > > Cc: stable at vger.kernel.org > > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch> > > Acked-by: Chris Wilson <chris at chris-wilson.co.uk> Picked up for -fixes. > I just hope the clue bat descends soonest before we find another way of > triggering the spurious hangs. Indeed. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch