> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx]
> Sent: Friday, August 16, 2019 3:03 PM
> To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Intel Graphics Development <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>; Tang, CQ <cq.tang@xxxxxxxxx>; Ursulin, Tvrtko <tvrtko.ursulin@xxxxxxxxx>; Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>
> Subject: Re: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head
>
> On Fri, Aug 16, 2019 at 8:46 PM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Quoting Daniel Vetter (2019-08-16 19:23:36)
> > > The trouble with having a plain nesting flag for locks which do not
> > > naturally nest (unlike block devices and their partitions, which is
> > > the original motivation for nesting levels) is that lockdep will
> > > never spot a true deadlock if you screw up.
> > >
> > > This patch is an attempt at doing better, by highlighting a bit
> > > more the actual nature of the nesting that's going on. Essentially
> > > we have two kinds of objects:
> > >
> > > - objects without pages allocated, which cannot be on any lru and are
> > >   hence inaccessible to the shrinker.
> > >
> > > - objects which have pages allocated, which are on an lru, and which
> > >   the shrinker can decide to throw out.
> > >
> > > For the former type of object, memory allocations while holding
> > > obj->mm.lock are permissible. For the latter they are not. And
> > > get/put_pages transitions between the two types of objects.
> > >
> > > This is still not entirely fool-proof since the rules might change.
> > > But as long as we ever run such code at runtime, lockdep should be
> > > able to observe the inconsistency and complain (like with any other
> > > lockdep class that we've split up into multiple classes). But there
> > > are a few clear benefits:
> > >
> > > - We can drop the nesting flag parameter from
> > >   __i915_gem_object_put_pages, because that function by definition is
> > >   never going to allocate memory, and calling it on an object which
> > >   doesn't have its pages allocated would be a bug.
> > >
> > > - We strictly catch more bugs, since there's not only one place in the
> > >   entire tree which is annotated with the special class. All the
> > >   other places that had explicit lockdep nesting annotations we're now
> > >   going to leave up to lockdep again.
> > >
> > > - Specifically this catches stuff like calling get_pages from
> > >   put_pages (which isn't really a good idea, if we can call put_pages
> > >   so could the shrinker). I've seen patches do exactly that.
> > >
> > > Of course I fully expect CI will show me for the fool I am with this
> > > one here :-)
> > >
> > > v2: There can only be one (lockdep only has a cache for the first
> > > subclass, not for deeper ones, and we don't want to make these locks
> > > even slower). Still separate enums for better documentation.
> > >
> > > Real fix: don't forget about phys objs and pin_map(), and fix the
> > > shrinker to have the right annotations ... silly me.
> > >
> > > v3: Forgot userptr too ...
> > >
> > > v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
> > > and instead prime lockdep (Chris).
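
[For readers following along: the rule described in the changelog above reduces to roughly the sketch below. This is an illustrative simplification, not the actual patch hunks (only fragments of those are quoted further down). obj->mm.lock and the I915_MM_GET_PAGES subclass come from the patch; alloc_backing_store()/free_backing_store() are made-up placeholders for the real get/put paths.]

    /*
     * Sketch only -- not the actual i915 code. An object with no pages
     * is invisible to the shrinker, so its get_pages path may allocate
     * (and hence enter fs_reclaim) while holding obj->mm.lock in the
     * I915_MM_GET_PAGES subclass. Once the object has pages, the
     * shrinker can take the plain subclass from inside reclaim, so
     * nothing under that acquisition may allocate.
     */
    static int sketch_get_pages(struct drm_i915_gem_object *obj)
    {
    	int err;

    	/* unpopulated object: not on any lru, shrinker can't see it */
    	mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
    	err = alloc_backing_store(obj);	/* GFP_KERNEL allocation is fine here */
    	mutex_unlock(&obj->mm.lock);

    	return err;
    }

    static void sketch_shrinker_evict(struct drm_i915_gem_object *obj)
    {
    	/* called from reclaim: plain subclass, no allocations allowed */
    	mutex_lock(&obj->mm.lock);
    	free_backing_store(obj);
    	mutex_unlock(&obj->mm.lock);
    }
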
> > >
> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: "Tang, CQ" <cq.tang@xxxxxxxxx>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c       | 13 ++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h       | 16 +++++++++++++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_pages.c        |  9 ++++-----
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c         |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c     |  5 ++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c      |  4 ++--
> > >  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++++++------
> > >  8 files changed, 45 insertions(+), 22 deletions(-)
> >
> > static inline int __must_check
> > i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
> > {
> > 	might_lock(&obj->mm.lock);
> >
> > 	if (atomic_inc_not_zero(&obj->mm.pages_pin_count))
> > 		return 0;
> >
> > 	return __i915_gem_object_get_pages(obj);
> > }
> >
> > is now testing the wrong lock class.
>
> Unfortunately there's no might_lock_nested.
>
> But then, this is the best kind of wrong, because of the nesting we have:
>
> obj->mm.lock#I915_MM_GET_PAGES -> fs_reclaim -> obj->mm.lock
>
> So the might_lock we have actually checks for way more than just the
> "more correct" annotation. I think I'll just add the above as a comment
> and leave the code as-is. Thoughts?

I believe we should allow recursive calls to i915_gem_object_pin_pages(): if the object is already pinned, the next call just bumps the pin count and returns. Otherwise, you only allow paired calls:

i915_gem_object_pin_pages(obj);
i915_gem_object_unpin_pages(obj);

Sometimes we need to do this:

i915_gem_object_pin_pages(obj);
.....
i915_gem_object_pin_pages(obj);
i915_gem_object_unpin_pages(obj);
.....
i915_gem_object_unpin_pages(obj);

The nested call is deep in the call stack. For example, we pin an object when doing put_pages(); in put_pages(), if we swap out, the blitter copy function will pin this object again, even though it is already pinned.

--CQ

> >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 3929c3a6b281..d01258b175f5 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -22,6 +22,8 @@
> > >   *
> > >   */
> > >
> > > +#include <linux/sched/mm.h>
> > > +
> > >  #include "display/intel_frontbuffer.h"
> > >  #include "gt/intel_gt.h"
> > >  #include "i915_drv.h"
> > > @@ -61,6 +63,15 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> > >  {
> > >  	mutex_init(&obj->mm.lock);
> > >
> > > +	if (IS_ENABLED(CONFIG_LOCKDEP)) {
> > > +		mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
> > > +		fs_reclaim_acquire(GFP_KERNEL);
> > > +		might_lock(&obj->mm.lock);
> > > +		fs_reclaim_release(GFP_KERNEL);
> > > +		mutex_unlock(&obj->mm.lock);
> > > +	}
> >
> > This is very powerful and sells a lot of churn.
>
> Yeah that was the idea here. Plus I hope it's easier to understand the
> annotations and lock nesting rules for obj->mm.lock this way - I freaked
> out quite a bit about the current one until you convinced me (which took
> its sweet time) that it's all fine. Maybe explicitly annotating get_pages
> and its special rule will help others (I can't play guinea pig twice
> unfortunately, so we can't test that theory).
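
[The IS_ENABLED(CONFIG_LOCKDEP) block quoted above is an instance of a general lockdep-priming trick: acquire the locks once at init time in the order the design permits, so lockdep records the dependency chain immediately rather than waiting for a rare runtime path to hit it. A generic, hedged sketch of that pattern follows; struct my_obj, MY_LOCK_GET_PAGES and my_obj_init() are invented names for illustration, while the fs_reclaim_acquire()/might_lock() dance mirrors the quoted hunk.]

    #include <linux/gfp.h>
    #include <linux/kconfig.h>
    #include <linux/lockdep.h>
    #include <linux/mutex.h>
    #include <linux/sched/mm.h>	/* fs_reclaim_acquire()/fs_reclaim_release() */

    struct my_obj {
    	struct mutex lock;
    };

    enum { MY_LOCK_NORMAL = 0, MY_LOCK_GET_PAGES };

    static void my_obj_init(struct my_obj *obj)
    {
    	mutex_init(&obj->lock);

    	if (IS_ENABLED(CONFIG_LOCKDEP)) {
    		/*
    		 * Record the allowed chain once, up front:
    		 *   lock#GET_PAGES -> fs_reclaim -> lock (plain subclass)
    		 * Nothing is actually allocated or reclaimed here; lockdep
    		 * just learns the ordering, so any later acquisition that
    		 * inverts it (e.g. allocating while holding the plain
    		 * subclass) produces a splat even if no deadlock occurs
    		 * in that particular run.
    		 */
    		mutex_lock_nested(&obj->lock, MY_LOCK_GET_PAGES);
    		fs_reclaim_acquire(GFP_KERNEL);
    		might_lock(&obj->lock);
    		fs_reclaim_release(GFP_KERNEL);
    		mutex_unlock(&obj->lock);
    	}
    }
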
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch