> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@xxxxxxxx]
> Sent: Friday, August 16, 2019 3:03 PM
> To: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Intel Graphics Development <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>; Tang, CQ <cq.tang@xxxxxxxxx>; Ursulin, Tvrtko <tvrtko.ursulin@xxxxxxxxx>; Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>
> Subject: Re: [PATCH] drm/i915: Switch obj->mm.lock lockdep annotations on its head
>
> On Fri, Aug 16, 2019 at 8:46 PM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Quoting Daniel Vetter (2019-08-16 19:23:36)
> > > The trouble with having a plain nesting flag for locks which do not
> > > naturally nest (unlike block devices and their partitions, which is
> > > the original motivation for nesting levels) is that lockdep will
> > > never spot a true deadlock if you screw up.
> > >
> > > This patch is an attempt at doing better, by highlighting a bit
> > > more the actual nature of the nesting that's going on. Essentially
> > > we have two kinds of objects:
> > >
> > > - objects without pages allocated, which cannot be on any lru and are
> > >   hence inaccessible to the shrinker.
> > >
> > > - objects which have pages allocated, which are on an lru, and which
> > >   the shrinker can decide to throw out.
> > >
> > > For the former type of object, memory allocations while holding
> > > obj->mm.lock are permissible. For the latter they are not. And
> > > get/put_pages transitions between the two types of objects.
> > >
> > > This is still not entirely fool-proof since the rules might change.
> > > But as long as we ever run such code at runtime, lockdep should be
> > > able to observe the inconsistency and complain (like with any other
> > > lockdep class that we've split up into multiple classes). But there
> > > are a few clear benefits:
> > >
> > > - We can drop the nesting flag parameter from
> > >   __i915_gem_object_put_pages, because that function by definition is
> > >   never going to allocate memory, and calling it on an object which
> > >   doesn't have its pages allocated would be a bug.
> > >
> > > - We strictly catch more bugs, since there's not only one place in the
> > >   entire tree which is annotated with the special class. All the
> > >   other places that had explicit lockdep nesting annotations we're now
> > >   going to leave up to lockdep again.
> > >
> > > - Specifically this catches stuff like calling get_pages from
> > >   put_pages (which isn't really a good idea, if we can call put_pages
> > >   so could the shrinker). I've seen patches do exactly that.
> > >
> > > Of course I fully expect CI will show me for the fool I am with this
> > > one here :-)
> > >
> > > v2: There can only be one (lockdep only has a cache for the first
> > > subclass, not for deeper ones, and we don't want to make these locks
> > > even slower). Still separate enums for better documentation.
> > >
> > > Real fix: don't forget about phys objs and pin_map(), and fix the
> > > shrinker to have the right annotations ... silly me.
> > >
> > > v3: Forgot userptr too ...
> > >
> > > v4: Improve comment for pages_pin_count, drop the IMPORTANT comment
> > > and instead prime lockdep (Chris).
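
[For readers following along: the rule described in the changelog above reduces to roughly the sketch below. This is an illustrative simplification, not the actual patch hunks (only fragments of those are quoted further down). obj->mm.lock and the I915_MM_GET_PAGES subclass come from the patch; alloc_backing_store()/free_backing_store() are made-up placeholders for the real get/put paths.]

    /*
     * Sketch only -- not the actual i915 code. An object with no pages
     * is invisible to the shrinker, so its get_pages path may allocate
     * (and hence enter fs_reclaim) while holding obj->mm.lock in the
     * I915_MM_GET_PAGES subclass. Once the object has pages, the
     * shrinker can take the plain subclass from inside reclaim, so
     * nothing under that acquisition may allocate.
     */
    static int sketch_get_pages(struct drm_i915_gem_object *obj)
    {
    	int err;

    	/* unpopulated object: not on any lru, shrinker can't see it */
    	mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
    	err = alloc_backing_store(obj);	/* GFP_KERNEL allocation is fine here */
    	mutex_unlock(&obj->mm.lock);

    	return err;
    }

    static void sketch_shrinker_evict(struct drm_i915_gem_object *obj)
    {
    	/* called from reclaim: plain subclass, no allocations allowed */
    	mutex_lock(&obj->mm.lock);
    	free_backing_store(obj);
    	mutex_unlock(&obj->mm.lock);
    }
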
> > >
> > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Cc: "Tang, CQ" <cq.tang@xxxxxxxxx>
> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.c       | 13 ++++++++++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_object.h       | 16 +++++++++++++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  6 +++++-
> > >  drivers/gpu/drm/i915/gem/i915_gem_pages.c        |  9 ++++-----
> > >  drivers/gpu/drm/i915/gem/i915_gem_phys.c         |  2 +-
> > >  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c     |  5 ++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c      |  4 ++--
> > >  drivers/gpu/drm/i915/gem/selftests/huge_pages.c  | 12 ++++++------
> > >  8 files changed, 45 insertions(+), 22 deletions(-)
> >
> > static inline int __must_check
> > i915_gem_object_pin_pages(struct drm_i915_gem_object *obj)
> > {
> > 	might_lock(&obj->mm.lock);
> >
> > 	if (atomic_inc_not_zero(&obj->mm.pages_pin_count))
> > 		return 0;
> >
> > 	return __i915_gem_object_get_pages(obj);
> > }
> >
> > is now testing the wrong lock class.
>
> Unfortunately there's no might_lock_nested.
>
> But then, this is the best kind of wrong, because of the nesting we have:
>
> obj->mm.lock#I915_MM_GET_PAGES -> fs_reclaim -> obj->mm.lock
>
> So the might_lock we have actually checks for way more than just the
> "more correct" annotation. I think I'll just add the above as a comment
> and leave the code as-is. Thoughts?

I believe we should allow recursive calls to i915_gem_object_pin_pages(): if the object is already pinned, the next call just bumps the pin count and returns. Otherwise, you only allow paired calls:

i915_gem_object_pin_pages(obj);
i915_gem_object_unpin_pages(obj);

Sometimes we need to do this:

i915_gem_object_pin_pages(obj);
.....
i915_gem_object_pin_pages(obj);
i915_gem_object_unpin_pages(obj);
.....
i915_gem_object_unpin_pages(obj);

The nested call is deep in the call stack. For example, we pin an object when doing put_pages(); in put_pages(), if we swap out, the blitter copy function will pin this object again, even though it is already pinned.

--CQ

> >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 3929c3a6b281..d01258b175f5 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -22,6 +22,8 @@
> > >   *
> > >   */
> > >
> > > +#include <linux/sched/mm.h>
> > > +
> > >  #include "display/intel_frontbuffer.h"
> > >  #include "gt/intel_gt.h"
> > >  #include "i915_drv.h"
> > > @@ -61,6 +63,15 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> > >  {
> > >  	mutex_init(&obj->mm.lock);
> > >
> > > +	if (IS_ENABLED(CONFIG_LOCKDEP)) {
> > > +		mutex_lock_nested(&obj->mm.lock, I915_MM_GET_PAGES);
> > > +		fs_reclaim_acquire(GFP_KERNEL);
> > > +		might_lock(&obj->mm.lock);
> > > +		fs_reclaim_release(GFP_KERNEL);
> > > +		mutex_unlock(&obj->mm.lock);
> > > +	}
> >
> > This is very powerful and sells a lot of churn.
>
> Yeah that was the idea here. Plus I hope it's easier to understand the
> annotations and lock nesting rules for obj->mm.lock this way - I freaked
> out quite a bit about the current one until you convinced me (which took
> its sweet time) that it's all fine. Maybe explicitly annotating get_pages
> and its special rule will help others (I can't play guinea pig twice
> unfortunately, so we can't test that theory).
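
[The IS_ENABLED(CONFIG_LOCKDEP) block quoted above is an instance of a general lockdep-priming trick: acquire the locks once at init time in the order the design permits, so lockdep records the dependency chain immediately rather than waiting for a rare runtime path to hit it. A generic, hedged sketch of that pattern follows; struct my_obj, MY_LOCK_GET_PAGES and my_obj_init() are invented names for illustration, while the fs_reclaim_acquire()/might_lock() dance mirrors the quoted hunk.]

    #include <linux/gfp.h>
    #include <linux/kconfig.h>
    #include <linux/lockdep.h>
    #include <linux/mutex.h>
    #include <linux/sched/mm.h>	/* fs_reclaim_acquire()/fs_reclaim_release() */

    struct my_obj {
    	struct mutex lock;
    };

    enum { MY_LOCK_NORMAL = 0, MY_LOCK_GET_PAGES };

    static void my_obj_init(struct my_obj *obj)
    {
    	mutex_init(&obj->lock);

    	if (IS_ENABLED(CONFIG_LOCKDEP)) {
    		/*
    		 * Record the allowed chain once, up front:
    		 *   lock#GET_PAGES -> fs_reclaim -> lock (plain subclass)
    		 * Nothing is actually allocated or reclaimed here; lockdep
    		 * just learns the ordering, so any later acquisition that
    		 * inverts it (e.g. allocating while holding the plain
    		 * subclass) produces a splat even if no deadlock occurs
    		 * in that particular run.
    		 */
    		mutex_lock_nested(&obj->lock, MY_LOCK_GET_PAGES);
    		fs_reclaim_acquire(GFP_KERNEL);
    		might_lock(&obj->lock);
    		fs_reclaim_release(GFP_KERNEL);
    		mutex_unlock(&obj->lock);
    	}
    }
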
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch