On Fri, 16 Jul 2021 at 16:52, Matthew Auld <matthew.william.auld@xxxxxxxxx> wrote: > > On Fri, 16 Jul 2021 at 15:10, Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote: > > > > On Fri, Jul 16, 2021 at 8:54 AM Matthew Auld > > <matthew.william.auld@xxxxxxxxx> wrote: > > > > > > On Thu, 15 Jul 2021 at 23:39, Jason Ekstrand <jason@xxxxxxxxxxxxxx> wrote: > > > > > > > > Whenever we had a user object (n_placements > 0), we were ignoring > > > > obj->mm.region and always putting obj->placements[0] as the requested > > > > region. For LMEM+SMEM objects, this was causing them to get shoved into > > > > LMEM on every i915_ttm_get_pages() even when SMEM was requested by, say, > > > > i915_gem_object_migrate(). > > > > > > i915_ttm_migrate calls i915_ttm_place_from_region() directly with the > > > requested region, so there shouldn't be an issue with migration right? > > > Do you have some more details? > > > > With i915_ttm_migrate directly, no. But, in the last patch in the > > series, we're trying to migrate LMEM+SMEM buffers into SMEM on > > attach() and pin it there. This blows up in a very unexpected (IMO) > > way. The flow goes something like this: > > > > - Client attempts a dma-buf import from another device > > - In attach() we call i915_gem_object_migrate() which calls > > i915_ttm_migrate() which migrates as requested. > > - Once the migration is complete, we call i915_gem_object_pin_pages() > > which calls i915_ttm_get_pages() which depends on > > i915_ttm_placement_from_obj() and so migrates it right back to LMEM. > > The mm.pages must be NULL here, otherwise it would just increment the > pages_pin_count? > > > > > Maybe the problem here is actually that our TTM code isn't respecting > > obj->mm.pages_pin_count? > > I think if the resource is moved, we always nuke the mm.pages after > being notified of the move. Also TTM is also not allowed to move > pinned buffers. > > I guess if we are evicted/swapped, so assuming we are not holding the > object lock, and it's not pinned, the future call to get_pages() will > see mm.pages = NULL, even though the ttm_resource is still there, and > because we prioritise the placements[0], instead of mm.region we end > up moving it for no good reason. But in your case you are holding the > lock, or it's pinned? Also is this just with the selftest, or > something real? Or at least in the selftest I see ____i915_gem_object_get_pages() which doesn't even consider the mm.pages AFAIK. > > > > > In case you can't tell, I really have no clue what I'm doing here. > > I'm really stumbling around in the dark finding things that make my > > bug go away. I'm happy for the feedback. > > > > --Jason > > > > > > > > > > > > > Signed-off-by: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > > > > Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > > > Cc: Matthew Auld <matthew.auld@xxxxxxxxx> > > > > Cc: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 3 +-- > > > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > > > > index d30f274c329c7..5985e994d56cf 100644 > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > > > > @@ -150,8 +150,7 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, > > > > unsigned int i; > > > > > > > > placement->num_placement = 1; > > > > - i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : > > > > - obj->mm.region, requested, flags); > > > > + i915_ttm_place_from_region(obj->mm.region, requested, flags); > > > > > > > > /* Cache this on object? */ > > > > placement->num_busy_placement = num_allowed; > > > > -- > > > > 2.31.1 > > > >