On Thu, 9 Dec 2021 at 13:46, Maarten Lankhorst
<maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
>
> On 09-12-2021 14:40, Matthew Auld wrote:
> > On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst
> > <maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
> >> On 09-12-2021 14:05, Matthew Auld wrote:
> >>> On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst
> >>> <maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
> >>>> We want to remove more members of i915_vma, which requires the locking to be
> >>>> held more often.
> >>>>
> >>>> Start requiring the gem object lock for i915_vma_unbind, as it's one of the
> >>>> callers that may unpin pages.
> >>>>
> >>>> Some special care is needed when evicting, because the last reference to the
> >>>> object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage,
> >>>> and we need to cache vma->obj before unlocking.
> >>>>
> >>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
> >>>> ---
> >>> <snip>
> >>>
> >>>> @@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
> >>>>
> >>>>  	drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
> >>>>
> >>>> +retry:
> >>>> +	i915_gem_drain_freed_objects(vm->i915);
> >>>> +
> >>>>  	mutex_lock(&vm->mutex);
> >>>>
> >>>>  	/* Skip rewriting PTE on VMA unbind. */
> >>>>  	open = atomic_xchg(&vm->open, 0);
> >>>>
> >>>>  	list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) {
> >>>> +		struct drm_i915_gem_object *obj = vma->obj;
> >>>> +
> >>>>  		GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
> >>>> +
> >>>>  		i915_vma_wait_for_bind(vma);
> >>>>
> >>>> -		if (i915_vma_is_pinned(vma))
> >>>> +		if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
> >>>>  			continue;
> >>>>
> >>>> -		if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
> >>>> -			__i915_vma_evict(vma);
> >>>> -			drm_mm_remove_node(&vma->node);
> >>>> +		/* unlikely to race when GPU is idle, so no worry about slowpath.. */
> >>>> +		if (!i915_gem_object_trylock(obj, NULL)) {
> >>>> +			atomic_set(&vm->open, open);
> >>> Does this need a comment about barriers?
> >> Not sure, it's guarded by vm->mutex.
> >>>> +
> >>>> +			i915_gem_object_get(obj);
> >>> Should this not be kref_get_unless_zero? Assuming the vm->mutex is the
> >>> only thing keeping the object alive here, won't this lead to potential
> >>> uaf/double-free or something? Also should we not plonk this before the
> >>> trylock? Or maybe I'm missing something here?
> >> Normally you'd be correct, but this path should only be run during s/r,
> >> at which point userspace should be dead, the GPU idle, and we just
> >> drained all freed objects above.
> >>
> >> It would be a bug if we still found a dead object, as nothing should be
> >> running.
> > Hmm, Ok. So why do we expect the trylock to ever fail here? Who else
> > can grab it at this stage?
> It probably shouldn't, should probably be a WARN if it happens.

r-b with that then.

> >>>> +			mutex_unlock(&vm->mutex);
> >>>> +
> >>>> +			i915_gem_object_lock(obj, NULL);
> >>>> +			open = i915_vma_unbind(vma);
> >>>> +			i915_gem_object_unlock(obj);
> >>>> +
> >>>> +			GEM_WARN_ON(open);
> >>>> +
> >>>> +			i915_gem_object_put(obj);
> >>>> +			goto retry;
> >>>>  		}
> >>>> +
> >>>> +		i915_vma_wait_for_bind(vma);
> >>> We also call wait_for_bind above, is that intentional?
> >> Should be harmless, but first one should probably be removed. :)
> >>
>