Re: [PATCH v2 12/16] drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind

Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx> · Thu, 9 Dec 2021 14:45:53 +0100



On 09-12-2021 14:40, Matthew Auld wrote:
> On Thu, 9 Dec 2021 at 13:25, Maarten Lankhorst
> <maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
>> On 09-12-2021 14:05, Matthew Auld wrote:
>>> On Mon, 29 Nov 2021 at 13:58, Maarten Lankhorst
>>> <maarten.lankhorst@xxxxxxxxxxxxxxx> wrote:
>>>> We want to remove more members of i915_vma, which requires the locking to be
>>>> held more often.
>>>>
>>>> Start requiring gem object lock for i915_vma_unbind, as it's one of the
>>>> callers that may unpin pages.
>>>>
>>>> Some special care is needed when evicting, because the last reference to the
>>>> object may be held by the VMA, so after __i915_vma_unbind, vma may be garbage,
>>>> and we need to cache vma->obj before unlocking.
>>>>
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@xxxxxxxxxxxxxxx>
>>>> ---
>>> <snip>
>>>
>>>> @@ -129,22 +129,47 @@ void i915_ggtt_suspend_vm(struct i915_address_space *vm)
>>>>
>>>>         drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
>>>>
>>>> +retry:
>>>> +       i915_gem_drain_freed_objects(vm->i915);
>>>> +
>>>>         mutex_lock(&vm->mutex);
>>>>
>>>>         /* Skip rewriting PTE on VMA unbind. */
>>>>         open = atomic_xchg(&vm->open, 0);
>>>>
>>>>         list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) {
>>>> +               struct drm_i915_gem_object *obj = vma->obj;
>>>> +
>>>>                 GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>>>> +
>>>>                 i915_vma_wait_for_bind(vma);
>>>>
>>>> -               if (i915_vma_is_pinned(vma))
>>>> +               if (i915_vma_is_pinned(vma) || !i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND))
>>>>                         continue;
>>>>
>>>> -               if (!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND)) {
>>>> -                       __i915_vma_evict(vma);
>>>> -                       drm_mm_remove_node(&vma->node);
>>>> +               /* unlikely to race when GPU is idle, so no worry about slowpath.. */
>>>> +               if (!i915_gem_object_trylock(obj, NULL)) {
>>>> +                       atomic_set(&vm->open, open);
>>> Does this need a comment about barriers?
>> Not sure, it's guarded by vm->mutex.
>>>> +
>>>> +                       i915_gem_object_get(obj);
>>> Should this not be kref_get_unless_zero? Assuming the vm->mutex is the
>>> only thing keeping the object alive here, won't this lead to potential
>>> uaf/double-free or something? Also should we not plonk this before the
>>> trylock? Or maybe I'm missing something here?
>> Normally you're correct, this is normally the case, but we drain freed objects and this path should only be run during s/r, at which point userspace should be dead, GPU idle, and we just drained all freed objects above.
>>
>> It would be a bug if we still found a dead object, as nothing should be running.
> Hmm, Ok. So why do we expect the trylock to ever fail here? Who else
> can grab it at this stage?
It probably shouldn't, should probably be a WARN if it happens.
>>>> +                       mutex_unlock(&vm->mutex);
>>>> +
>>>> +                       i915_gem_object_lock(obj, NULL);
>>>> +                       open = i915_vma_unbind(vma);
>>>> +                       i915_gem_object_unlock(obj);
>>>> +
>>>> +                       GEM_WARN_ON(open);
>>>> +
>>>> +                       i915_gem_object_put(obj);
>>>> +                       goto retry;
>>>>                 }
>>>> +
>>>> +               i915_vma_wait_for_bind(vma);
>>> We also call wait_for_bind above, is that intentional?
>> Should be harmless, but first one should probably be removed. :)
>>