Hi, Boris, On Tue, 2023-10-03 at 12:05 +0200, Boris Brezillon wrote: > Hello Thomas, > > On Tue, 3 Oct 2023 10:36:10 +0200 > Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> wrote: > > > > +/** > > > + * get_next_vm_bo_from_list() - get the next vm_bo element > > > + * @__gpuvm: The GPU VM > > > + * @__list_name: The name of the list we're iterating on > > > + * @__local_list: A pointer to the local list used to store > > > already iterated items > > > + * @__prev_vm_bo: The previous element we got from > > > drm_gpuvm_get_next_cached_vm_bo() > > > + * > > > + * This helper is here to provide lockless list iteration. > > > Lockless as in, the > > > + * iterator releases the lock immediately after picking the > > > first element from > > > + * the list, so list insertion deletion can happen concurrently. > > > + * > > > + * Elements popped from the original list are kept in a local > > > list, so removal > > > + * and is_empty checks can still happen while we're iterating > > > the list. > > > + */ > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, > > > __local_list, __prev_vm_bo) \ > > > + ({ > > > \ > > > + struct drm_gpuvm_bo *__vm_bo = > > > NULL; \ > > > + > > > \ > > > + drm_gpuvm_bo_put(__prev_vm_bo); > > > \ > > > + > > > \ > > > + spin_lock(&(__gpuvm)- > > > >__list_name.lock); \ > > > > Here we unconditionally take the spinlocks while iterating, and the > > main > > point of DRM_GPUVM_RESV_PROTECTED was really to avoid that? > > > > > > > + if (!(__gpuvm)- > > > >__list_name.local_list) \ > > > + (__gpuvm)->__list_name.local_list = > > > __local_list; \ > > > + else > > > \ > > > + WARN_ON((__gpuvm)->__list_name.local_list > > > != __local_list); \ > > > + > > > \ > > > + while (!list_empty(&(__gpuvm)->__list_name.list)) > > > { \ > > > + __vm_bo = list_first_entry(&(__gpuvm)- > > > >__list_name.list, \ > > > + struct > > > drm_gpuvm_bo, \ > > > + > > > list.entry.__list_name); \ > > > + if (kref_get_unless_zero(&__vm_bo->kref)) > > > { > > And unnecessarily grab a reference in the RESV_PROTECTED case. > > > \ > > > + list_move_tail(&(__vm_bo)- > > > >list.entry.__list_name, \ > > > + > > > __local_list); \ > > > + break; > > > \ > > > + } else > > > { \ > > > + list_del_init(&(__vm_bo)- > > > >list.entry.__list_name); \ > > > + __vm_bo = > > > NULL; \ > > > + } > > > \ > > > + } > > > \ > > > + spin_unlock(&(__gpuvm)- > > > >__list_name.lock); \ > > > + > > > \ > > > + __vm_bo; > > > \ > > > + }) > > > > IMHO this lockless list iteration looks very complex and should be > > pretty difficult to maintain while moving forward, also since it > > pulls > > the gpuvm_bos off the list, list iteration needs to be protected by > > an > > outer lock anyway. > > As being partly responsible for this convoluted list iterator, I must > say I agree with you. There's so many ways this can go wrong if the > user doesn't call it the right way, or doesn't protect concurrent > list > iterations with a separate lock (luckily, this is a private > iterator). I > mean, it works, so there's certainly a way to get it right, but gosh, > this is so far from the simple API I had hoped for. > > > Also from what I understand from Boris, the extobj > > list would typically not need the fine-grained locking; only the > > evict > > list? > > Right, I'm adding the gpuvm_bo to extobj list in the ioctl path, when > the GEM and VM resvs are held, and I'm deferring the > drm_gpuvm_bo_put() > call to a work that's not in the dma-signalling path. This being > said, > I'm still not comfortable with the > > gem = drm_gem_object_get(vm_bo->gem); > dma_resv_lock(gem->resv); > drm_gpuvm_bo_put(vm_bo); > dma_resv_unlock(gem->resv); > drm_gem_object_put(gem); > > dance that's needed to avoid a UAF when the gpuvm_bo is the last GEM > owner, not to mention that drm_gpuva_unlink() calls > drm_gpuvm_bo_put() > after making sure the GEM gpuvm_list lock is held, but this lock > might > differ from the resv lock (custom locking so we can call > gpuvm_unlink() in the dma-signalling path). So we now have paths > where > drm_gpuvm_bo_put() are called with the resv lock held, and others > where > they are not, and that only works because we're relying on the the > fact > those drm_gpuvm_bo_put() calls won't make the refcount drop to zero, > because the deferred vm_bo_put() work still owns a vm_bo ref. I'm not sure I follow to 100% here, but in the code snippet above it's pretty clear to me that it needs to hold an explicit gem object reference when calling dma_resv_unlock(gem->resv). Each time you copy a referenced pointer (here from vm_bo->gem to gem) you need to up the refcount unless you make sure (by locks or other means) that the source of the copy has a strong refcount and stays alive, so that's no weird action to me. Could possibly add a drm_gpuvm_bo_get_gem() to access the gem member (and that also takes a refcount) for driver users to avoid the potential pitfall. > > All these tiny details add to the overall complexity of this common > layer, and to me, that's not any better than the > get_next_vm_bo_from_list() complexity you were complaining about > (might > be even worth, because this sort of things leak to users). > > Having an internal lock partly solves that, in that the locking of > the > extobj list is now entirely orthogonal to the GEM that's being > removed > from this list, and we can lock/unlock internally without forcing the > caller to take weird actions to make sure things don't explode. Don't > get me wrong, I get that this locking overhead is not acceptable for > Xe, but I feel like we're turning drm_gpuvm into a white elephant > that > only few people will get right. I tend to agree, but to me the big complication comes from the async (dma signalling path) state updates. Let's say for example we have a lower level lock for the gem object's gpuvm_bo list. Some drivers grab it from the dma fence signalling path, other drivers need to access all vm's of a bo to grab their dma_resv locks using a WW transaction. There will be problems, although probably solveable. > > This is just my personal view on this, and I certainly don't want to > block or delay the merging of this patchset, but I thought I'd share > my > concerns. As someone who's been following the evolution of this > drm_gpuva/vm series for weeks, and who's still sometimes getting > lost, > I can't imagine how new drm_gpuvm users would feel... > > > Also it seems that if we are to maintain two modes here, for > > reasonably clean code we'd need two separate instances of > > get_next_bo_from_list(). > > > > For the !RESV_PROTECTED case, perhaps one would want to consider > > the > > solution used currently in xe, where the VM maintains two evict > > lists. > > One protected by a spinlock and one protected by the VM resv. When > > the > > VM resv is locked to begin list traversal, the spinlock is locked > > *once* > > and the spinlock-protected list is looped over and copied into the > > resv > > protected one. For traversal, the resv protected one is used. > > Oh, so you do have the same sort of trick where you move the entire > list to another list, such that you can let other paths update the > list > while you're iterating your own snapshot. That's interesting... Yes, it's instead of the "evicted" bool suggested here. I thought the latter would be simpler. Although that remains to be seen after all use-cases are implemented. But in general I think the concept of copying from a staging list to another with different protection rather than traversing the first list and unlocking between items is a good way of solving the locking inversion problem with minimal overhead. We use it also for O(1) userptr validation. /Thomas > > Regards, > > Boris