Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects

Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> · Tue, 3 Oct 2023 20:57:29 +0200

On 10/3/23 18:55, Danilo Krummrich wrote:
It seems like we're mostly aligned on this series, except for the key
controversy we're discussing for a few versions now: locking of the internal
lists. Hence, let's just re-iterate the options we have to get this out of the
way.

(1) The spinlock dance. This basically works for every use case, updating the VA
     space from the IOCTL, from the fence signaling path or anywhere else.
     However, it has the downside of requiring spin_lock() / spin_unlock() for
     *each* list element when locking all external objects and validating all
     evicted objects. Typically, the amount of extobjs and evicted objects
     shouldn't be excessive, but there might be exceptions, e.g. Xe.

(2) The dma-resv lock dance. This is convinient for drivers updating the VA
     space from a VM_BIND ioctl() and is especially efficient if such drivers
     have a huge amount of external and/or evicted objects to manage. However,
     the downsides are that it requires a few tricks in drivers updating the VA
     space from the fence signaling path (e.g. job_run()). Design wise, I'm still
     skeptical that it is a good idea to protect internal data structures with
     external locks in a way that it's not clear to callers that a certain
     function would access one of those resources and hence needs protection.
     E.g. it is counter intuitive that drm_gpuvm_bo_put() would require both the
     dma-resv lock of the corresponding object and the VM's dma-resv lock held.
     (Additionally, there were some concerns from amdgpu regarding flexibility in
     terms of using GPUVM for non-VM_BIND uAPIs and compute, however, AFAICS
     those discussions did not complete and to me it's still unclear why it
     wouldn't work.)

(3) Simply use an internal mutex per list. This adds a tiny (IMHO negligible)
     overhead for drivers updating the VA space from a VM_BIND ioctl(), namely
     a *single* mutex_lock()/mutex_unlock() when locking all external objects
     and validating all evicted objects. And it still requires some tricks for
     drivers updating the VA space from the fence signaling path. However, it's
     as simple as it can be and hence way less error prone as well as
     self-contained and hence easy to use. Additionally, it's flexible in a way
     that we don't have any expections on drivers to already hold certain locks
     that the driver in some situation might not be able to acquire in the first
     place.

(4) Arbitrary combinations of the above. For instance, the current V5 implements
     both (1) and (2) (as either one or the other). But also (1) and (3) (as in
     (1) additionally to (3)) would be an option, where a driver could opt-in for
     the spinlock dance in case it updates the VA space from the fence signaling
     path.

I also considered a few other options as well, however, they don't seem to be
flexible enough. For instance, as by now we could use SRCU for the external
object list. However, this falls apart once a driver wants to remove and re-add
extobjs for the same VM_BO instance. (For the same reason it wouldn't work for
evicted objects.)

Personally, after seeing the weird implications of (1), (2) and a combination of
both, I tend to go with (3). Optionally, with an opt-in for (1). The reason for
the latter is that with (3) the weirdness of (1) by its own mostly disappears.

Please let me know what you think, and, of course, other ideas than the
mentioned ones above are still welcome.

- Danilo

Here are the locking principles Daniel put together and Dave once called 
out for us to be applying when reviewing DRM code. These were prompted 
by very fragile and hard to understand locking patterns in the i915 
driver and I think the xe vm_bind locking design was made with these in 
mind, (not sure exactly who wrote what, though so can't say for sure).

https://blog.ffwll.ch/2022/07/locking-engineering.html
https://blog.ffwll.ch/2022/08/locking-hierarchy.html

At least to me, this motivates using the resv design unless we strictly 
need lower level locks that are taken in the eviction paths or userptr 
invalidation paths, but doesn't rule out spinlocks or lock dropping 
tricks where these are really necessary. But pretty much rules out RCU / 
SRCU from what I can tell.

It also calls for documenting how individual members of structs are 
protected when ever possible.

Thanks,
Thomas