On Tue, Nov 12, 2024 at 12:09:52PM +0100, Christian König wrote: > Am 12.11.24 um 04:29 schrieb Matthew Brost: > > On Mon, Nov 11, 2024 at 02:42:02PM +0100, Christian König wrote: > > > Am 09.11.24 um 18:29 schrieb Matthew Brost: > > > > The motivation for this series comes from pending UMD submission work by > > > > AMD [1], ARM [3], and the Xe team, who are also beginning to look at > > > > this. Sima has suggested [4] some common driver preemptive fences and > > > > semantics, which we all agree on. This is the first attempt to implement > > > > them, based on Xe's existing long-running preemptive fences. > > > > > > > > The semantics are fairly simple: preemptive fences attached to a > > > > dma-resv must wait to enable signaling (and thus preempt device > > > > execution) until all fences in other slots have been signaled. These > > > > semantics are necessary to avoid deadlocks when preempting a device > > > > while a user submission is pending, and resuming device execution > > > > depends on the user submission completing. The semantics align with > > > > Christian's view [5], which I also arrived at independently (with a > > > > little help from Sima). > > > Ah! I was really wondering for a moment why you wanted to add a separate > > > dma_resv usage for that. But I strongly think that this approach won't work. > > > > > > The assumption that completion fences which depend on a preemption fence are > > > always part of the same dma_resv object is most likely false in some cases. > > > > > > At least for the AMD design what can easily happen is that we put a > > > completion fence only into a drm_syncobj for explicit synchronization and > > > then engines or different devices which still use kernel submissions depend > > > on it. That can go boom really trivially. > > > > > > What we do instead is to wait for the latest issued completion fence while > > > holding a mutex to prevent creating new ones before stopping the threads and > > wrt to a mutex... > > > > A current code reference would be nice. I looked some of the code on > > list and dma-fencing rules are broken... > > > > e.g., This patch [1], takes 'uq_mgr->userq_mutex' in the dma fencing path. > > This patch [2], takes 'uq_mgr->userq_mutex' and then dma-resv lock / > > allocates memory. That is clearly not allowed. > > No that is unproblematic. The dma_resv is only locked after the preemption > fence is already signaled. > That's kinda cheating, fragile, and a pretty poor practice IMO. Your driver though. > The problem is currently that we have separated the signaling worker from > the resume worker. E.g. the mutex is dropped in between. > Again, can you share any code refs? See below, will share work shortly. > > > > Perhaps this is fixed in your current code base though. > > > > [1] https://patchwork.freedesktop.org/patch/593210/?series=133345&rev=1 > > [2] https://patchwork.freedesktop.org/patch/593211/?series=133345&rev=1 > > > > > signaling the preemption fence. > > > > > Right, that works and is functionally equivalent to the intended purpose > > here. > > > > I left out a key detail: when a user fence is converted into a dma-fence > > and exported in a syncobj or syncfile, it must also be installed in all > > of the VM's dma-resv bookkeeping slots (i.e., in VM's dma-resv and all > > extobj dma-resv mapped in the VM). > > I don't think that this is something we should do. > > > Before exporting a dma-fence, the VM must be validated + resumed, and > > you must hold all dma-resv locks, so the above is trivial. > > Hui? Why should we hold all the dma-resv locks for that? > > We only hold the userq_mutex to make sure that no user fence is created > while we are in the signaling path of the eviction fence. > I reason that the user fence -> DMA fence IOCTL (and vice versa) is essentially another version of the rebind worker, which also performs the fence transformation and installs fences in the preempt slots to guard against unwanted preemption while a DMA fence has been exported. It seems to work quite nicely. > > If you're using gpuvm, just call drm_gpuvm_resv_add_fence. I assume AMD has a > > similarly simple call. > > Nope, we try to avoid locking all BOs in the VM as hard as we can. > Why? Calling in to perform fence conversion shouldn't be all that frequent and simplifies things. Also, it's likely that only a few locks are involved, as not too many external BOs are mapped within a VM (i.e., most BOs share the VM's dma-resv lock). > > Now the ordering works inherently in dma-resv and the scheduler. e.g. No > > need to track the last completion fences explictly in preempt fences. > > I really don't think that this is a good approach. Explicitly keeping the > last completion fence in the pre-emption fence is basically a must have as > far as I can see. > > The approach you take here looks like a really ugly hack to me. > Well, I have to disagree; it seems like a pretty solid, common design. Anyway, I think I have this more or less working. I want to run this by the Mesa team a bit to ensure I haven't missed anything, and will likely post something shortly after. We can discuss this more after I post and perhaps solicit other opinions, weighing the pros and cons of the approaches here. I do think they function roughly the same, so something commonly agreed upon would be good. Sharing a bit of code, if possible, is always a plus too. Matt > Regards, > Christian. > > > > > I'm pretty sure if converted your code this solution it would work, > > there are however a couple of bugs in this post which I have fixed. > > > > This is a common solution based on existing components which I'd lean > > towards rather than some new component. > > > > Matt > > > > > That code could of course be made some driver independent component. > > > > > > Regards, > > > Christian. > > > > > > > Implemented via the new dma-resv slot DMA_RESV_USAGE_PREEMPT, a common > > > > embedded base preemptive fence class with driver operations, and updates > > > > to the scheduler to adhere to these semantics. > > > > > > > > The current Xe long-running preemptive fences have been converted to the > > > > new common code, and UMD submission is expected to follow (hopefully) > > > > shortly thereafter in a subsequent series. > > > > > > > > Not intended to be presented as the final solution, but rather to > > > > kickstart serious discussions on this topic. > > > > > > > > Matt > > > > > > > > [1] https://patchwork.freedesktop.org/series/113675/ > > > > [2] https://patchwork.freedesktop.org/series/114385/ > > > > [3] https://patchwork.freedesktop.org/series/137924/ > > > > [4] https://patchwork.kernel.org/project/dri-devel/cover/20240828172605.19176-1-mihail.atanassov@xxxxxxx/#26011577 > > > > [5] https://patchwork.kernel.org/project/dri-devel/cover/20240828172605.19176-1-mihail.atanassov@xxxxxxx/#26011853 > > > > > > > > Matthew Brost (6): > > > > dma-resv: Add DMA_RESV_USAGE_PREEMPT > > > > drm/sched: Teach scheduler about DMA_RESV_USAGE_PREEMPT > > > > dma-fence: Add dma_fence_preempt base class > > > > drm/sched: Teach scheduler about dma_fence_prempt type > > > > drm/xe: Use DMA_RESV_USAGE_PREEMPT for preempt fences > > > > drm/xe: Use dma_fence_preempt base class > > > > > > > > drivers/dma-buf/Makefile | 2 +- > > > > drivers/dma-buf/dma-fence-preempt.c | 102 ++++++++++++++++++++ > > > > drivers/dma-buf/dma-resv.c | 43 ++++++--- > > > > drivers/dma-buf/st-dma-resv.c | 2 +- > > > > drivers/gpu/drm/scheduler/sched_entity.c | 29 +++++- > > > > drivers/gpu/drm/scheduler/sched_main.c | 50 +++++++--- > > > > drivers/gpu/drm/xe/xe_bo.c | 22 +---- > > > > drivers/gpu/drm/xe/xe_guc_submit.c | 3 + > > > > drivers/gpu/drm/xe/xe_hw_engine_group.c | 4 +- > > > > drivers/gpu/drm/xe/xe_migrate.c | 4 +- > > > > drivers/gpu/drm/xe/xe_preempt_fence.c | 81 +++++----------- > > > > drivers/gpu/drm/xe/xe_preempt_fence.h | 2 +- > > > > drivers/gpu/drm/xe/xe_preempt_fence_types.h | 11 +-- > > > > drivers/gpu/drm/xe/xe_pt.c | 12 +-- > > > > drivers/gpu/drm/xe/xe_vm.c | 22 +---- > > > > include/drm/gpu_scheduler.h | 15 +++ > > > > include/linux/dma-fence-preempt.h | 54 +++++++++++ > > > > include/linux/dma-fence.h | 1 + > > > > include/linux/dma-resv.h | 24 +++-- > > > > 19 files changed, 326 insertions(+), 157 deletions(-) > > > > create mode 100644 drivers/dma-buf/dma-fence-preempt.c > > > > create mode 100644 include/linux/dma-fence-preempt.h > > > > >