On Tue, Nov 12, 2024 at 06:27:31PM -0800, Matthew Brost wrote:
> On Tue, Nov 12, 2024 at 12:09:52PM +0100, Christian König wrote:
> > Am 12.11.24 um 04:29 schrieb Matthew Brost:
> > > On Mon, Nov 11, 2024 at 02:42:02PM +0100, Christian König wrote:
> > > > Am 09.11.24 um 18:29 schrieb Matthew Brost:
> > > > > The motivation for this series comes from pending UMD submission work by
> > > > > AMD [1], ARM [3], and the Xe team, who are also beginning to look at
> > > > > this. Sima has suggested [4] some common driver preemptive fences and
> > > > > semantics, which we all agree on. This is the first attempt to implement
> > > > > them, based on Xe's existing long-running preemptive fences.
> > > > >
> > > > > The semantics are fairly simple: preemptive fences attached to a
> > > > > dma-resv must wait to enable signaling (and thus preempt device
> > > > > execution) until all fences in other slots have been signaled. These
> > > > > semantics are necessary to avoid deadlocks when preempting a device
> > > > > while a user submission is pending, and resuming device execution
> > > > > depends on the user submission completing. The semantics align with
> > > > > Christian's view [5], which I also arrived at independently (with a
> > > > > little help from Sima).
> > > > Ah! I was really wondering for a moment why you wanted to add a separate
> > > > dma_resv usage for that. But I strongly think that this approach won't work.
> > > >
> > > > The assumption that completion fences which depend on a preemption fence are
> > > > always part of the same dma_resv object is most likely false in some cases.
> > > >
> > > > At least for the AMD design what can easily happen is that we put a
> > > > completion fence only into a drm_syncobj for explicit synchronization and
> > > > then engines or different devices which still use kernel submissions depend
> > > > on it. That can go boom really trivially.
> > > >
> > > > What we do instead is to wait for the latest issued completion fence while
> > > > holding a mutex to prevent creating new ones before stopping the threads and
> > > wrt to a mutex...
> > >
> > > A current code reference would be nice. I looked at some of the code on
> > > the list and dma-fencing rules are broken...
> > >
> > > e.g., This patch [1] takes 'uq_mgr->userq_mutex' in the dma fencing path.
> > > This patch [2] takes 'uq_mgr->userq_mutex' and then dma-resv lock /
> > > allocates memory. That is clearly not allowed.
> >
> > No, that is unproblematic. The dma_resv is only locked after the preemption
> > fence is already signaled.
> >
>
> That's kinda cheating, fragile, and a pretty poor practice IMO. Your
> driver though.
>
> > The problem is currently that we have separated the signaling worker from
> > the resume worker. E.g. the mutex is dropped in between.
> >
>
> Again, can you share any code refs? See below, will share work shortly.
>
> > >
> > > Perhaps this is fixed in your current code base though.
> > >
> > > [1] https://patchwork.freedesktop.org/patch/593210/?series=133345&rev=1
> > > [2] https://patchwork.freedesktop.org/patch/593211/?series=133345&rev=1
> > >
> > > > signaling the preemption fence.
> > > >
> > > Right, that works and is functionally equivalent to the intended purpose
> > > here.
> > >
> > > I left out a key detail: when a user fence is converted into a dma-fence
> > > and exported in a syncobj or syncfile, it must also be installed in all
> > > of the VM's dma-resv bookkeeping slots (i.e., in VM's dma-resv and all
> > > extobj dma-resv mapped in the VM).
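
(To make that last point concrete, here is roughly the shape I have in mind for the export path when using the gpuvm helpers. This is only a sketch, not real driver code: my_vm, my_resume_vm() and my_user_to_dma_fence() are made-up placeholders, and error handling is abbreviated.)

#include <drm/drm_exec.h>
#include <drm/drm_gpuvm.h>
#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/err.h>

struct my_vm {                          /* placeholder driver VM */
        struct drm_gpuvm gpuvm;
};

/* Placeholders for driver-specific steps, not real functions. */
static int my_resume_vm(struct my_vm *vm, struct drm_exec *exec);
static struct dma_fence *my_user_to_dma_fence(struct my_vm *vm, u64 value);

static struct dma_fence *my_export_user_fence(struct my_vm *vm, u64 value)
{
        struct drm_gpuvm_exec vm_exec = {
                .vm = &vm->gpuvm,
                .flags = DRM_EXEC_INTERRUPTIBLE_WAIT,
                .num_fences = 1,
        };
        struct dma_fence *fence;
        int err;

        /* Locks the VM's resv and every extobj resv mapped in the VM. */
        err = drm_gpuvm_exec_lock(&vm_exec);
        if (err)
                return ERR_PTR(err);

        /* Validate + resume the VM before any dma-fence goes out. */
        err = my_resume_vm(vm, &vm_exec.exec);
        if (err)
                goto out_unlock;

        /* Convert the user fence into a dma-fence... */
        fence = my_user_to_dma_fence(vm, value);

        /*
         * ...and install it in the bookkeep slots of the VM's resv and all
         * extobj resvs so a later preemption orders behind it.
         */
        drm_gpuvm_resv_add_fence(&vm->gpuvm, &vm_exec.exec, fence,
                                 DMA_RESV_USAGE_BOOKKEEP,
                                 DMA_RESV_USAGE_BOOKKEEP);

out_unlock:
        drm_gpuvm_exec_unlock(&vm_exec);
        return err ? ERR_PTR(err) : fence;
}

The conversion already runs with every resv in the VM locked and the VM resumed, so installing the new dma-fence in the bookkeep slots is a single extra call.
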
> >
> > I don't think that this is something we should do.
> >
> > > Before exporting a dma-fence, the VM must be validated + resumed, and
> > > you must hold all dma-resv locks, so the above is trivial.
> >
> > Hui? Why should we hold all the dma-resv locks for that?
> >
> > We only hold the userq_mutex to make sure that no user fence is created
> > while we are in the signaling path of the eviction fence.
> >
>
> I reason that the user fence -> DMA fence IOCTL (and vice versa) is
> essentially another version of the rebind worker, which also performs
> the fence transformation and installs fences in the preempt slots to
> guard against unwanted preemption while a DMA fence has been exported.
> It seems to work quite nicely.
>

Ugh, typos. s/installs fences in the preempt slots/installs dma-fences in the bookkeep slots/

Matt

> > > If you're using gpuvm, just call drm_gpuvm_resv_add_fence. I assume AMD has a
> > > similarly simple call.
> >
> > Nope, we try to avoid locking all BOs in the VM as hard as we can.
> >
>
> Why? Calling in to perform fence conversion shouldn't be all that
> frequent, and it simplifies things.
>
> Also, it's likely that only a few locks are involved, as not too many
> external BOs are mapped within a VM (i.e., most BOs share the VM's
> dma-resv lock).
>
> > > Now the ordering works inherently in dma-resv and the scheduler, e.g. no
> > > need to track the last completion fences explicitly in preempt fences.
> >
> > I really don't think that this is a good approach. Explicitly keeping the
> > last completion fence in the pre-emption fence is basically a must-have as
> > far as I can see.
> >
> > The approach you take here looks like a really ugly hack to me.
> >
>
> Well, I have to disagree; it seems like a pretty solid, common design.
>
> Anyway, I think I have this more or less working. I want to run this by
> the Mesa team a bit to ensure I haven't missed anything, and will likely
> post something shortly after.
>
> We can discuss this more after I post and perhaps solicit other
> opinions, weighing the pros and cons of the approaches here. I do think
> they function roughly the same, so something commonly agreed upon would
> be good. Sharing a bit of code, if possible, is always a plus too.
>
> Matt
>
> > Regards,
> > Christian.
> >
> > >
> > > I'm pretty sure if you converted your code to this solution it would work;
> > > there are however a couple of bugs in this post which I have fixed.
> > >
> > > This is a common solution based on existing components which I'd lean
> > > towards rather than some new component.
> > >
> > > Matt
> > >
> > > > That code could of course be made into some driver-independent component.
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > > Implemented via the new dma-resv slot DMA_RESV_USAGE_PREEMPT, a common
> > > > > embedded base preemptive fence class with driver operations, and updates
> > > > > to the scheduler to adhere to these semantics.
> > > > >
> > > > > The current Xe long-running preemptive fences have been converted to the
> > > > > new common code, and UMD submission is expected to follow (hopefully)
> > > > > shortly thereafter in a subsequent series.
> > > > >
> > > > > Not intended to be presented as the final solution, but rather to
> > > > > kickstart serious discussions on this topic.
> > > > >
> > > > > Matt
> > > > >
> > > > > [1] https://patchwork.freedesktop.org/series/113675/
> > > > > [2] https://patchwork.freedesktop.org/series/114385/
> > > > > [3] https://patchwork.freedesktop.org/series/137924/
> > > > > [4] https://patchwork.kernel.org/project/dri-devel/cover/20240828172605.19176-1-mihail.atanassov@xxxxxxx/#26011577
> > > > > [5] https://patchwork.kernel.org/project/dri-devel/cover/20240828172605.19176-1-mihail.atanassov@xxxxxxx/#26011853
> > > > >
> > > > > Matthew Brost (6):
> > > > >   dma-resv: Add DMA_RESV_USAGE_PREEMPT
> > > > >   drm/sched: Teach scheduler about DMA_RESV_USAGE_PREEMPT
> > > > >   dma-fence: Add dma_fence_preempt base class
> > > > >   drm/sched: Teach scheduler about dma_fence_prempt type
> > > > >   drm/xe: Use DMA_RESV_USAGE_PREEMPT for preempt fences
> > > > >   drm/xe: Use dma_fence_preempt base class
> > > > >
> > > > >  drivers/dma-buf/Makefile                    |   2 +-
> > > > >  drivers/dma-buf/dma-fence-preempt.c         | 102 ++++++++++++++++++++
> > > > >  drivers/dma-buf/dma-resv.c                  |  43 ++++++---
> > > > >  drivers/dma-buf/st-dma-resv.c               |   2 +-
> > > > >  drivers/gpu/drm/scheduler/sched_entity.c    |  29 +++++-
> > > > >  drivers/gpu/drm/scheduler/sched_main.c      |  50 +++++++---
> > > > >  drivers/gpu/drm/xe/xe_bo.c                  |  22 +----
> > > > >  drivers/gpu/drm/xe/xe_guc_submit.c          |   3 +
> > > > >  drivers/gpu/drm/xe/xe_hw_engine_group.c     |   4 +-
> > > > >  drivers/gpu/drm/xe/xe_migrate.c             |   4 +-
> > > > >  drivers/gpu/drm/xe/xe_preempt_fence.c       |  81 +++++-----------
> > > > >  drivers/gpu/drm/xe/xe_preempt_fence.h       |   2 +-
> > > > >  drivers/gpu/drm/xe/xe_preempt_fence_types.h |  11 +--
> > > > >  drivers/gpu/drm/xe/xe_pt.c                  |  12 +--
> > > > >  drivers/gpu/drm/xe/xe_vm.c                  |  22 +----
> > > > >  include/drm/gpu_scheduler.h                 |  15 +++
> > > > >  include/linux/dma-fence-preempt.h           |  54 +++++++++++
> > > > >  include/linux/dma-fence.h                   |   1 +
> > > > >  include/linux/dma-resv.h                    |  24 +++--
> > > > >  19 files changed, 326 insertions(+), 157 deletions(-)
> > > > >  create mode 100644 drivers/dma-buf/dma-fence-preempt.c
> > > > >  create mode 100644 include/linux/dma-fence-preempt.h
> > > >
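
PS: since the ordering keeps coming up, here is a minimal sketch of the semantics described in the cover letter above, i.e. a preempt fence only starts preempting once every fence in the other dma-resv slots has signaled. This is not the dma_fence_preempt code from the series; my_preempt_fence, my_preempt_work() and the ->preempt_hw() hook are made-up names, a single resv is assumed for simplicity, and DMA_RESV_USAGE_PREEMPT is assumed to be ordered after BOOKKEEP so a BOOKKEEP wait covers every other slot.

#include <linux/dma-fence.h>
#include <linux/dma-resv.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

struct my_preempt_fence {
        struct dma_fence base;
        spinlock_t lock;                /* backs base.lock */
        struct work_struct work;        /* INIT_WORK(..., my_preempt_work) at creation */
        struct dma_resv *resv;          /* simplified: a single resv */
        void (*preempt_hw)(struct my_preempt_fence *f); /* made-up hook */
};

static void my_preempt_work(struct work_struct *w)
{
        struct my_preempt_fence *f = container_of(w, typeof(*f), work);

        /*
         * Wait for everything in the KERNEL/WRITE/READ/BOOKKEEP slots
         * first, so a pending submission that the resume depends on can
         * never deadlock against the preemption.
         */
        dma_resv_wait_timeout(f->resv, DMA_RESV_USAGE_BOOKKEEP, false,
                              MAX_SCHEDULE_TIMEOUT);

        f->preempt_hw(f);               /* actually stop the queues */
        dma_fence_signal(&f->base);
}

static bool my_preempt_enable_signaling(struct dma_fence *fence)
{
        struct my_preempt_fence *f = container_of(fence, typeof(*f), base);

        /* Atomic context here, so push the wait to a worker. */
        queue_work(system_unbound_wq, &f->work);
        return true;
}

static const char *my_preempt_driver_name(struct dma_fence *fence)
{
        return "my-driver";
}

static const char *my_preempt_timeline_name(struct dma_fence *fence)
{
        return "preempt";
}

static const struct dma_fence_ops my_preempt_fence_ops = {
        .get_driver_name = my_preempt_driver_name,
        .get_timeline_name = my_preempt_timeline_name,
        .enable_signaling = my_preempt_enable_signaling,
};

Creation (dma_fence_init() with my_preempt_fence_ops, INIT_WORK()) and attaching the fence via dma_resv_add_fence(resv, &f->base, DMA_RESV_USAGE_PREEMPT) are omitted; DMA_RESV_USAGE_PREEMPT here is the slot proposed by this series, not an existing enum value.
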
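And for contrast, the amdgpu-style flow as I read Christian's description above, purely illustrative (my_userq_mgr, my_last_completion_fence() and my_stop_queues() are placeholders, not the real amdgpu code): block creation of new completion fences, wait for the latest one already issued, stop the queues, then signal the eviction fence.

#include <linux/dma-fence.h>
#include <linux/mutex.h>

struct my_userq_mgr {                   /* placeholder, loosely mirroring uq_mgr */
        struct mutex userq_mutex;
        /* ... */
};

/* Placeholders for driver-specific steps, not real functions. */
static struct dma_fence *my_last_completion_fence(struct my_userq_mgr *uq_mgr);
static void my_stop_queues(struct my_userq_mgr *uq_mgr);

static void my_preempt_via_mutex(struct my_userq_mgr *uq_mgr,
                                 struct dma_fence *eviction_fence)
{
        struct dma_fence *last;

        /* No new completion fences can be created while this is held. */
        mutex_lock(&uq_mgr->userq_mutex);

        /* Wait for the latest completion fence already issued... */
        last = my_last_completion_fence(uq_mgr);
        if (last) {
                dma_fence_wait(last, false);
                dma_fence_put(last);
        }

        /* ...stop the queues, and only then signal the preemption fence. */
        my_stop_queues(uq_mgr);
        dma_fence_signal(eviction_fence);

        mutex_unlock(&uq_mgr->userq_mutex);
}

Which is also where the earlier question about taking 'uq_mgr->userq_mutex' in the dma-fencing path comes from: everything done under that mutex then has to follow the dma-fencing rules.
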