This is an RFC, or possibly even a proof of concept, for UMD (User Mode Driver) direct submission in Xe. It is similar to AMD's design [1] [2] and ARM's design [3], utilizing a uAPI to convert user-space syncs (memory writes) to kernel-space syncs (DMA fences). It is built around the existing Xe preemption fences for dynamic memory management, such as userptr invalidation and buffer object (BO) eviction. The series also enables mapping a PPGTT-bound submission ring in non-privileged mode, as well as exposing indirect ring state (ring head, tail, etc.) and the doorbell to user space, enabling UMD direct submission.

The target for this series is Mesa, with the goal of enabling UMD direct submission and removing the submission thread that currently handles future fences. I've discussed this with Sima and the Intel Mesa team, and it seems like a reachable target. Most synchronization will be handled in user space via memory writes and semaphore wait ring instructions, with only legacy cross-process synchronization (e.g., compositors) requiring kernel synchronization (DMA fences).

The series includes some common patches at the beginning to implement preemption fences and user fences. The idea of preemption DMA-reservation slots [4] has been dropped in favor of attaching the last exported DMA fence to the preemption fence, as suggested by AMD.

This is a public checkpoint on the KMD (Kernel Mode Driver) work, which will be tabled until Intel's Mesa team has the bandwidth to begin the UMD work. That said, the uAPI is very preliminary and likely to change. One idea that was discussed is a common user fence interface based around DRM syncobjs, which will likely be explored further as UMD engagement begins. Some work for syncing VM binds (a kernel operation) with UMD direct submission is also likely required.
Testing has been done with [5], and the main features—basic submission, dynamic memory management, user-to-kernel sync conversion, and protection against endless user fences—are working on BMG and LNL. The GitLab branch [6] has also been pushed for reference.

Any early community feedback is always appreciated.

Matt

[1] https://patchwork.freedesktop.org/series/113675/
[2] https://patchwork.freedesktop.org/series/114385/
[3] https://patchwork.freedesktop.org/series/137924/
[4] https://patchwork.freedesktop.org/series/141129/
[5] https://patchwork.freedesktop.org/series/141518/
[6] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-umd-submission-post/-/tree/post-11-18-24?ref_type=heads

Matthew Brost (28):
  dma-fence: Add dma_fence_preempt base class
  dma-fence: Add dma_fence_user_fence
  drm/xe: Use dma_fence_preempt base class
  drm/xe: Allocate doorbells for UMD exec queues
  drm/xe: Add doorbell ID to snapshot capture
  drm/xe: Break submission ring out into its own BO
  drm/xe: Break indirect ring state out into its own BO
  drm/xe: Clear GGTT in xe_bo_restore_kernel
  FIXME: drm/xe: Add pad to ring and indirect state
  drm/xe: Enable indirect ring on media GT
  drm/xe: Don't add pinned mappings to VM bulk move
  drm/xe: Add exec queue post init extension processing
  drm/xe: Add support for mmapping doorbells to user space
  drm/xe: Add support for mmapping submission ring and indirect ring
    state to user space
  drm/xe/uapi: Define UMD exec queue mapping uAPI
  drm/xe: Add usermap exec queue extension
  drm/xe: Drop EXEC_QUEUE_FLAG_UMD_SUBMISSION flag
  drm/xe: Do not allow usermap exec queues in exec IOCTL
  drm/xe: Teach GuC backend to kill usermap queues
  drm/xe: Enable preempt fences on usermap queues
  drm/xe/uapi: Add uAPI to convert user semaphore to / from drm syncobj
  drm/xe: Add user fence IRQ handler
  drm/xe: Add xe_hw_fence_user_init
  drm/xe: Add a message lock to the Xe GPU scheduler
  drm/xe: Always wait on preempt fences in vma_check_userptr
  drm/xe: Teach xe_sync layer about drm_xe_semaphore
  drm/xe: Add VM convert fence IOCTL
  drm/xe: Add user fence TDR

Tejas Upadhyay (1):
  drm/xe/mmap: Add mmap support for PCI memory barrier

 drivers/dma-buf/Makefile                     |   2 +-
 drivers/dma-buf/dma-fence-preempt.c          | 134 ++++++
 drivers/dma-buf/dma-fence-user-fence.c       |  73 ++++
 drivers/gpu/drm/xe/xe_bo.c                   |  29 +-
 drivers/gpu/drm/xe/xe_bo.h                   |   5 +
 drivers/gpu/drm/xe/xe_bo_evict.c             |   8 +-
 drivers/gpu/drm/xe/xe_device.c               | 181 +++++++-
 drivers/gpu/drm/xe/xe_device_types.h         |   3 +
 drivers/gpu/drm/xe/xe_exec.c                 |   3 +-
 drivers/gpu/drm/xe/xe_exec_queue.c           | 175 +++++++-
 drivers/gpu/drm/xe/xe_exec_queue.h           |   5 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h     |  13 +
 drivers/gpu/drm/xe/xe_execlist.c             |   2 +-
 drivers/gpu/drm/xe/xe_ggtt.c                 |  19 +-
 drivers/gpu/drm/xe/xe_ggtt.h                 |   2 +
 drivers/gpu/drm/xe/xe_gpu_scheduler.c        |  19 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.h        |  12 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler_types.h  |   2 +
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h |   9 +-
 drivers/gpu/drm/xe/xe_guc_submit.c           | 177 +++++++-
 drivers/gpu/drm/xe/xe_guc_submit_types.h     |   2 +
 drivers/gpu/drm/xe/xe_hw_engine.c            |   4 +-
 drivers/gpu/drm/xe/xe_hw_engine_group.c      |   4 +-
 drivers/gpu/drm/xe/xe_hw_fence.c             |  17 +
 drivers/gpu/drm/xe/xe_hw_fence.h             |   3 +
 drivers/gpu/drm/xe/xe_lrc.c                  | 176 ++++++--
 drivers/gpu/drm/xe/xe_lrc.h                  |   4 +-
 drivers/gpu/drm/xe/xe_lrc_types.h            |  16 +-
 drivers/gpu/drm/xe/xe_pci.c                  |   1 +
 drivers/gpu/drm/xe/xe_preempt_fence.c        |  89 ++--
 drivers/gpu/drm/xe/xe_preempt_fence.h        |   2 +-
 drivers/gpu/drm/xe/xe_preempt_fence_types.h  |  11 +-
 drivers/gpu/drm/xe/xe_pt.c                   |   5 +-
 drivers/gpu/drm/xe/xe_sync.c                 |  90 ++++
 drivers/gpu/drm/xe/xe_sync.h                 |   8 +
 drivers/gpu/drm/xe/xe_sync_types.h           |   5 +-
 drivers/gpu/drm/xe/xe_vm.c                   | 423 ++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h                   |   4 +-
 drivers/gpu/drm/xe/xe_vm_types.h             |  26 ++
 include/linux/dma-fence-preempt.h            |  56 +++
 include/linux/dma-fence-user-fence.h         |  31 ++
 include/uapi/drm/xe_drm.h                    | 147 ++++++-
 42 files changed, 1798 insertions(+), 199 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-preempt.c
 create mode 100644 drivers/dma-buf/dma-fence-user-fence.c
 create mode 100644 include/linux/dma-fence-preempt.h
 create mode 100644 include/linux/dma-fence-user-fence.h

-- 
2.34.1