On Sat, 2022-11-12 at 23:57 -0800, Niranjana Vishwanathapura wrote:
> DRM_I915_GEM_VM_BIND/UNBIND ioctls allow a UMD to bind/unbind GEM
> buffer objects (BOs) or sections of a BO at specified GPU virtual
> addresses on a specified address space (VM). Multiple mappings can map
> to the same physical pages of an object (aliasing). These mappings (also
> referred to as persistent mappings) will be persistent across multiple
> GPU submissions (execbuf calls) issued by the UMD, without the user
> having to provide a list of all required mappings during each submission
> (as required by the older execbuf mode).
>
> This patch series supports VM_BIND version 1, as described by the param
> I915_PARAM_VM_BIND_VERSION.
>
> Add a new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only works in
> vm_bind mode. The vm_bind mode only works with this new execbuf3 ioctl.
> The new execbuf3 ioctl will not have any execlist support, and all the
> legacy support like relocations etc. is removed.
>
> NOTEs:
> * It is based on the below VM_BIND design+uapi rfc.
>   Documentation/gpu/rfc/i915_vm_bind.rst

Just as an FYI to everyone, there is a Mesa Iris implementation that
makes use of this series here:
https://gitlab.freedesktop.org/pzanoni/mesa/-/commits/upstream-vmbind/

Some notes on it:

- Tested on TGL and Alchemist (aka DG2).

- The code still has a lot of TODOs and some FIXMEs.

- It was somewhat tested with the common Linux benchmarks (Dota 2,
  Manhattan, Xonotic, etc.) and survived most of what I threw at it.
  The only problems I've seen so far are:
  - Sometimes you get a random GPU hang with Dota 2, but it seems to go
    away if you use INTEL_DEBUG=nofc. I'm investigating it right now.
  - Very, very rarely DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD returns EINVAL and
    we don't recover properly. I'm still trying to understand this one;
    it only happens once in a blue moon.

- Performance seems to be mostly equivalent to non-vm-bind, but I
  haven't spent a lot of time really gathering numbers since I'm mostly
  debugging things. Using VM_PRIVATE BOs is the key here: you cut Dota's
  performance by almost half if you don't use private BOs. Considering
  how ABYSMALLY slow things get, I would assume there's probably
  something the Kernel could do here to handle things a little faster.
  The 'perf' tool shows a lot of i915 fence-related functions wasting
  CPU time when we don't use private BOs.

- I am not really sure whether the implicit tracking code actually works
  or not. It doesn't really seem to make much of a difference if I
  remove it entirely, but I'm still planning to spend more time
  analyzing this. If anybody knows of a workload that will absolutely
  fail if we get this wrong, please tell me.

- There's the whole frontbuffer tracking discussion from v6 that's
  still pending resolution.

- The Vulkan implementation will come later. I wanted to sort out all
  the GL details first so I don't make the same mistakes twice.

- I really dislike how we have to handle the lack of implicit tracking
  support. The code is excessive and racy. See export_bo_sync_state(),
  import_bo_sync_state() and their caller in
  src/gallium/drivers/iris/iris_batch.c (a rough sketch of the dance
  follows this list). Suggestions for Mesa improvements are welcome,
  but I would really, really prefer to have a way to just tell
  execbuffer3 to handle implicit tracking for these buffers for me in
  an atomic way.

- I kinda wish execbuffer3 still accepted a pointer to struct
  drm_i915_gem_exec_fence in addition to struct
  drm_i915_gem_timeline_fence, since we already have to keep a list of
  exec_fences for execbuf2, and then for execbuf3 we convert them to
  the new format (see the second sketch below). We could also do the
  opposite and leave execbuf2 with the slower path. But I could live
  without this, no problem.

- Credits to Ken, Jason, Lionel, Niranjana, Nanley, Daniel and
  everybody else who helped me sort things out here.
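For reference, here is a minimal sketch of that dance. This is not the
actual Iris code: the helper names mirror the ones above, but the
bodies are illustrative only, assume kernel headers that have the
dma-buf EXPORT/IMPORT_SYNC_FILE ioctls plus libdrm's syncobj wrappers,
and omit all error handling.

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <xf86drm.h>

/* Pre-submit: snapshot the BO's implicit fences into a syncobj that
 * the execbuf3 timeline fence array can wait on. */
static uint32_t import_bo_sync_state(int drm_fd, int dmabuf_fd)
{
	struct dma_buf_export_sync_file export = {
		.flags = DMA_BUF_SYNC_RW,	/* readers and writers */
		.fd = -1,
	};
	uint32_t syncobj = 0;

	ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export);
	drmSyncobjCreate(drm_fd, 0, &syncobj);
	drmSyncobjImportSyncFile(drm_fd, syncobj, export.fd);
	close(export.fd);
	/* this syncobj goes into the fence array with the WAIT flag */
	return syncobj;
}

/* Post-submit: push our signal fence back into the BO's dma-resv so
 * other clients' implicit sync sees this batch. */
static void export_bo_sync_state(int drm_fd, int dmabuf_fd,
				 uint32_t signal_syncobj)
{
	struct dma_buf_import_sync_file import = {
		.flags = DMA_BUF_SYNC_WRITE,
	};
	int fd = -1;

	/* wraps DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD, the call that very
	 * rarely fails with EINVAL as mentioned above */
	drmSyncobjExportSyncFile(drm_fd, signal_syncobj, &fd);
	import.fd = fd;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import);
	close(fd);
}

Nothing stops another client from touching the BO between these steps
and the actual submission, which is exactly why I'd rather have the
kernel do the whole thing atomically inside execbuffer3.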
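And this is roughly the exec_fence -> timeline_fence conversion the
bullet above refers to. It's trivial, but it's an extra allocation and
copy on every submission. The timeline fence struct layout and flag
values below are copied from this series' i915_drm.h as I read it, so
double-check them against the actual uapi patch:

#include <stdlib.h>
#include <linux/types.h>
#include <drm/i915_drm.h>	/* struct drm_i915_gem_exec_fence */

/* As posted in this series (not in upstream headers yet). */
struct drm_i915_gem_timeline_fence {
	__u32 handle;
	__u32 flags;
#define I915_TIMELINE_FENCE_WAIT	(1u << 0)
#define I915_TIMELINE_FENCE_SIGNAL	(1u << 1)
	__u64 value;
};

/* Widen an execbuf2-style fence list into the execbuf3 format; the
 * caller frees the returned array. */
static struct drm_i915_gem_timeline_fence *
convert_exec_fences(const struct drm_i915_gem_exec_fence *in,
		    unsigned int count)
{
	struct drm_i915_gem_timeline_fence *out;
	unsigned int i;

	out = calloc(count, sizeof(*out));
	if (!out)
		return NULL;

	for (i = 0; i < count; i++) {
		out[i].handle = in[i].handle;
		if (in[i].flags & I915_EXEC_FENCE_WAIT)
			out[i].flags |= I915_TIMELINE_FENCE_WAIT;
		if (in[i].flags & I915_EXEC_FENCE_SIGNAL)
			out[i].flags |= I915_TIMELINE_FENCE_SIGNAL;
		out[i].value = 0;	/* binary syncobj: no timeline point */
	}
	return out;
}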
Is this ready to be merged into the Kernel? Maybe, but I'd like us to
sort these things out first:

1. Get a conclusion regarding the frontbuffer tracking issue.
2. Get some validation from more experienced people (*winks at Jason*)
   that our approach with implicit tracking is correct here. Or
   convince Niranjana to add a way to pass buffers for implicit
   tracking, so the Kernel can do atomically inside execbuf3 what
   we're currently doing with 8 ioctls.
3. Fix all the Mesa bugs so we're 100% sure they're not design flaws
   of the Kernel.

But that's just my humble opinion.

Thanks,
Paulo

>
> * The IGT RFC series is posted as,
>   [PATCH i-g-t v7 0/13] vm_bind: Add VM_BIND validation support
>
> v2: Address various review comments
> v3: Address review comments and other fixes
> v4: Remove vm_unbind out fence uapi which is not supported yet,
>     replace vm->vm_bind_mode check with i915_gem_vm_is_vm_bind_mode()
> v5: Render kernel-doc, use PIN_NOEVICT, limit vm_bind support to
>     non-recoverable faults
> v6: Rebased, minor fixes, add reserved fields to drm_i915_gem_vm_bind,
>     add new patch for async vm_unbind support
> v7: Rebased, minor cleanups as per review feedback
>
> Test-with: 20221113075309.32023-1-niranjana.vishwanathapura@xxxxxxxxx
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@xxxxxxxxx>
>
> Niranjana Vishwanathapura (20):
>   drm/i915/vm_bind: Expose vm lookup function
>   drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
>   drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
>   drm/i915/vm_bind: Add support to create persistent vma
>   drm/i915/vm_bind: Implement bind and unbind of object
>   drm/i915/vm_bind: Support for VM private BOs
>   drm/i915/vm_bind: Add support to handle object evictions
>   drm/i915/vm_bind: Support persistent vma activeness tracking
>   drm/i915/vm_bind: Add out fence support
>   drm/i915/vm_bind: Abstract out common execbuf functions
>   drm/i915/vm_bind: Use common execbuf functions in execbuf path
>   drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
>   drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
>   drm/i915/vm_bind: Expose i915_request_await_bind()
>   drm/i915/vm_bind: Handle persistent vmas in execbuf3
>   drm/i915/vm_bind: userptr dma-resv changes
>   drm/i915/vm_bind: Limit vm_bind mode to non-recoverable contexts
>   drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
>   drm/i915/vm_bind: Render VM_BIND documentation
>   drm/i915/vm_bind: Async vm_unbind support
>
>  Documentation/gpu/i915.rst                    |  78 +-
>  drivers/gpu/drm/i915/Makefile                 |   3 +
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  43 +-
>  drivers/gpu/drm/i915/gem/i915_gem_context.h   |  17 +
>  drivers/gpu/drm/i915/gem/i915_gem_create.c    |  72 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |   6 +
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 522 +----
>  .../gpu/drm/i915/gem/i915_gem_execbuffer3.c   | 872 ++++++++++++++++++
>  .../drm/i915/gem/i915_gem_execbuffer_common.c | 671 ++++++++++++++
>  .../drm/i915/gem/i915_gem_execbuffer_common.h |  76 ++
>  drivers/gpu/drm/i915/gem/i915_gem_ioctls.h    |   2 +
>  drivers/gpu/drm/i915/gem/i915_gem_object.c    |   3 +
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   2 +
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |  19 +
>  drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h   |  30 +
>  .../drm/i915/gem/i915_gem_vm_bind_object.c    | 449 +++++++++
>  drivers/gpu/drm/i915/gt/intel_gtt.c           |  17 +
>  drivers/gpu/drm/i915/gt/intel_gtt.h           |  21 +
>  drivers/gpu/drm/i915/i915_driver.c            |   4 +
>  drivers/gpu/drm/i915/i915_drv.h               |   2 +
>  drivers/gpu/drm/i915/i915_gem_gtt.c           |  39 +
>  drivers/gpu/drm/i915/i915_gem_gtt.h           |   3 +
>  drivers/gpu/drm/i915/i915_getparam.c          |   3 +
>  drivers/gpu/drm/i915/i915_sw_fence.c          |  28 +-
>  drivers/gpu/drm/i915/i915_sw_fence.h          |  23 +-
>  drivers/gpu/drm/i915/i915_vma.c               | 186 +++-
>  drivers/gpu/drm/i915/i915_vma.h               |  68 +-
>  drivers/gpu/drm/i915/i915_vma_types.h         |  39 +
>  include/uapi/drm/i915_drm.h                   | 274 +++++-
>  30 files changed, 3027 insertions(+), 551 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c