On Fri, Nov 18, 2022 at 03:53:34PM -0800, Zanoni, Paulo R wrote:
On Sat, 2022-11-12 at 23:57 -0800, Niranjana Vishwanathapura wrote:
DRM_I915_GEM_VM_BIND/UNBIND ioctls allows UMD to bind/unbind GEM
buffer objects (BOs) or sections of a BOs at specified GPU virtual
addresses on a specified address space (VM). Multiple mappings can map
to the same physical pages of an object (aliasing). These mappings (also
referred to as persistent mappings) will be persistent across multiple
GPU submissions (execbuf calls) issued by the UMD, without user having
to provide a list of all required mappings during each submission (as
required by older execbuf mode).
This patch series support VM_BIND version 1, as described by the param
I915_PARAM_VM_BIND_VERSION.
Add new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only works in
vm_bind mode. The vm_bind mode only works with this new execbuf3 ioctl.
The new execbuf3 ioctl will not have any execlist support and all the
legacy support like relocations etc., are removed.
NOTEs:
* It is based on below VM_BIND design+uapi rfc.
Documentation/gpu/rfc/i915_vm_bind.rst
Just as a FYI to everyone, there is a Mesa Iris implementation that
makes use of this series here:
https://gitlab.freedesktop.org/pzanoni/mesa/-/commits/upstream-vmbind/
Some notes on it:
- Tested on TGL and Alchemist (aka DG2).
- The code still has a lot of TODOs and some FIXMEs.
- It was somewhat tested with the common Linux benchmarks (Dota 2,
Manhattan, Xonotic, etc.) and survived most of what I threw at it. The
only problems I saw so far are:
- Sometimes you get a random GPU hang with Dota 2, but it seems to go
away if you use INTEL_DEBUG=nofc . I'm investigating it right now.
- Very very rarely DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD returns EINVAL and
we don't properly recover. I am still trying to understand this one,
it's once in a blue moon that it happens.
- Performance seems to be mostly equivalent to non-vm-bind, but I
haven't spent a lot of time really gathering numbers since I'm mostly
debugging things. Using VM_PRIVATE BOs is the key here, you cut Dota's
performance by almost half if you don't use private BOs. Considering
how ABYSMALLY slower things get, I would assume there's probably
something the Kernel could do here to handle things a little faster.
The 'perf' tool shows a lot of i915 fence-related functions wasting CPU
time when we don't use private BOs.
Thanks Paulo for trying this out.
i915 loops through each non-private BO of working set in the submission path
updating it's dma-resv fence list. So, user have to use privater BOs where
possible and when BOs needs to be shared, VM_UNBIND it when it is no longer
required (thus removing it from working set for subsequent submissions).
- I am not really sure whether the implicit tracking code actually
works or not. It doesn't really seem to make much of a difference if I
remove it entirely, but I'm still planning to spend more time analyzing
this. If anybody knows of a workload that will absolutely fail if we
get this wrong, please tell me.
- There's the whole frontbuffer tracking discussion from v6 that's
still pending resolution.
- The Vulkan implementation will come later. I wanted to sort all the
GL details later so I don't make the same mistakes twice.
- I really dislike how we have to handle the lack of implicit tracking
support. The code is excessive and racy. See export_bo_sync_state(),
import_bo_sync_state() and their caller from
src/gallium/drivers/iris/iris_batch.c. Suggestions for Mesa
improvements are welcome, but I would really really really prefer to
have a way to just tell execbuffer3 to handle implicit tracking for
these buffers for me in an atomic way.
- I kinda wish execbuffer3 still accepted a pointer to struct
drm_i915_gem_exec_fence in addition to struct
drm_i915_gem_timeline_fence, since we already have to keep a list of
exec_fences for execbuf2, and then in the execbuf3 we convert them to
the new format. We could also do the opposite and leave execbuf2 with
the slower path. But I could live without this, no problem.
You mean having an array of <handle, value> pairs (as with execbuf3)
vs having separate handles array an value array (as with execbuf2)?
I think former is much cleaner interface.
- Credits to Ken, Jason, Lionel, Niranjana, Nanley, Daniel and
everybody else who helped me sort things out here.
Is this ready to be merged to the Kernel? Maybe, but I'd like us to
sort these things out first:
1. Get conclusion regarding the frontbuffer tracking issue first.
2. Get some validation from more experienced people (*winks at Jason*)
that our approach with implicit tracking is correct here. Or convince
Niranjana to add a way to pass buffers for implicit tracking so the
Kernel can atomically inside execbuf3 what we're trying to do with 8
ioctls.
It was discussed during VM_BIND rfc design and implicit dependency
tracking is removed. Note that supporting it in execbuf3 is not trivial
with our VM_BIND dma-resv usage model.
Regards,
Niranjana
3. Fix all the Mesa bugs so we're 100% sure they're not design flaws of
the Kernel.
But that's just my humble opinion.
Thanks,
Paulo
* The IGT RFC series is posted as,
[PATCH i-g-t v7 0/13] vm_bind: Add VM_BIND validation support
v2: Address various review comments
v3: Address review comments and other fixes
v4: Remove vm_unbind out fence uapi which is not supported yet,
replace vm->vm_bind_mode check with i915_gem_vm_is_vm_bind_mode()
v5: Render kernel-doc, use PIN_NOEVICT, limit vm_bind support to
non-recoverable faults
v6: Rebased, minor fixes, add reserved fields to drm_i915_gem_vm_bind,
add new patch for async vm_unbind support
v7: Rebased, minor cleanups as per review feedback
Test-with: 20221113075309.32023-1-niranjana.vishwanathapura@xxxxxxxxx
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@xxxxxxxxx>
Niranjana Vishwanathapura (20):
drm/i915/vm_bind: Expose vm lookup function
drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
drm/i915/vm_bind: Add support to create persistent vma
drm/i915/vm_bind: Implement bind and unbind of object
drm/i915/vm_bind: Support for VM private BOs
drm/i915/vm_bind: Add support to handle object evictions
drm/i915/vm_bind: Support persistent vma activeness tracking
drm/i915/vm_bind: Add out fence support
drm/i915/vm_bind: Abstract out common execbuf functions
drm/i915/vm_bind: Use common execbuf functions in execbuf path
drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
drm/i915/vm_bind: Expose i915_request_await_bind()
drm/i915/vm_bind: Handle persistent vmas in execbuf3
drm/i915/vm_bind: userptr dma-resv changes
drm/i915/vm_bind: Limit vm_bind mode to non-recoverable contexts
drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
drm/i915/vm_bind: Render VM_BIND documentation
drm/i915/vm_bind: Async vm_unbind support
Documentation/gpu/i915.rst | 78 +-
drivers/gpu/drm/i915/Makefile | 3 +
drivers/gpu/drm/i915/gem/i915_gem_context.c | 43 +-
drivers/gpu/drm/i915/gem/i915_gem_context.h | 17 +
drivers/gpu/drm/i915/gem/i915_gem_create.c | 72 +-
drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 6 +
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 522 +----------
.../gpu/drm/i915/gem/i915_gem_execbuffer3.c | 872 ++++++++++++++++++
.../drm/i915/gem/i915_gem_execbuffer_common.c | 671 ++++++++++++++
.../drm/i915/gem/i915_gem_execbuffer_common.h | 76 ++
drivers/gpu/drm/i915/gem/i915_gem_ioctls.h | 2 +
drivers/gpu/drm/i915/gem/i915_gem_object.c | 3 +
drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +
.../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 19 +
drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h | 30 +
.../drm/i915/gem/i915_gem_vm_bind_object.c | 449 +++++++++
drivers/gpu/drm/i915/gt/intel_gtt.c | 17 +
drivers/gpu/drm/i915/gt/intel_gtt.h | 21 +
drivers/gpu/drm/i915/i915_driver.c | 4 +
drivers/gpu/drm/i915/i915_drv.h | 2 +
drivers/gpu/drm/i915/i915_gem_gtt.c | 39 +
drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +
drivers/gpu/drm/i915/i915_getparam.c | 3 +
drivers/gpu/drm/i915/i915_sw_fence.c | 28 +-
drivers/gpu/drm/i915/i915_sw_fence.h | 23 +-
drivers/gpu/drm/i915/i915_vma.c | 186 +++-
drivers/gpu/drm/i915/i915_vma.h | 68 +-
drivers/gpu/drm/i915/i915_vma_types.h | 39 +
include/uapi/drm/i915_drm.h | 274 +++++-
30 files changed, 3027 insertions(+), 551 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c