On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld <matthew.auld@xxxxxxxxx> wrote:
On 30/06/2022 06:11, Jason Ekstrand wrote:
> On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
> <niranjana.vishwanathapura@xxxxxxxxx> wrote:
>
> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
> Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
> all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> Update documentation on async vm_bind/unbind and versioning.
> Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@xxxxxxxxx>
> Reviewed-by: Daniel Vetter <daniel.vetter@xxxxxxxx>
> ---
> Documentation/gpu/rfc/i915_vm_bind.h | 280 +++++++++++++++++++++++++++
> 1 file changed, 280 insertions(+)
> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index 000000000000..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + * previously with VM_BIND, the ioctl will not support unbinding multiple
> + * mappings or splitting them. Similarly, VM_BIND calls will not replace
> + * any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings are
> + * lifted. Similarly, binding will replace any mappings in the given range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION 57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND 0x3d
> +#define DRM_I915_GEM_VM_UNBIND 0x3e
> +#define DRM_I915_GEM_EXECBUFFER3 0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> + /** @handle: User's handle for a drm_syncobj to wait on or signal. */
> + __u32 handle;
> +
> + /**
> + * @flags: Supported flags are:
> + *
> + * I915_TIMELINE_FENCE_WAIT:
> + * Wait for the input fence before the operation.
> + *
> + * I915_TIMELINE_FENCE_SIGNAL:
> + * Return operation completion fence as output.
> + */
> + __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT (1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL << 1))
> +
> + /**
> + * @value: A point in the timeline.
> + * Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> + * timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> + * binary one.
> + */
> + __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of GPU
> + * virtual address (VA) range to the section of an object that should be bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not currently bound) and can
> + * be mapped to whole object or a section of the object (partial binding).
> + * Multiple VA mappings can be created to the same section of the object
> + * (aliasing).
> + *
> + * The @start, @offset and @length must be 4K page aligned. However, DG2
> + * and XEHPSDV have a 64K page size for device local-memory and use a compact
> + * page table. On those platforms, for binding device local-memory objects,
> + * the @start must be 2M aligned, @offset and @length must be 64K aligned.
>
>
> This is not acceptable. We need 64K granularity. This includes the
> starting address, the BO offset, and the length. Why? The tl;dr is
> that it's a requirement for about 50% of D3D12 apps if we want them to
> run on Linux via D3D12. A longer explanation follows. I don't
> necessarily expect kernel folks to get all the details but hopefully
> I'll have left enough of a map that some of the Intel Mesa folks can
> help fill in details.
>
> Many modern D3D12 apps have a hard requirement on Tier2 tiled
> resources. This is a feature that Intel has supported in the D3D12
> driver since Skylake. In order to implement this feature, VKD3D
> requires the various sparseResidencyImage* and sparseResidency*Sampled
> Vulkan features. If we want those apps to work (and there are getting
> to be quite a few of them), we need to implement the Vulkan sparse
> residency features.
>
> What is sparse residency? I'm glad you asked! The sparse residency
> features allow a client to separately bind each miplevel or array slice
> of an image to a chunk of device memory independently, without affecting
> any other areas of the image. Once you get to a high enough miplevel
> that everything fits inside a single sparse image block (that's a
> technical Vulkan term you can search for in the spec), you can enter a
> "miptail" which contains all the remaining miplevels in a single sparse
> image block.
>
> The term "sparse image block" is what the Vulkan spec uses. On Intel
> hardware and in the docs, it's what we call a "tile". Specifically, the
> image needs to use Yf or Ys tiling on SKL-TGL or Tile64 on DG2+. This
> is because Tile4 and legacy X and Y-tiling don't provide any guarantees
> about page alignment for slices. Yf, Ys, and Tile64, on the other hand,
> align all slices of the image to a tile boundary, allowing us to map
> memory to different slices independently, assuming we have 64K (or 4K
> for Yf) VM_BIND granularity. (4K isn't actually a requirement for
> SKL-TGL; we can use Ys all the time, which has 64K tiles, but there's no
> reason not to support 4K alignments on integrated.)
>
> Someone may be tempted to ask, "Can't we wiggle the strides around or
> something to make it work?" I thought about that and no, you can't.
> The problem here is LOD2+. Sure, you can have a stride such that the
> image is a multiple of 2M worth of tiles across. That'll work fine for
> LOD0 and LOD1; both will be 2M aligned. However, LOD2 won't be and
> there's no way to control that. The hardware will place it to the right
> of LOD1 by ROUND_UP(width, tile_width) pixels and there's nothing you
> can do about that. If that position doesn't happen to hit a 2M
> boundary, you're out of luck.
>
> I hope that explanation provides enough detail. Sadly, this is one of
> those things which has a lot of moving pieces all over different bits of
> the hardware and various APIs and they all have to work together just
> right for it to all come out in the end. But, yeah, we really need 64K
> aligned binding if we want VKD3D to work.
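
To make the LOD2 argument in the quoted text concrete, here is a rough sketch of the arithmetic. It is only an illustration under stated assumptions, not the hardware's actual layout code: the struct and helper names are hypothetical, tiles are assumed to be 64K and stored row-major across the pitch, a 32bpp Tile64 tile is assumed to be 128x128 pixels, LOD1 is assumed to start directly below LOD0, and LOD2 is assumed to sit to the right of LOD1 as described above.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K (64ull << 10)
#define SZ_2M  (2ull << 20)
#define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* A 2M page-table range covers a whole number of 64K tiles. */
static_assert(SZ_2M % SZ_64K == 0, "2M must be a multiple of 64K");

/* Hypothetical surface description (not an i915 or Mesa structure). */
struct surf {
	uint32_t width_px, height_px;   /* LOD0 dimensions in pixels */
	uint32_t tile_w_px, tile_h_px;  /* tile dimensions in pixels */
	uint32_t pitch_tiles;           /* tiles per row of the surface */
};

/* Byte offset of the tile containing pixel (x, y), assuming row-major tiles. */
static uint64_t tile_offset(const struct surf *s, uint32_t x_px, uint32_t y_px)
{
	uint64_t tx = x_px / s->tile_w_px;
	uint64_t ty = y_px / s->tile_h_px;

	return (ty * s->pitch_tiles + tx) * SZ_64K;
}

/* LOD1 is assumed to start at x = 0, directly below the tile-aligned LOD0. */
static uint64_t lod1_offset(const struct surf *s)
{
	return tile_offset(s, 0, ROUND_UP(s->height_px, s->tile_h_px));
}

/* LOD2 is assumed to start ROUND_UP(lod1 width, tile width) pixels right of LOD1. */
static uint64_t lod2_offset(const struct surf *s)
{
	uint32_t lod1_w = s->width_px > 1 ? s->width_px / 2 : 1;

	return tile_offset(s, ROUND_UP(lod1_w, s->tile_w_px),
			   ROUND_UP(s->height_px, s->tile_h_px));
}
```

With a 4096x4096 surface and a pitch of 32 tiles (exactly 2M per tile row), LOD1 lands on a 2M boundary in this model, while LOD2 lands 16 tiles (1M) further along: 64K aligned, but not 2M aligned, and no choice of stride changes that.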

Just to confirm, the new model would be to enforce 64K GTT alignment for
lmem pages, and then for smem pages we would only require 4K alignment,
but with the added restriction that userspace will never try to mix the
two (lmem vs smem) within the same 2M va range (page-table). The kernel
will verify this and throw an error if needed. This model should work
with the above?

Mesa doesn't have full control over BO placement so I don't think we
can guarantee quite as much as you want there. We can guarantee, I
think, that we never place LMEM-only and SMEM-only in the same 2M
block. However, most BOs will be LMEM+SMEM (with a preference for
LMEM) and then it'll be up to the kernel to sort out any issues. Is
that reasonable?
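
For reference, the model being discussed (64K granularity for lmem, 4K for smem, and no mixing of the two inside one 2M page-table range) could be sketched roughly as below. This is a minimal illustration, not kernel code: the enum and helper names are hypothetical, and it deliberately does not address BOs that may be placed in either LMEM or SMEM, which is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K  (4ull << 10)
#define SZ_64K (64ull << 10)
#define SZ_2M  (2ull << 20)

static_assert(SZ_64K % SZ_4K == 0, "64K granularity implies 4K granularity");

/* Hypothetical placement tag; real BOs may allow both placements. */
enum placement { PLACEMENT_SMEM, PLACEMENT_LMEM };

static bool is_aligned(uint64_t v, uint64_t a)
{
	return (v & (a - 1)) == 0; /* 'a' must be a power of two */
}

/* Granularity required of start, offset and length under the proposed model. */
static uint64_t bind_granularity(enum placement p)
{
	return p == PLACEMENT_LMEM ? SZ_64K : SZ_4K;
}

static bool bind_args_ok(enum placement p, uint64_t start,
			 uint64_t offset, uint64_t length)
{
	uint64_t g = bind_granularity(p);

	return length != 0 && is_aligned(start, g) &&
	       is_aligned(offset, g) && is_aligned(length, g);
}

/* Index of the 2M page-table range that contains a virtual address. */
static uint64_t pt_index(uint64_t va)
{
	return va / SZ_2M;
}

/*
 * True if two bindings with different placements touch a common 2M range,
 * which is the case the kernel would reject in this model. Both lengths
 * are assumed to be non-zero.
 */
static bool mixes_placement(enum placement pa, uint64_t a_start, uint64_t a_len,
			    enum placement pb, uint64_t b_start, uint64_t b_len)
{
	if (pa == pb)
		return false;

	uint64_t a_first = pt_index(a_start);
	uint64_t a_last = pt_index(a_start + a_len - 1);
	uint64_t b_first = pt_index(b_start);
	uint64_t b_last = pt_index(b_start + b_len - 1);

	return a_first <= b_last && b_first <= a_last;
}
```

In this sketch an lmem bind at a 64K-aligned (but not 2M-aligned) start passes `bind_args_ok()`, which is the granularity sparse residency needs, and an smem bind is only rejected when it shares a 2M range with an lmem bind.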