On Tue, Apr 13, 2021 at 12:47:06PM +0100, Matthew Auld wrote: > Add an entry for the new uAPI needed for DG1. > > v2(Daniel): > - include the overall upstreaming plan > - add a note for mmap, there are differences here for TTM vs i915 > - bunch of other suggestions from Daniel > > Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxxx> > Cc: Dave Airlie <airlied@xxxxxxxxx> Bunch more thoughts below, I think we're getting there. Thanks for doing this. > --- > Documentation/gpu/rfc/i915_gem_lmem.h | 151 ++++++++++++++++++++++++ > Documentation/gpu/rfc/i915_gem_lmem.rst | 119 +++++++++++++++++++ > Documentation/gpu/rfc/index.rst | 4 + > 3 files changed, 274 insertions(+) > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.h > create mode 100644 Documentation/gpu/rfc/i915_gem_lmem.rst > > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.h b/Documentation/gpu/rfc/i915_gem_lmem.h > new file mode 100644 > index 000000000000..6ae13209d7ef > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_gem_lmem.h > @@ -0,0 +1,151 @@ > +/* The new query_id for struct drm_i915_query_item */ > +#define DRM_I915_QUERY_MEMORY_REGIONS 0xdeadbeaf > + > +/** > + * enum drm_i915_gem_memory_class > + */ > +enum drm_i915_gem_memory_class { Are we really going with enum in uapi? I thought that was frought with peril since the integer type of enum is quite a bit up to compilers. But maybe I'm just scared. > + /** @I915_MEMORY_CLASS_SYSTEM: system memory */ > + I915_MEMORY_CLASS_SYSTEM = 0, > + /** @I915_MEMORY_CLASS_DEVICE: device local-memory */ > + I915_MEMORY_CLASS_DEVICE, > +}; > + > +/** > + * struct drm_i915_gem_memory_class_instance > + */ > +struct drm_i915_gem_memory_class_instance { > + /** @memory_class: see enum drm_i915_gem_memory_class */ > + __u16 memory_class; > + > + /** @memory_instance: which instance */ > + __u16 memory_instance; > +}; > + > +/** > + * struct drm_i915_memory_region_info > + * > + * Describes one region as known to the driver. > + */ > +struct drm_i915_memory_region_info { > + /** @region: class:instance pair encoding */ > + struct drm_i915_gem_memory_class_instance region; > + > + /** @rsvd0: MBZ */ > + __u32 rsvd0; > + > + /** @caps: MBZ */ > + __u64 caps; > + > + /** @flags: MBZ */ > + __u64 flags; > + > + /** @probed_size: Memory probed by the driver (-1 = unknown) */ > + __u64 probed_size; > + > + /** @unallocated_size: Estimate of memory remaining (-1 = unknown) */ > + __u64 unallocated_size; > + > + /** @rsvd1: MBZ */ > + __u64 rsvd1[8]; I guess this is for future stuff that becomes relevant with multi-tile? Might be worth explaining in 1-2 words why we reserve a pile here. Also it doesn't matter ofc for performance here :-) > +}; > + > +/** > + * struct drm_i915_query_memory_regions > + * > + * Region info query enumerates all regions known to the driver by filling in > + * an array of struct drm_i915_memory_region_info structures. I guess this works with the usual 1. query number of regions 2. get them all two-step ioctl flow? Worth explaining here. > + */ > +struct drm_i915_query_memory_regions { > + /** @num_regions: Number of supported regions */ > + __u32 num_regions; > + > + /** @rsvd: MBZ */ > + __u32 rsvd[3]; > + > + /** @regions: Info about each supported region */ > + struct drm_i915_memory_region_info regions[]; > +}; Hm don't we need a query ioctl for this too? > + > +#define DRM_I915_GEM_CREATE_EXT 0xdeadbeaf > +#define DRM_IOCTL_I915_GEM_CREATE_EXT DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_CREATE_EXT, struct drm_i915_gem_create_ext) > + > +/** > + * struct drm_i915_gem_create_ext I think some explanation here that all new bo flags will be added here, and that in general we're phasing out the various SET/GET ioctls. > + */ > +struct drm_i915_gem_create_ext { > + /** > + * @size: Requested size for the object. > + * > + * The (page-aligned) allocated size for the object will be returned. > + */ > + __u64 size; > + /** > + * @handle: Returned handle for the object. > + * > + * Object handles are nonzero. > + */ > + __u32 handle; > + /** @flags: MBZ */ > + __u32 flags; > + /** > + * @extensions: > + * For I915_GEM_CREATE_EXT_SETPARAM extension usage see both: > + * struct drm_i915_gem_create_ext_setparam. > + * struct drm_i915_gem_object_param for the possible parameters. > + */ > +#define I915_GEM_CREATE_EXT_SETPARAM 0 > + __u64 extensions; > +}; > + > +/** > + * struct drm_i915_gem_object_param > + */ > +struct drm_i915_gem_object_param { > + /** @handle: Object handle (0 for I915_GEM_CREATE_EXT_SETPARAM) */ Uh no, this looks like leftovers having a separate SETPARAM ioctl. That's pretty bad design, and we (well Jason) is doing serious surgery to undo that mistakes. Please remove. > + __u32 handle; > + > + /** @size: Data pointer size */ > + __u32 size; > + > +/* > + * I915_OBJECT_PARAM: > + * > + * Select object namespace for the param. > + */ > +#define I915_OBJECT_PARAM (1ull<<32) > + > +/** > + * @param: select the desired param. > + * > + * I915_OBJECT_PARAM_MEMORY_REGIONS: > + * > + * Set the data pointer with the desired set of placements in priority > + * order(each entry must be unique and supported by the device), as an array of > + * drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance > + * pair encodings. See DRM_I915_QUERY_MEMORY_REGIONS for how to query the > + * supported regions. > + * > + * In this case the data pointer size should be the number of > + * drm_i915_gem_memory_class_instance elements in the placements array. > + */ > +#define I915_PARAM_MEMORY_REGIONS 0 > +#define I915_OBJECT_PARAM_MEMORY_REGIONS (I915_OBJECT_PARAM | \ > + I915_PARAM_MEMORY_REGIONS) > + __u64 param; > + > + /** @data: Data value or pointer */ > + __u64 data; > +}; > + > +/** > + * struct drm_i915_gem_create_ext_setparam > + */ > +struct drm_i915_gem_create_ext_setparam { > + /** @base: extension link */ > + struct i915_user_extension base; > + /** @param: param to apply for this extension */ > + struct drm_i915_gem_object_param param; > +}; > + > + > diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst b/Documentation/gpu/rfc/i915_gem_lmem.rst > new file mode 100644 > index 000000000000..41bc06240ccc > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst > @@ -0,0 +1,119 @@ > +========================= > +I915 DG1/LMEM RFC Section > +========================= > + > +Upstream plan > +============= > +For upstream the overall plan for landing all the DG1 stuff and turning it for > +real, with all the uAPI bits is: > + > +* Merge basic HW enabling of DG1(still without pciid) > +* Merge the uAPI bits behind special CONFIG_BROKEN(or so) flag > + * At this point we can still make changes, but importantly this lets us > + start running IGTs which can utilize local-memory in CI > +* Convert over to TTM, make sure it all keeps working > +* Add pciid for DG1 > +* Turn on uAPI for real > + > +New object placement and region query uAPI > +========================================== > +Starting from DG1 we need to give userspace the ability to allocate buffers from > +device local-memory. Currently the driver supports gem_create, which can place > +buffers in system memory via shmem, and the usual assortment of other > +interfaces, like dumb buffers and userptr. > + > +To support this new capability, while also providing a uAPI which will work > +beyond just DG1, we propose to offer three new bits of uAPI: > + > +DRM_I915_QUERY_MEMORY_REGIONS > +----------------------------- > +Query mechanism which allows userspace to discover the list of supported memory > +regions(like system-memory and local-memory) for a given device. We identify > +each region with a class and instance pair, which should be unique. The class > +here would be DEVICE or SYSTEM, and the instance would be zero, on platforms > +like DG1. > + > +Side note: The class/instance design is borrowed from our existing engine uAPI, > +where we describe every physical engine in terms of its class, and the > +particular instance, since we can have more than one per class. > + > +In the future we also want to expose more information which can further > +describe the capabilities of a region. > + > +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h > + :functions: drm_i915_gem_memory_class drm_i915_gem_memory_class_instance drm_i915_memory_region_info drm_i915_query_memory_regions > + > +GEM_CREATE_EXT > +-------------- > +New ioctl which is basically just gem_create but now allows userspace to > +provide a chain of possible extensions. Note that if we don't provide any > +extensions then we get the exact same behaviour as gem_create. > + > +Side note: We also need to support PXP[1] in the near future, which is also > +applicable to integrated platforms, and adds its own gem_create_ext extension, > +which basically lets userspace mark a buffer as "protected". > + > +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h > + :functions: drm_i915_gem_create_ext > + > +I915_OBJECT_PARAM_MEMORY_REGIONS > +-------------------------------- > +Implemented as an extension for gem_create_ext, we would now allow userspace to > +optionally provide an immutable list of preferred placements at creation time, > +in priority order, for a given buffer object. For the placements we expect > +them each to use the class/instance encoding, as per the output of the regions > +query. Having the list in priority order will be useful in the future when > +placing an object, say during eviction. > + > +.. kernel-doc:: Documentation/gpu/rfc/i915_gem_lmem.h > + :functions: drm_i915_gem_object_param drm_i915_gem_create_ext_setparam > + > +Example placement usage > +----------------------- > +As an example, on DG1 if we wish to set the placement as local-memory we can do > +something like: > + > +.. code-block:: C > + > + struct drm_i915_gem_memory_class_instance region_param = { > + .memory_class = I915_MEMORY_CLASS_DEVICE, > + .memory_instance = 0, > + }; > + struct drm_i915_gem_create_ext_setparam setparam_region = { > + .base = { .name = I915_GEM_CREATE_EXT_SETPARAM }, > + .param = { > + .param = I915_OBJECT_PARAM_MEMORY_REGIONS, > + .data = (uintptr_t)®ion_param, > + .size = 1, > + }, > + }; > + > + struct drm_i915_gem_create_ext create_ext = { > + .size = 16 * PAGE_SIZE, > + .extensions = (uintptr_t)&setparam_region, > + }; > + int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext); > + if (err) ... I would put this example into the main ioctl struct kerneldoc comment. Once we move it into uapi, that's the place people will most likely see it. People = umd folks here. Another thing that would be nice to do here is kerneldoc-ify the existing uapi this is building on top of, like i915_user_extensions. That way we could improve the kerneldoc with more hyperlinks from the rfc here to the main one. Also an example in i915_user_extensions that explains how they're supposed to be linked would be nice. > + > +One fair criticism here is that this seems a little over-engineered[2]. If we > +just consider DG1 then yes, a simple gem_create.flags or something is totally > +all that's needed to tell the kernel to allocate the buffer in local-memory or > +whatever. However looking to the future we need uAPI which can also support > +upcoming Xe HP multi-tile architecture in a sane way, where there can be > +multiple local-memory instances for a given device, and so using both class and > +instance in our uAPI to describe regions is desirable, although specifically > +for DG1 it's uninteresting, since we only have a single local-memory instance. > + > +I915 MMAP > +========= > +In i915 there are multiple ways to MMAP GEM object, including mapping the same > +object using different mapping types(WC vs WB), i.e multiple active mmaps per > +object. TTM expects one MMAP at most for the lifetime of the object. If it > +turns out that we have to backpedal here, there might be some potential > +userspace fallout. Ok there's another issue here, which is SET/GET_CACHING. TTM doesn't allow you to change this, but DG1 doesn't support non-snooped pcie transactions, so we can just always allocate as WB for smem-only buffers. If/when our hw gains support for non-snooped pcie transactions then we must fix this mode at allocation time as a new gem extension. I think this needs another section called out here about SET/GET_CACHING. Now the mmap problem is tightly related, because in general (meaning, when we're not running on intel cpus) the cpu mmap must not, ever, be inconsistent with allocation mode. So what I think we should here is that the kernel picks the mmap mode for userspace from the following table: smem-only: WB. Userspace does not need to call clflush. smem+lmem: We allocate uncached memory, and give userspace a WC mapping for when the buffer is in smem, and WC when it's in lmem. GPU does snooped access, which is a bit inefficient but oh well whatever. lmem only: always WC This means on discrete you only get a single mmap mode, all others must be rejected. That's probably going to be a new default mode or something like that. Cheers, Daniel > + > +Links > +===== > +[1] https://patchwork.freedesktop.org/series/86798/ > + > +[2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5599#note_553791 > diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst > index a8621f7dab8b..05670442ca1b 100644 > --- a/Documentation/gpu/rfc/index.rst > +++ b/Documentation/gpu/rfc/index.rst > @@ -15,3 +15,7 @@ host such documentation: > > * Once the code has landed move all the documentation to the right places in > the main core, helper or driver sections. > + > +.. toctree:: > + > + i915_gem_lmem.rst > -- > 2.26.3 > > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel