Acked-by: Tony Ye <tony.ye@xxxxxxxxx> Regards, Tony On 6/11/2021 4:40 PM, Matthew Brost wrote: > Add entry for i915 new parallel submission uAPI plan. > > v2: > (Daniel Vetter): > - Expand logical order explaination > - Add dummy header > - Only allow N BBs in execbuf IOCTL > - Configure parallel submission per slot not per gem context > v3: > (Marcin Ślusarz): > - Lot's of typos / bad english fixed > (Tvrtko Ursulin): > - Consistent pseudo code, clean up wording in descriptions > v4: > (Daniel Vetter) > - Drop flags > - Add kernel doc > - Reword a few things / fix typos > (Tvrtko) > - Reword a few things / fix typos > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Cc: Tony Ye <tony.ye@xxxxxxxxx> > CC: Carl Zhang <carl.zhang@xxxxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxxx> > Cc: Jason Ekstrand <jason@xxxxxxxxxxxxxx> > Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx> > Acked-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > --- > Documentation/gpu/rfc/i915_parallel_execbuf.h | 117 ++++++++++++++++++ > Documentation/gpu/rfc/i915_scheduler.rst | 59 ++++++++- > 2 files changed, 175 insertions(+), 1 deletion(-) > create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h b/Documentation/gpu/rfc/i915_parallel_execbuf.h > new file mode 100644 > index 000000000000..c22af3a359e4 > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h > @@ -0,0 +1,117 @@ > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see i915_context_engines_parallel_submit */ > + > +/** > + * struct drm_i915_context_engines_parallel_submit - Configure engine for > + * parallel submission. > + * > + * Setup a slot in the context engine map to allow multiple BBs to be submitted > + * in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU > + * in parallel. Multiple hardware contexts are created internally in the i915 > + * run these BBs. Once a slot is configured for N BBs only N BBs can be > + * submitted in each execbuf IOCTL and this is implicit behavior e.g. The user > + * doesn't tell the execbuf IOCTL there are N BBs, the execbuf IOCTL knows how > + * many BBs there are based on the slot's configuration. The N BBs are the last > + * N buffer objects or first N if I915_EXEC_BATCH_FIRST is set. > + * > + * The default placement behavior is to create implicit bonds between each > + * context if each context maps to more than 1 physical engine (e.g. context is > + * a virtual engine). Also we only allow contexts of same engine class and these > + * contexts must be in logically contiguous order. Examples of the placement > + * behavior described below. Lastly, the default is to not allow BBs to > + * preempted mid BB rather insert coordinated preemption on all hardware > + * contexts between each set of BBs. Flags may be added in the future to change > + * bott of these default behaviors. > + * > + * Returns -EINVAL if hardware context placement configuration is invalid or if > + * the placement configuration isn't supported on the platform / submission > + * interface. > + * Returns -ENODEV if extension isn't supported on the platform / submission > + * inteface. > + * > + * .. code-block:: > + * > + * Example 1 pseudo code: > + * CS[X] = generic engine of same class, logical instance X > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > + * set_engines(INVALID) > + * set_parallel(engine_index=0, width=2, num_siblings=1, > + * engines=CS[0],CS[1]) > + * > + * Results in the following valid placement: > + * CS[0], CS[1] > + * > + * Example 2 pseudo code: > + * CS[X] = generic engine of same class, logical instance X > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > + * set_engines(INVALID) > + * set_parallel(engine_index=0, width=2, num_siblings=2, > + * engines=CS[0],CS[2],CS[1],CS[3]) > + * > + * Results in the following valid placements: > + * CS[0], CS[1] > + * CS[2], CS[3] > + * > + * This can also be thought of as 2 virtual engines described by 2-D array > + * in the engines the field with bonds placed between each index of the > + * virtual engines. e.g. CS[0] is bonded to CS[1], CS[2] is bonded to > + * CS[3]. > + * VE[0] = CS[0], CS[2] > + * VE[1] = CS[1], CS[3] > + * > + * Example 3 pseudo code: > + * CS[X] = generic engine of same class, logical instance X > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE > + * set_engines(INVALID) > + * set_parallel(engine_index=0, width=2, num_siblings=2, > + * engines=CS[0],CS[1],CS[1],CS[3]) > + * > + * Results in the following valid and invalid placements: > + * CS[0], CS[1] > + * CS[1], CS[3] - Not logical contiguous, return -EINVAL > + */ > +struct drm_i915_context_engines_parallel_submit { > + /** > + * @base: base user extension. > + */ > + struct i915_user_extension base; > + > + /** > + * @engine_index: slot for parallel engine > + */ > + __u16 engine_index; > + > + /** > + * @width: number of contexts per parallel engine > + */ > + __u16 width; > + > + /** > + * @num_siblings: number of siblings per context > + */ > + __u16 num_siblings; > + > + /** > + * @mbz16: reserved for future use; must be zero > + */ > + __u16 mbz16; > + > + /** > + * @flags: all undefined flags must be zero, currently not defined flags > + */ > + __u64 flags; > + > + /** > + * @mbz64: reserved for future use; must be zero > + */ > + __u64 mbz64[3]; > + > + /** > + * @engines: 2-d array of engine instances to configure parallel engine > + * > + * length = width (i) * num_siblings (j) > + * index = j + i * num_siblings > + */ > + struct i915_engine_class_instance engines[0]; > + > +} __attribute__ ((packed)); > + > diff --git a/Documentation/gpu/rfc/i915_scheduler.rst b/Documentation/gpu/rfc/i915_scheduler.rst > index 7acd386a6b49..63849b50e663 100644 > --- a/Documentation/gpu/rfc/i915_scheduler.rst > +++ b/Documentation/gpu/rfc/i915_scheduler.rst > @@ -88,4 +88,61 @@ Spec references: > > New parallel submission uAPI > ============================ > -Details to come in a following patch. > +The existing bonding uAPI is completely broken with GuC submission because > +whether a submission is a single context submit or parallel submit isn't known > +until execbuf time activated via the I915_SUBMIT_FENCE. To submit multiple > +contexts in parallel with the GuC the context must be explicitly registered with > +N contexts and all N contexts must be submitted in a single command to the GuC. > +The GuC interfaces do not support dynamically changing between N contexts as the > +bonding uAPI does. Hence the need for a new parallel submission interface. Also > +the legacy bonding uAPI is quite confusing and not intuitive at all. Furthermore > +I915_SUBMIT_FENCE is by design a future fence, so not really something we should > +continue to support. > + > +The new parallel submission uAPI consists of 3 parts: > + > +* Export engines logical mapping > +* A 'set_parallel' extension to configure contexts for parallel > + submission > +* Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL > + > +Export engines logical mapping > +------------------------------ > +Certain use cases require BBs to be placed on engine instances in logical order > +(e.g. split-frame on gen11+). The logical mapping of engine instances can change > +based on fusing. Rather than making UMDs be aware of fusing, simply expose the > +logical mapping with the existing query engine info IOCTL. Also the GuC > +submission interface currently only supports submitting multiple contexts to > +engines in logical order which is a new requirement compared to execlists. > +Lastly, all current platforms have at most 2 engine instances and the logical > +order is the same as uAPI order. This will change on platforms with more than 2 > +engine instances. > + > +A single bit will be added to drm_i915_engine_info.flags indicating that the > +logical instance has been returned and a new field, > +drm_i915_engine_info.logical_instance, returns the logical instance. > + > +A 'set_parallel' extension to configure contexts for parallel submission > +------------------------------------------------------------------------ > +The 'set_parallel' extension configures a slot for parallel submission of N BBs. > +It is a setup step that must be called before using any of the contexts. See > +I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE or I915_CONTEXT_ENGINES_EXT_BOND for > +similar existing examples. Once a slot is configured for parallel submission the > +execbuf2 IOCTL can be called submitting N BBs in a single IOCTL. Initially only > +supports GuC submission. Execlists supports can be added later if needed. > + > +Add I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT and > +drm_i915_context_engines_parallel_submit to the uAPI to implement this > +extension. > + > +.. kernel-doc:: Documentation/gpu/rfc/i915_parallel_execbuf.h > + :functions: drm_i915_context_engines_parallel_submit > + > +Extend execbuf2 IOCTL to support submitting N BBs in a single IOCTL > +------------------------------------------------------------------- > +Contexts that have been configured with the 'set_parallel' extension can only > +submit N BBs in a single execbuf2 IOCTL. The BBs are either the last N objects > +in in the drm_i915_gem_exec_object2 list or the first N if I915_EXEC_BATCH_FIRST > +is set. The number of BBs is implicit based on the slot submitted and how it has > +been configured by 'set_parallel' or other extensions. No uAPI changes are > +required to execbuf2 IOCTL. >