RE: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

"Zeng, Oak" <oak.zeng@xxxxxxxxx> · Tue, 14 Jun 2022 21:47:50 +0000

Thanks,
Oak

> -----Original Message-----
> From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Zeng, Oak
> Sent: June 14, 2022 5:13 PM
> To: Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>;
> Landwerlin, Lionel G <lionel.g.landwerlin@xxxxxxxxx>
> Cc: Intel GFX <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>; Wilson, Chris P
> <chris.p.wilson@xxxxxxxxx>; Hellstrom, Thomas
> <thomas.hellstrom@xxxxxxxxx>; Maling list - DRI developers <dri-
> devel@xxxxxxxxxxxxxxxxxxxxx>; Vetter, Daniel <daniel.vetter@xxxxxxxxx>;
> Christian König <christian.koenig@xxxxxxx>
> Subject: RE: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design
> document
> 
> 
> 
> Thanks,
> Oak
> 
> > -----Original Message-----
> > From: Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>
> > Sent: June 14, 2022 1:02 PM
> > To: Landwerlin, Lionel G <lionel.g.landwerlin@xxxxxxxxx>
> > Cc: Zeng, Oak <oak.zeng@xxxxxxxxx>; Intel GFX <intel-
> > gfx@xxxxxxxxxxxxxxxxxxxxx>; Maling list - DRI developers <dri-
> > devel@xxxxxxxxxxxxxxxxxxxxx>; Hellstrom, Thomas
> > <thomas.hellstrom@xxxxxxxxx>; Wilson, Chris P
> <chris.p.wilson@xxxxxxxxx>;
> > Vetter, Daniel <daniel.vetter@xxxxxxxxx>; Christian König
> > <christian.koenig@xxxxxxx>
> > Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design
> > document
> >
> > On Tue, Jun 14, 2022 at 10:04:00AM +0300, Lionel Landwerlin wrote:
> > >On 13/06/2022 21:02, Niranjana Vishwanathapura wrote:
> > >>On Mon, Jun 13, 2022 at 06:33:07AM -0700, Zeng, Oak wrote:
> > >>>
> > >>>
> > >>>Regards,
> > >>>Oak
> > >>>
> > >>>>-----Original Message-----
> > >>>>From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On
> > >>>>Behalf Of Niranjana
> > >>>>Vishwanathapura
> > >>>>Sent: June 10, 2022 1:43 PM
> > >>>>To: Landwerlin, Lionel G <lionel.g.landwerlin@xxxxxxxxx>
> > >>>>Cc: Intel GFX <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>; Maling list -
> > >>>>DRI developers <dri-
> > >>>>devel@xxxxxxxxxxxxxxxxxxxxx>; Hellstrom, Thomas
> > >>>><thomas.hellstrom@xxxxxxxxx>;
> > >>>>Wilson, Chris P <chris.p.wilson@xxxxxxxxx>; Vetter, Daniel
> > >>>><daniel.vetter@xxxxxxxxx>; Christian König
> > <christian.koenig@xxxxxxx>
> > >>>>Subject: Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND
> > >>>>feature design
> > >>>>document
> > >>>>
> > >>>>On Fri, Jun 10, 2022 at 11:18:14AM +0300, Lionel Landwerlin wrote:
> > >>>>>On 10/06/2022 10:54, Niranjana Vishwanathapura wrote:
> > >>>>>>On Fri, Jun 10, 2022 at 09:53:24AM +0300, Lionel Landwerlin wrote:
> > >>>>>>>On 09/06/2022 22:31, Niranjana Vishwanathapura wrote:
> > >>>>>>>>On Thu, Jun 09, 2022 at 05:49:09PM +0300, Lionel Landwerlin
> wrote:
> > >>>>>>>>>  On 09/06/2022 00:55, Jason Ekstrand wrote:
> > >>>>>>>>>
> > >>>>>>>>>    On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura
> > >>>>>>>>> <niranjana.vishwanathapura@xxxxxxxxx> wrote:
> > >>>>>>>>>
> > >>>>>>>>>      On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko
> > >>>>Ursulin wrote:
> > >>>>>>>>>      >
> > >>>>>>>>>      >
> > >>>>>>>>>      >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
> > >>>>>>>>>      >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana
> > >>>>>>>>>Vishwanathapura
> > >>>>>>>>>      wrote:
> > >>>>>>>>>      >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason
> > >>>>>>>>>Ekstrand wrote:
> > >>>>>>>>>      >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana
> > >>>>Vishwanathapura
> > >>>>>>>>>      >>>> <niranjana.vishwanathapura@xxxxxxxxx> wrote:
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel
> > >>>>>>>>>Landwerlin
> > >>>>>>>>>      wrote:
> > >>>>>>>>>      >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >     On Thu, Jun 2, 2022 at 3:11 PM Niranjana
> > >>>>>>>>>Vishwanathapura
> > >>>>>>>>>      >>>>   > <niranjana.vishwanathapura@xxxxxxxxx> wrote:
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       On Wed, Jun 01, 2022 at 01:28:36PM
> > >>>>-0700, Matthew
> > >>>>>>>>>      >>>>Brost wrote:
> > >>>>>>>>>      >>>>   >       >On Wed, Jun 01, 2022 at 05:25:49PM
> > >>>>+0300, Lionel
> > >>>>>>>>>      Landwerlin
> > >>>>>>>>>      >>>>   wrote:
> > >>>>>>>>>      >>>>   > >> On 17/05/2022 21:32, Niranjana
> Vishwanathapura
> > >>>>>>>>>      wrote:
> > >>>>>>>>>      >>>>   > >> > +VM_BIND/UNBIND ioctl will immediately start
> > >>>>>>>>>      >>>>   binding/unbinding
> > >>>>>>>>>      >>>>   >       the mapping in an
> > >>>>>>>>>      >>>>   > >> > +async worker. The binding and
> > >>>>>>>>>unbinding will
> > >>>>>>>>>      >>>>work like a
> > >>>>>>>>>      >>>>   special
> > >>>>>>>>>      >>>>   >       GPU engine.
> > >>>>>>>>>      >>>>   > >> > +The binding and unbinding operations are
> > >>>>>>>>>      serialized and
> > >>>>>>>>>      >>>>   will
> > >>>>>>>>>      >>>>   >       wait on specified
> > >>>>>>>>>      >>>>   > >> > +input fences before the operation
> > >>>>>>>>>and will signal
> > >>>>>>>>>      the
> > >>>>>>>>>      >>>>   output
> > >>>>>>>>>      >>>>   >       fences upon the
> > >>>>>>>>>      >>>>   > >> > +completion of the operation. Due to
> > >>>>>>>>>      serialization,
> > >>>>>>>>>      >>>>   completion of
> > >>>>>>>>>      >>>>   >       an operation
> > >>>>>>>>>      >>>>   > >> > +will also indicate that all
> > >>>>>>>>>previous operations
> > >>>>>>>>>      >>>>are also
> > >>>>>>>>>      >>>>   > complete.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> I guess we should avoid saying "will
> > >>>>>>>>>immediately
> > >>>>>>>>>      start
> > >>>>>>>>>      >>>>   > binding/unbinding" if
> > >>>>>>>>>      >>>>   > >> there are fences involved.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> And the fact that it's happening in an async
> > >>>>>>>>>      >>>>worker seem to
> > >>>>>>>>>      >>>>   imply
> > >>>>>>>>>      >>>>   >       it's not
> > >>>>>>>>>      >>>>   > >> immediate.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       Ok, will fix.
> > >>>>>>>>>      >>>>   >       This was added because in earlier design
> > >>>>>>>>>binding was
> > >>>>>>>>>      deferred
> > >>>>>>>>>      >>>>   until
> > >>>>>>>>>      >>>>   >       next execbuff.
> > >>>>>>>>>      >>>>   >       But now it is non-deferred (immediate in
> > >>>>>>>>>that sense).
> > >>>>>>>>>      >>>>But yah,
> > >>>>>>>>>      >>>>   this is
> > >>>>>>>>>      >>>>   > confusing
> > >>>>>>>>>      >>>>   >       and will fix it.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> I have a question on the behavior of the bind
> > >>>>>>>>>      >>>>operation when
> > >>>>>>>>>      >>>>   no
> > >>>>>>>>>      >>>>   >       input fence
> > >>>>>>>>>      >>>>   > >> is provided. Let say I do :
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> VM_BIND (out_fence=fence1)
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> VM_BIND (out_fence=fence2)
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> VM_BIND (out_fence=fence3)
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> In what order are the fences going to
> > >>>>>>>>>be signaled?
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> In the order of VM_BIND ioctls? Or out
> > >>>>>>>>>of order?
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> Because you wrote "serialized I assume
> > >>>>>>>>>it's : in
> > >>>>>>>>>      order
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       Yes, in the order of VM_BIND/UNBIND
> > >>>>>>>>>ioctls. Note that
> > >>>>>>>>>      >>>>bind and
> > >>>>>>>>>      >>>>   unbind
> > >>>>>>>>>      >>>>   >       will use
> > >>>>>>>>>      >>>>   >       the same queue and hence are ordered.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> One thing I didn't realize is that
> > >>>>>>>>>because we only
> > >>>>>>>>>      get one
> > >>>>>>>>>      >>>>   > "VM_BIND" engine,
> > >>>>>>>>>      >>>>   > >> there is a disconnect from the Vulkan
> > >>>>>>>>>specification.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> In Vulkan VM_BIND operations are
> > >>>>>>>>>serialized but
> > >>>>>>>>>      >>>>per engine.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> So you could have something like this :
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> VM_BIND (engine=rcs0, in_fence=fence1,
> > >>>>>>>>>      out_fence=fence2)
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> VM_BIND (engine=ccs0, in_fence=fence3,
> > >>>>>>>>>      out_fence=fence4)
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> fence1 is not signaled
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> fence3 is signaled
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> So the second VM_BIND will proceed before the
> > >>>>>>>>>      >>>>first VM_BIND.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> I guess we can deal with that scenario in
> > >>>>>>>>>      >>>>userspace by doing
> > >>>>>>>>>      >>>>   the
> > >>>>>>>>>      >>>>   >       wait
> > >>>>>>>>>      >>>>   > >> ourselves in one thread per engines.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> But then it makes the VM_BIND input
> > >>>>>>>>>fences useless.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> Daniel : what do you think? Should be
> > >>>>>>>>>rework this or
> > >>>>>>>>>      just
> > >>>>>>>>>      >>>>   deal with
> > >>>>>>>>>      >>>>   >       wait
> > >>>>>>>>>      >>>>   > >> fences in userspace?
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   >       >My opinion is rework this but make the
> > >>>>>>>>>ordering via
> > >>>>>>>>>      >>>>an engine
> > >>>>>>>>>      >>>>   param
> > >>>>>>>>>      >>>>   > optional.
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   > >e.g. A VM can be configured so all binds
> > >>>>>>>>>are ordered
> > >>>>>>>>>      >>>>within the
> > >>>>>>>>>      >>>>   VM
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   > >e.g. A VM can be configured so all binds
> > >>>>>>>>>accept an
> > >>>>>>>>>      engine
> > >>>>>>>>>      >>>>   argument
> > >>>>>>>>>      >>>>   >       (in
> > >>>>>>>>>      >>>>   > >the case of the i915 likely this is a
> > >>>>>>>>>gem context
> > >>>>>>>>>      >>>>handle) and
> > >>>>>>>>>      >>>>   binds
> > >>>>>>>>>      >>>>   > >ordered with respect to that engine.
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   > >This gives UMDs options as the later
> > >>>>>>>>>likely consumes
> > >>>>>>>>>      >>>>more KMD
> > >>>>>>>>>      >>>>   > resources
> > >>>>>>>>>      >>>>   >       >so if a different UMD can live with
> > >>>>binds being
> > >>>>>>>>>      >>>>ordered within
> > >>>>>>>>>      >>>>   the VM
> > >>>>>>>>>      >>>>   > >they can use a mode consuming less resources.
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       I think we need to be careful here if we
> > >>>>>>>>>are looking
> > >>>>>>>>>      for some
> > >>>>>>>>>      >>>>   out of
> > >>>>>>>>>      >>>>   > (submission) order completion of vm_bind/unbind.
> > >>>>>>>>>      >>>>   > In-order completion means, in a batch of
> > >>>>>>>>>binds and
> > >>>>>>>>>      >>>>unbinds to be
> > >>>>>>>>>      >>>>   > completed in-order, user only needs to specify
> > >>>>>>>>>      >>>>in-fence for the
> > >>>>>>>>>      >>>>   >       first bind/unbind call and the our-fence
> > >>>>>>>>>for the last
> > >>>>>>>>>      >>>>   bind/unbind
> > >>>>>>>>>      >>>>   >       call. Also, the VA released by an unbind
> > >>>>>>>>>call can be
> > >>>>>>>>>      >>>>re-used by
> > >>>>>>>>>      >>>>   >       any subsequent bind call in that
> > >>>>in-order batch.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       These things will break if
> > >>>>>>>>>binding/unbinding were to
> > >>>>>>>>>      >>>>be allowed
> > >>>>>>>>>      >>>>   to
> > >>>>>>>>>      >>>>   >       go out of order (of submission) and user
> > >>>>>>>>>need to be
> > >>>>>>>>>      extra
> > >>>>>>>>>      >>>>   careful
> > >>>>>>>>>      >>>>   >       not to run into pre-mature triggereing of
> > >>>>>>>>>out-fence and
> > >>>>>>>>>      bind
> > >>>>>>>>>      >>>>   failing
> > >>>>>>>>>      >>>>   >       as VA is still in use etc.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       Also, VM_BIND binds the provided
> > >>>>mapping on the
> > >>>>>>>>>      specified
> > >>>>>>>>>      >>>>   address
> > >>>>>>>>>      >>>>   >       space
> > >>>>>>>>>      >>>>   >       (VM). So, the uapi is not engine/context
> > >>>>>>>>>specific.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       We can however add a 'queue' to the uapi
> > >>>>>>>>>which can be
> > >>>>>>>>>      >>>>one from
> > >>>>>>>>>      >>>>   the
> > >>>>>>>>>      >>>>   > pre-defined queues,
> > >>>>>>>>>      >>>>   > I915_VM_BIND_QUEUE_0
> > >>>>>>>>>      >>>>   > I915_VM_BIND_QUEUE_1
> > >>>>>>>>>      >>>>   >       ...
> > >>>>>>>>>      >>>>   > I915_VM_BIND_QUEUE_(N-1)
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       KMD will spawn an async work queue for
> > >>>>>>>>>each queue which
> > >>>>>>>>>      will
> > >>>>>>>>>      >>>>   only
> > >>>>>>>>>      >>>>   >       bind the mappings on that queue in the
> > >>>>order of
> > >>>>>>>>>      submission.
> > >>>>>>>>>      >>>>   >       User can assign the queue to per engine
> > >>>>>>>>>or anything
> > >>>>>>>>>      >>>>like that.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       But again here, user need to be
> > >>>>careful and not
> > >>>>>>>>>      >>>>deadlock these
> > >>>>>>>>>      >>>>   >       queues with circular dependency of fences.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >       I prefer adding this later an as
> > >>>>>>>>>extension based on
> > >>>>>>>>>      >>>>whether it
> > >>>>>>>>>      >>>>   >       is really helping with the implementation.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >     I can tell you right now that having
> > >>>>>>>>>everything on a
> > >>>>>>>>>      single
> > >>>>>>>>>      >>>>   in-order
> > >>>>>>>>>      >>>>   >     queue will not get us the perf we want.
> > >>>>>>>>>What vulkan
> > >>>>>>>>>      >>>>really wants
> > >>>>>>>>>      >>>>   is one
> > >>>>>>>>>      >>>>   >     of two things:
> > >>>>>>>>>      >>>>   >      1. No implicit ordering of VM_BIND
> > >>>>ops.  They just
> > >>>>>>>>>      happen in
> > >>>>>>>>>      >>>>   whatever
> > >>>>>>>>>      >>>>   >     their dependencies are resolved and we
> > >>>>>>>>>ensure ordering
> > >>>>>>>>>      >>>>ourselves
> > >>>>>>>>>      >>>>   by
> > >>>>>>>>>      >>>>   >     having a syncobj in the VkQueue.
> > >>>>>>>>>      >>>>   >      2. The ability to create multiple VM_BIND
> > >>>>>>>>>queues.  We
> > >>>>>>>>>      need at
> > >>>>>>>>>      >>>>   least 2
> > >>>>>>>>>      >>>>   >     but I don't see why there needs to be a
> > >>>>>>>>>limit besides
> > >>>>>>>>>      >>>>the limits
> > >>>>>>>>>      >>>>   the
> > >>>>>>>>>      >>>>   >     i915 API already has on the number of
> > >>>>>>>>>engines.  Vulkan
> > >>>>>>>>>      could
> > >>>>>>>>>      >>>>   expose
> > >>>>>>>>>      >>>>   >     multiple sparse binding queues to the
> > >>>>>>>>>client if it's not
> > >>>>>>>>>      >>>>   arbitrarily
> > >>>>>>>>>      >>>>   >     limited.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   Thanks Jason, Lionel.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   Jason, what are you referring to when you say
> > >>>>>>>>>"limits the i915
> > >>>>>>>>>      API
> > >>>>>>>>>      >>>>   already
> > >>>>>>>>>      >>>>   has on the number of engines"? I am not sure if
> > >>>>>>>>>there is such
> > >>>>>>>>>      an uapi
> > >>>>>>>>>      >>>>   today.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>> There's a limit of something like 64 total engines
> > >>>>>>>>>today based on
> > >>>>>>>>>      the
> > >>>>>>>>>      >>>> number of bits we can cram into the exec flags in
> > >>>>>>>>>execbuffer2.  I
> > >>>>>>>>>      think
> > >>>>>>>>>      >>>> someone had an extended version that allowed more
> > >>>>>>>>>but I ripped it
> > >>>>>>>>>      out
> > >>>>>>>>>      >>>> because no one was using it.  Of course,
> > >>>>>>>>>execbuffer3 might not
> > >>>>>>>>>      >>>>have that
> > >>>>>>>>>      >>>> problem at all.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>
> > >>>>>>>>>      >>>Thanks Jason.
> > >>>>>>>>>      >>>Ok, I am not sure which exec flag is that, but yah,
> > >>>>>>>>>execbuffer3
> > >>>>>>>>>      probably
> > >>>>>>>>>      >>>will not have this limiation. So, we need to define a
> > >>>>>>>>>      VM_BIND_MAX_QUEUE
> > >>>>>>>>>      >>>and somehow export it to user (I am thinking of
> > >>>>>>>>>embedding it in
> > >>>>>>>>>      >>>I915_PARAM_HAS_VM_BIND. bits[0]->HAS_VM_BIND,
> > >>>>bits[1-3]->'n'
> > >>>>>>>>>      meaning 2^n
> > >>>>>>>>>      >>>queues.
> > >>>>>>>>>      >>
> > >>>>>>>>>      >>Ah, I think you are waking about I915_EXEC_RING_MASK
> > >>>>>>>>>(0x3f) which
> > >>>>>>>>>      execbuf3
> > >>>>>>>>>
> > >>>>>>>>>    Yup!  That's exactly the limit I was talking about.
> > >>>>>>>>>
> > >>>>>>>>>      >>will also have. So, we can simply define in
> vm_bind/unbind
> > >>>>>>>>>      structures,
> > >>>>>>>>>      >>
> > >>>>>>>>>      >>#define I915_VM_BIND_MAX_QUEUE   64
> > >>>>>>>>>      >>        __u32 queue;
> > >>>>>>>>>      >>
> > >>>>>>>>>      >>I think that will keep things simple.
> > >>>>>>>>>      >
> > >>>>>>>>>      >Hmmm? What does execbuf2 limit has to do with how
> > >>>>many engines
> > >>>>>>>>>      >hardware can have? I suggest not to do that.
> > >>>>>>>>>      >
> > >>>>>>>>>      >Change with added this:
> > >>>>>>>>>      >
> > >>>>>>>>>      >       if (set.num_engines > I915_EXEC_RING_MASK + 1)
> > >>>>>>>>>      >               return -EINVAL;
> > >>>>>>>>>      >
> > >>>>>>>>>      >To context creation needs to be undone and so let users
> > >>>>>>>>>create engine
> > >>>>>>>>>      >maps with all hardware engines, and let execbuf3 access
> > >>>>>>>>>them all.
> > >>>>>>>>>      >
> > >>>>>>>>>
> > >>>>>>>>>      Earlier plan was to carry I915_EXEC_RING_MAP (0x3f) to
> > >>>>>>>>>execbuff3 also.
> > >>>>>>>>>      Hence, I was using the same limit for VM_BIND queues
> > >>>>>>>>>(64, or 65 if we
> > >>>>>>>>>      make it N+1).
> > >>>>>>>>>      But, as discussed in other thread of this RFC series, we
> > >>>>>>>>>are planning
> > >>>>>>>>>      to drop this I915_EXEC_RING_MAP in execbuff3. So,
> > >>>>there won't be
> > >>>>>>>>>      any uapi that limits the number of engines (and hence
> > >>>>>>>>>the vm_bind
> > >>>>>>>>>      queues
> > >>>>>>>>>      need to be supported).
> > >>>>>>>>>
> > >>>>>>>>>      If we leave the number of vm_bind queues to be
> > >>>>arbitrarily large
> > >>>>>>>>>      (__u32 queue_idx) then, we need to have a hashmap for
> > >>>>>>>>>queue (a wq,
> > >>>>>>>>>      work_item and a linked list) lookup from the user
> > >>>>>>>>>specified queue
> > >>>>>>>>>      index.
> > >>>>>>>>>      Other option is to just put some hard limit (say 64 or
> > >>>>>>>>>65) and use
> > >>>>>>>>>      an array of queues in VM (each created upon first use).
> > >>>>>>>>>I prefer this.
> > >>>>>>>>>
> > >>>>>>>>>    I don't get why a VM_BIND queue is any different from any
> > >>>>>>>>>other queue or
> > >>>>>>>>>    userspace-visible kernel object.  But I'll leave those
> > >>>>>>>>>details up to
> > >>>>>>>>>    danvet or whoever else might be reviewing the
> > implementation.
> > >>>>>>>>>    --Jason
> > >>>>>>>>>
> > >>>>>>>>>  I kind of agree here. Wouldn't be simpler to have the bind
> > >>>>>>>>>queue created
> > >>>>>>>>>  like the others when we build the engine map?
> > >>>>>>>>>
> > >>>>>>>>>  For userspace it's then just matter of selecting the right
> > >>>>>>>>>queue ID when
> > >>>>>>>>>  submitting.
> > >>>>>>>>>
> > >>>>>>>>>  If there is ever a possibility to have this work on the GPU,
> > >>>>>>>>>it would be
> > >>>>>>>>>  all ready.
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>I did sync offline with Matt Brost on this.
> > >>>>>>>>We can add a VM_BIND engine class and let user create
> VM_BIND
> > >>>>>>>>engines (queues).
> > >>>>>>>>The problem is, in i915 engine creating interface is bound to
> > >>>>>>>>gem_context.
> > >>>>>>>>So, in vm_bind ioctl, we would need both context_id and
> > >>>>>>>>queue_idx for proper
> > >>>>>>>>lookup of the user created engine. This is bit ackward as
> > >>>>vm_bind is an
> > >>>>>>>>interface to VM (address space) and has nothing to do with
> > >>>>gem_context.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>A gem_context has a single vm object right?
> > >>>>>>>
> > >>>>>>>Set through I915_CONTEXT_PARAM_VM at creation or given a
> > default
> > >>>>>>>one if not.
> > >>>>>>>
> > >>>>>>>So it's just like picking up the vm like it's done at execbuffer
> > >>>>>>>time right now : eb->context->vm
> > >>>>>>>
> > >>>>>>
> > >>>>>>Are you suggesting replacing 'vm_id' with 'context_id' in the
> > >>>>>>VM_BIND/UNBIND
> > >>>>>>ioctl and probably call it CONTEXT_BIND/UNBIND, because VM can
> > be
> > >>>>>>obtained
> > >>>>>>from the context?
> > >>>>>
> > >>>>>
> > >>>>>Yes, because if we go for engines, they're associated with a context
> > >>>>>and so also associated with the VM bound to the context.
> > >>>>>
> > >>>>
> > >>>>Hmm...context doesn't sould like the right interface. It should be
> > >>>>VM and engine (independent of context). Engine can be virtual or soft
> > >>>>engine (kernel thread), each with its own queue. We can add an
> > >>>>interface
> > >>>>to create such engines (independent of context). But we are anway
> > >>>>implicitly creating it when user uses a new queue_idx. If in future
> > >>>>we have hardware engines for VM_BIND operation, we can have that
> > >>>>explicit inteface to create engine instances and the queue_index
> > >>>>in vm_bind/unbind will point to those engines.
> > >>>>Anyone has any thoughts? Daniel?
> > >>>
> > >>>Exposing gem_context or intel_context to user space is a strange
> > >>>concept to me. A context represent some hw resources that is used
> > >>>to complete certain task. User space should care allocate some
> > >>>resources (memory, queues) and submit tasks to queues. But user
> > >>>space doesn't care how certain task is mapped to a HW context -
> > >>>driver/guc should take care of this.
> > >>>
> > >>>So a cleaner interface to me is: user space create a vm,  create
> > >>>gem object, vm_bind it to a vm; allocate queues (internally
> > >>>represent compute or blitter HW. Queue can be virtual to user) for
> > >>>this vm; submit tasks to queues. User can create multiple queues
> > >>>under one vm. One queue is only for one vm.
> > >>>
> > >>>I915 driver/guc manage the hw compute or blitter resources which
> > >>>is transparent to user space. When i915 or guc decide to schedule
> > >>>a queue (run tasks on that queue), a HW engine will be pick up and
> > >>>set up properly for the vm of that queue (ie., switch to page
> > >>>tables of that vm) - this is a context switch.
> > >>>
> > >>>From vm_bind perspective, it simply bind a gem_object to a vm.
> > >>>Engine/queue is not a parameter to vm_bind, as any engine can be
> > >>>pick up by i915/guc to execute a task using the vm bound va.
> > >>>
> > >>>I didn't completely follow the discussion here. Just share some
> > >>>thoughts.
> > >>>
> > >>
> > >>Yah, I agree.
> > >>
> > >>Lionel,
> > >>How about we define the queue as
> > >>union {
> > >>       __u32 queue_idx;
> > >>       __u64 rsvd;
> > >>}
> > >>
> > >>If required, we can extend by expanding the 'rsvd' field to <ctx_id,
> > >>queue_idx> later
> > >>with a flag.
> > >>
> > >>Niranjana
> > >
> > >
> > >I did not really understand Oak's comment nor what you're suggesting
> > >here to be honest.
> > >
> > >
> > >First the GEM context is already exposed to userspace. It's explicitly
> > >created by userpace with DRM_IOCTL_I915_GEM_CONTEXT_CREATE.
> > >
> > >We give the GEM context id in every execbuffer we do with
> > >drm_i915_gem_execbuffer2::rsvd1.
> > >
> > >It's still in the new execbuffer3 proposal being discussed.
> > >
> > >
> > >Second, the GEM context is also where we set the VM with
> > >I915_CONTEXT_PARAM_VM.
> > >
> > >
> > >Third, the GEM context also has the list of engines with
> > >I915_CONTEXT_PARAM_ENGINES.
> > >
> >
> > Yes, the execbuf and engine map creation are tied to gem_context.
> > (which probably is not the best interface.)
> >
> > >
> > >So it makes sense to me to dispatch the vm_bind operation to a GEM
> > >context, to a given vm_bind queue, because it's got all the
> > >information required :
> > >
> > >    - the list of new vm_bind queues
> > >
> > >    - the vm that is going to be modified
> > >
> >
> > But the operation is performed here on the address space (VM) which
> > can have multiple gem_contexts referring to it. So, VM is the right
> > interface here. We need not 'gem_context'ify it.
> >
> > All we need is multiple queue support for the address space (VM).
> > Going to gem_context for that just because we have engine creation
> > support there seems unnecessay and not correct to me.
> >
> > >
> > >Otherwise where do the vm_bind queues live?
> > >
> > >In the i915/drm fd object?
> > >
> > >That would mean that all the GEM contexts are sharing the same vm_bind
> > >queues.
> > >
> >
> > Not all, only the gem contexts that are using the same address space (VM).
> > But to me the right way to describe would be that "VM will be using those
> > queues".
> 
> 
> I hope by "queue" here you mean a HW resource  that will be later used to
> execute the job, for example a ccs compute engine. Of course queue can be
> virtual so user can create more queues than what hw physically has.
> 
> To express the concept of "VM will be using those queues", I think it make
> sense to have create_queue(vm) function taking a vm parameter. This
> means this queue is created for the purpose of submit job under this VM.
> Later on, we can submit job (referring to objects vm_bound to the same vm)
> to the queue. The vm_bind ioctl doesn’t need to have queue parameter, just
> vm_bind (object, va, vm).
> 
> I hope the "queue" here is not the engine used to perform the vm_bind
> operation itself. But if you meant a queue/engine to perform vm_bind itself
> (vs a queue/engine for later job submission), then we can discuss more. I
> know xe driver have similar concept and I think align the design early can
> benefit the migration to xe driver.

Oops, I read more on this thread and it turned out the vm_bind queue here is actually used to perform vm bind/unbind operations. XE driver has the similar concept (except it is called engine_id there). So having a queue_idx parameter is closer to xe design.

That said, I still feel having a queue_idx parameter to vm_bind is a bit awkward. Vm_bind can be performed without any GPU engines, ie,. CPU itself can complete a vm bind as long as CPU have access to gpu's local memory. So the queue here have to be a virtual concept - it doesn't have a hard map to GPU blitter engine.

Can someone summarize what is the benefit of the queue-idx parameter? For the purpose of ordering vm_bind and later gpu jobs?  

> 
> Regards,
> Oak
> 
> >
> > Niranjana
> >
> > >
> > >intel_context or GuC are internal details we're not concerned about.
> > >
> > >I don't really see the connection with the GEM context.
> > >
> > >
> > >Maybe Oak has a different use case than Vulkan.
> > >
> > >
> > >-Lionel
> > >
> > >
> > >>
> > >>>Regards,
> > >>>Oak
> > >>>
> > >>>>
> > >>>>Niranjana
> > >>>>
> > >>>>>
> > >>>>>>I think the interface is clean as a interface to VM. It is
> > >>>>only that we
> > >>>>>>don't have a clean way to create a raw VM_BIND engine (not
> > >>>>>>associated with
> > >>>>>>any context) with i915 uapi.
> > >>>>>>May be we can add such an interface, but I don't think that is
> > >>>>worth it
> > >>>>>>(we might as well just use a queue_idx in VM_BIND/UNBIND ioctl
> as I
> > >>>>>>mentioned
> > >>>>>>above).
> > >>>>>>Anyone has any thoughts?
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>Another problem is, if two VMs are binding with the same
> defined
> > >>>>>>>>engine,
> > >>>>>>>>binding on VM1 can get unnecessary blocked by binding on VM2
> > >>>>>>>>(which may be
> > >>>>>>>>waiting on its in_fence).
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Maybe I'm missing something, but how can you have 2 vm objects
> > >>>>>>>with a single gem_context right now?
> > >>>>>>>
> > >>>>>>
> > >>>>>>No, we don't have 2 VMs for a gem_context.
> > >>>>>>Say if ctx1 with vm1 and ctx2 with vm2.
> > >>>>>>First vm_bind call was for vm1 with q_idx 1 in ctx1 engine map.
> > >>>>>>Second vm_bind call was for vm2 with q_idx 2 in ctx2 engine map. If
> > >>>>>>those two queue indicies points to same underlying vm_bind
> engine,
> > >>>>>>then the second vm_bind call gets blocked until the first
> > >>>>vm_bind call's
> > >>>>>>'in' fence is triggered and bind completes.
> > >>>>>>
> > >>>>>>With per VM queues, this is not a problem as two VMs will not
> endup
> > >>>>>>sharing same queue.
> > >>>>>>
> > >>>>>>BTW, I just posted a updated PATCH series.
> > >>>>>>https://www.spinics.net/lists/dri-devel/msg350483.html
> > >>>>>>
> > >>>>>>Niranjana
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>So, my preference here is to just add a 'u32 queue' index in
> > >>>>>>>>vm_bind/unbind
> > >>>>>>>>ioctl, and the queues are per VM.
> > >>>>>>>>
> > >>>>>>>>Niranjana
> > >>>>>>>>
> > >>>>>>>>>  Thanks,
> > >>>>>>>>>
> > >>>>>>>>>  -Lionel
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>      Niranjana
> > >>>>>>>>>
> > >>>>>>>>>      >Regards,
> > >>>>>>>>>      >
> > >>>>>>>>>      >Tvrtko
> > >>>>>>>>>      >
> > >>>>>>>>>      >>
> > >>>>>>>>>      >>Niranjana
> > >>>>>>>>>      >>
> > >>>>>>>>>      >>>
> > >>>>>>>>>      >>>>   I am trying to see how many queues we need and
> > >>>>>>>>>don't want it to
> > >>>>>>>>>      be
> > >>>>>>>>>      >>>>   arbitrarily
> > >>>>>>>>>      >>>>   large and unduely blow up memory usage and
> > >>>>>>>>>complexity in i915
> > >>>>>>>>>      driver.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>> I expect a Vulkan driver to use at most 2 in the
> > >>>>>>>>>vast majority
> > >>>>>>>>>      >>>>of cases. I
> > >>>>>>>>>      >>>> could imagine a client wanting to create more
> > >>>>than 1 sparse
> > >>>>>>>>>      >>>>queue in which
> > >>>>>>>>>      >>>> case, it'll be N+1 but that's unlikely. As far as
> > >>>>>>>>>complexity
> > >>>>>>>>>      >>>>goes, once
> > >>>>>>>>>      >>>> you allow two, I don't think the complexity is
> > >>>>going up by
> > >>>>>>>>>      >>>>allowing N.  As
> > >>>>>>>>>      >>>> for memory usage, creating more queues means more
> > >>>>>>>>>memory.  That's
> > >>>>>>>>>      a
> > >>>>>>>>>      >>>> trade-off that userspace can make. Again, the
> > >>>>>>>>>expected number
> > >>>>>>>>>      >>>>here is 1
> > >>>>>>>>>      >>>> or 2 in the vast majority of cases so I don't think
> > >>>>>>>>>you need to
> > >>>>>>>>>      worry.
> > >>>>>>>>>      >>>
> > >>>>>>>>>      >>>Ok, will start with n=3 meaning 8 queues.
> > >>>>>>>>>      >>>That would require us create 8 workqueues.
> > >>>>>>>>>      >>>We can change 'n' later if required.
> > >>>>>>>>>      >>>
> > >>>>>>>>>      >>>Niranjana
> > >>>>>>>>>      >>>
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   >     Why? Because Vulkan has two basic kind of bind
> > >>>>>>>>>      >>>>operations and we
> > >>>>>>>>>      >>>>   don't
> > >>>>>>>>>      >>>>   >     want any dependencies between them:
> > >>>>>>>>>      >>>>   >      1. Immediate.  These happen right after BO
> > >>>>>>>>>creation or
> > >>>>>>>>>      >>>>maybe as
> > >>>>>>>>>      >>>>   part of
> > >>>>>>>>>      >>>>   > vkBindImageMemory() or
> > VkBindBufferMemory().  These
> > >>>>>>>>>      >>>>don't happen
> > >>>>>>>>>      >>>>   on a
> > >>>>>>>>>      >>>>   >     queue and we don't want them serialized
> > >>>>>>>>>with anything.       To
> > >>>>>>>>>      >>>>   synchronize
> > >>>>>>>>>      >>>>   >     with submit, we'll have a syncobj in the
> > >>>>>>>>>VkDevice which
> > >>>>>>>>>      is
> > >>>>>>>>>      >>>>   signaled by
> > >>>>>>>>>      >>>>   >     all immediate bind operations and make
> > >>>>>>>>>submits wait on
> > >>>>>>>>>      it.
> > >>>>>>>>>      >>>>   >      2. Queued (sparse): These happen on a
> > >>>>>>>>>VkQueue which may
> > >>>>>>>>>      be the
> > >>>>>>>>>      >>>>   same as
> > >>>>>>>>>      >>>>   >     a render/compute queue or may be its own
> > >>>>>>>>>queue.  It's up
> > >>>>>>>>>      to us
> > >>>>>>>>>      >>>>   what we
> > >>>>>>>>>      >>>>   >     want to advertise.  From the Vulkan API
> > >>>>>>>>>PoV, this is like
> > >>>>>>>>>      any
> > >>>>>>>>>      >>>>   other
> > >>>>>>>>>      >>>>   >     queue. Operations on it wait on and signal
> > >>>>>>>>>semaphores.       If we
> > >>>>>>>>>      >>>>   have a
> > >>>>>>>>>      >>>>   >     VM_BIND engine, we'd provide syncobjs to
> > >>>>wait and
> > >>>>>>>>>      >>>>signal just like
> > >>>>>>>>>      >>>>   we do
> > >>>>>>>>>      >>>>   >     in execbuf().
> > >>>>>>>>>      >>>>   >     The important thing is that we don't want
> > >>>>>>>>>one type of
> > >>>>>>>>>      >>>>operation to
> > >>>>>>>>>      >>>>   block
> > >>>>>>>>>      >>>>   >     on the other.  If immediate binds are
> > >>>>>>>>>blocking on sparse
> > >>>>>>>>>      binds,
> > >>>>>>>>>      >>>>   it's
> > >>>>>>>>>      >>>>   >     going to cause over-synchronization issues.
> > >>>>>>>>>      >>>>   >     In terms of the internal implementation, I
> > >>>>>>>>>know that
> > >>>>>>>>>      >>>>there's going
> > >>>>>>>>>      >>>>   to be
> > >>>>>>>>>      >>>>   >     a lock on the VM and that we can't actually
> > >>>>>>>>>do these
> > >>>>>>>>>      things in
> > >>>>>>>>>      >>>>   > parallel.  That's fine. Once the dma_fences have
> > >>>>>>>>>      signaled and
> > >>>>>>>>>      >>>>   we're
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   Thats correct. It is like a single VM_BIND
> > >>>>engine with
> > >>>>>>>>>      >>>>multiple queues
> > >>>>>>>>>      >>>>   feeding to it.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>> Right.  As long as the queues themselves are
> > >>>>>>>>>independent and
> > >>>>>>>>>      >>>>can block on
> > >>>>>>>>>      >>>> dma_fences without holding up other queues, I think
> > >>>>>>>>>we're fine.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   > unblocked to do the bind operation, I don't care if
> > >>>>>>>>>      >>>>there's a bit
> > >>>>>>>>>      >>>>   of
> > >>>>>>>>>      >>>>   > synchronization due to locking.  That's
> > >>>>>>>>>expected.  What
> > >>>>>>>>>      >>>>we can't
> > >>>>>>>>>      >>>>   afford
> > >>>>>>>>>      >>>>   >     to have is an immediate bind operation
> > >>>>>>>>>suddenly blocking
> > >>>>>>>>>      on a
> > >>>>>>>>>      >>>>   sparse
> > >>>>>>>>>      >>>>   > operation which is blocked on a compute job
> > >>>>>>>>>that's going
> > >>>>>>>>>      to run
> > >>>>>>>>>      >>>>   for
> > >>>>>>>>>      >>>>   >     another 5ms.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   As the VM_BIND queue is per VM, VM_BIND on one
> > VM
> > >>>>>>>>>doesn't block
> > >>>>>>>>>      the
> > >>>>>>>>>      >>>>   VM_BIND
> > >>>>>>>>>      >>>>   on other VMs. I am not sure about usecases
> > >>>>here, but just
> > >>>>>>>>>      wanted to
> > >>>>>>>>>      >>>>   clarify.
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>> Yes, that's what I would expect.
> > >>>>>>>>>      >>>> --Jason
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   Niranjana
> > >>>>>>>>>      >>>>
> > >>>>>>>>>      >>>>   >     For reference, Windows solves this by allowing
> > >>>>>>>>>      arbitrarily many
> > >>>>>>>>>      >>>>   paging
> > >>>>>>>>>      >>>>   >     queues (what they call a VM_BIND
> > >>>>>>>>>engine/queue).  That
> > >>>>>>>>>      >>>>design works
> > >>>>>>>>>      >>>>   >     pretty well and solves the problems in
> > >>>>>>>>>question.       >>>>Again, we could
> > >>>>>>>>>      >>>>   just
> > >>>>>>>>>      >>>>   >     make everything out-of-order and require
> > >>>>>>>>>using syncobjs
> > >>>>>>>>>      >>>>to order
> > >>>>>>>>>      >>>>   things
> > >>>>>>>>>      >>>>   >     as userspace wants. That'd be fine too.
> > >>>>>>>>>      >>>>   >     One more note while I'm here: danvet said
> > >>>>>>>>>something on
> > >>>>>>>>>      >>>>IRC about
> > >>>>>>>>>      >>>>   VM_BIND
> > >>>>>>>>>      >>>>   >     queues waiting for syncobjs to
> > >>>>>>>>>materialize.  We don't
> > >>>>>>>>>      really
> > >>>>>>>>>      >>>>   want/need
> > >>>>>>>>>      >>>>   >     this. We already have all the machinery in
> > >>>>>>>>>userspace to
> > >>>>>>>>>      handle
> > >>>>>>>>>      >>>>   > wait-before-signal and waiting for syncobj
> > >>>>>>>>>fences to
> > >>>>>>>>>      >>>>materialize
> > >>>>>>>>>      >>>>   and
> > >>>>>>>>>      >>>>   >     that machinery is on by default.  It
> > >>>>would actually
> > >>>>>>>>>      >>>>take MORE work
> > >>>>>>>>>      >>>>   in
> > >>>>>>>>>      >>>>   >     Mesa to turn it off and take advantage of
> > >>>>>>>>>the kernel
> > >>>>>>>>>      >>>>being able to
> > >>>>>>>>>      >>>>   wait
> > >>>>>>>>>      >>>>   >     for syncobjs to materialize. Also, getting
> > >>>>>>>>>that right is
> > >>>>>>>>>      >>>>   ridiculously
> > >>>>>>>>>      >>>>   >     hard and I really don't want to get it
> > >>>>>>>>>wrong in kernel
> > >>>>>>>>>      >>>>space.   �� When we
> > >>>>>>>>>      >>>>   >     do memory fences, wait-before-signal will
> > >>>>>>>>>be a thing.  We
> > >>>>>>>>>      don't
> > >>>>>>>>>      >>>>   need to
> > >>>>>>>>>      >>>>   >     try and make it a thing for syncobj.
> > >>>>>>>>>      >>>>   >     --Jason
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >   Thanks Jason,
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >   I missed the bit in the Vulkan spec that
> > >>>>>>>>>we're allowed to
> > >>>>>>>>>      have a
> > >>>>>>>>>      >>>>   sparse
> > >>>>>>>>>      >>>>   >   queue that does not implement either graphics
> > >>>>>>>>>or compute
> > >>>>>>>>>      >>>>operations
> > >>>>>>>>>      >>>>   :
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >     "While some implementations may include
> > >>>>>>>>>      >>>> VK_QUEUE_SPARSE_BINDING_BIT
> > >>>>>>>>>      >>>>   >     support in queue families that also include
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > graphics and compute support, other
> > >>>>>>>>>implementations may
> > >>>>>>>>>      only
> > >>>>>>>>>      >>>>   expose a
> > >>>>>>>>>      >>>>   > VK_QUEUE_SPARSE_BINDING_BIT-only queue
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > family."
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >   So it can all be all a vm_bind engine that
> > >>>>just does
> > >>>>>>>>>      bind/unbind
> > >>>>>>>>>      >>>>   > operations.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >   But yes we need another engine for the
> > >>>>>>>>>immediate/non-sparse
> > >>>>>>>>>      >>>>   operations.
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >   -Lionel
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   >         >
> > >>>>>>>>>      >>>>   > Daniel, any thoughts?
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > Niranjana
> > >>>>>>>>>      >>>>   >
> > >>>>>>>>>      >>>>   > >Matt
> > >>>>>>>>>      >>>>   >       >
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> Sorry I noticed this late.
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >> -Lionel
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>>>      >>>>   > >>
> > >>>>>>>
> > >>>>>>>
> > >>>>>
> > >
> > >