Re: [PATCH v4 1/7] accel/ivpu: Introduce a new DRM driver for Intel VPU

Oded Gabbay <oded.gabbay@xxxxxxxxx> · Thu, 5 Jan 2023 19:38:26 +0200

On Thu, Jan 5, 2023 at 6:25 PM Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> wrote:
>
> On 1/5/2023 5:57 AM, Daniel Vetter wrote:
> > On Thu, Dec 08, 2022 at 12:07:27PM +0100, Jacek Lawrynowicz wrote:
> >> +static const struct drm_driver driver = {
> >> +    .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
> >
> > So I was wondering whether this is a bright idea, and whether we shouldn't
> > just go ahead and infuse more meaning into accel vs render nodes.
> >
> > The uapi relevant part of render nodes is that they're multi-user safe, at
> > least as much as feasible. Every new open() gives you a new private
> > accelerator. This also has implications on how userspace drivers iterate
> > them, they just open them all in turn and check whether it's the right
> > one - because userspace apis allow applications to enumerate them all.
> > > Which also means that any device initialization at open() time is a
> > really bad idea.
> >
> > A lot of the compute accelerators otoh (well habanalabs) are single user,
> > init can be done at open() time because you only open this when you
> > actually know you're going to use it.
> >
> > So given this, shouldn't multi-user inference engines be more like render
> > drivers, and less like accel? So DRIVER_RENDER, but still under
> > drivers/accel.
> >
> > This way that entire separate /dev node would actually become meaningful
> > beyond just the basic bikeshed:
> > - render nodes are multi user, safe to iterate and open() just for
> >    iteration
> > - accel nodes are single user, you really should not ever open them unless
> >    you want to use them
> >
> > Of course would need a doc patch :-)
> >
> > Thoughts?
> > -Daniel
>
> Hmm.
>
> I admit, I thought DRIVER_ACCEL was the same as DRIVER_RENDER, except
> that DRIVER_ACCEL dropped the "legacy" dual node setup and also avoided
> "legacy" userspace.
>
> qaic is multi-user.  I thought habana was the same, at least for
> inference.  Oded, am I wrong?
Habana's devices support a single user at a time acquiring the device
and working on it.
Both for training and inference.
>
> So, if DRIVER_ACCEL is for single-user (training?), and multi-user ends
> up in DRIVER_RENDER, that would seem to mean qaic ends up using
> DRIVER_RENDER and not DRIVER_ACCEL.  Then qaic ends up under
> /dev/dri with both a card node (never used) and a render node.  That
> would seem to mean that the "legacy" userspace would open qaic nodes by
> default - something I understood Oded was trying to avoid.
>
> Is there really a usecase for DRIVER_ACCEL to support single-user?  I
> wonder why we can't default to multi-user, and if a particular
> user/driver has a single-user usecase, it enforces that in a driver
> specific manner?
>
> -Jeff

Honestly, Daniel, I don't like this suggestion. I don't understand why
you tie render vs. accel to single-user vs. multi-user.

As Jeff has said, one of the goals was to expose accelerator devices
to userspace with new device char nodes so we won't be bothered by
legacy userspace graphics software. This is something we all agreed on
and I don't see why we should change it now, even if you think it's
bike-shedding (which I disagree with).

But in any case, creating new device char nodes had nothing to do
with whether the device supports single or multi user. I can
definitely see in the future training devices that support multiple
users.

The common drm/accel ioctls should of course not be limited to a
single user, and I agree with Jeff here, if a specific driver has such
a limitation (e.g. Habana), then that driver should handle it on its
own.
Maybe if multiple drivers end up with such a limitation, we can
move that "handling" into common code.
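
To illustrate what driver-specific handling of a single-user limitation
could look like, here is a minimal userspace sketch. It is not code from
habanalabs or any other real driver, and every name in it (sketch_device,
sketch_open, sketch_release) is hypothetical. It assumes the limitation can
be modelled as one atomic flag per device: the first open() takes the
device, and any further open() fails with -EBUSY until the holder releases
it.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device state; not taken from any real driver. */
struct sketch_device {
	atomic_bool in_use;	/* true while one user holds the device */
};

/* Driver-specific open(): the first caller wins, later callers get -EBUSY. */
static int sketch_open(struct sketch_device *dev)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&dev->in_use, &expected, true))
		return -EBUSY;
	return 0;
}

/* Driver-specific release(): the current user gives the device back. */
static void sketch_release(struct sketch_device *dev)
{
	bool was_held = atomic_exchange(&dev->in_use, false);

	assert(was_held);	/* releasing an unheld device is a caller bug */
}
```

The check lives entirely in the driver's own open/release path, so the
common accel node and ioctls stay multi-user by default. If several
drivers ended up needing the same check, a helper of this shape is the
kind of thing that could move into common code.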

Bottom line, I prefer we keep things as we all agreed upon at LPC.

Oded