On 1/5/2023 5:57 AM, Daniel Vetter wrote:
On Thu, Dec 08, 2022 at 12:07:27PM +0100, Jacek Lawrynowicz wrote:
+static const struct drm_driver driver = {
+ .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL,
So I was wondering whether this is a bright idea, and whether we shouldn't
just go ahead and infuse more meaning into accel vs render nodes.
The uapi relevant part of render nodes is that they're multi-user safe, at
least as much as feasible. Every new open() gives you a new private
accelerator. This also has implications on how userspace drivers iterate
them: they just open them all in turn and check whether it's the right
one - because userspace apis allow applications to enumerate them all.
Which also means that any device initialization at open() time is a
really bad idea.
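To make the iteration pattern above concrete, here is a minimal userspace
sketch, not taken from any real userspace driver. The helper find_node() and
the predicate match_is_chardev() are made-up names for illustration; a real
driver would replace the predicate with a driver-name check, for example via
DRM_IOCTL_VERSION. The point is only that every candidate node gets open()ed,
including the ones that turn out to be the wrong device:

```c
#include <fcntl.h>
#include <glob.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical predicate type: nonzero means "this is the node I want". */
typedef int (*node_match_fn)(int fd);

/*
 * Placeholder predicate (assumption): accept any character device.
 * A real userspace driver would query the kernel driver name instead.
 */
static int match_is_chardev(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 && S_ISCHR(st.st_mode);
}

/*
 * Open every node matching pattern in turn and return the first fd the
 * predicate accepts, or -1 if none does. *opened counts successful
 * open() calls, i.e. how often a driver's open path would have run.
 */
static int find_node(const char *pattern, node_match_fn match, int *opened)
{
	glob_t g;
	int found = -1;
	size_t i;

	*opened = 0;
	if (glob(pattern, 0, NULL, &g) != 0)
		return -1;

	for (i = 0; i < g.gl_pathc && found < 0; i++) {
		int fd = open(g.gl_pathv[i], O_RDWR);

		if (fd < 0)
			continue;	/* no permission or node vanished */
		(*opened)++;
		if (match(fd))
			found = fd;	/* caller owns this fd */
		else
			close(fd);	/* wrong device, opened only to probe */
	}
	globfree(&g);
	return found;
}
```

A caller would do something like find_node("/dev/dri/renderD*",
match_is_chardev, &n). Any per-open() device initialization would then run
once for every node probed, not just for the one that ends up being used.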
A lot of the compute accelerators otoh (well habanalabs) are single user,
init can be done at open() time because you only open this when you
actually know you're going to use it.
So given this, shouldn't multi-user inference engines be more like render
drivers, and less like accel? So DRIVER_RENDER, but still under
drivers/accel.
This way that entire separate /dev node would actually become meaningful
beyond just the basic bikeshed:
- render nodes are multi user, safe to iterate and open() just for
iteration
- accel nodes are single user, you really should not ever open them unless
you want to use them
Of course would need a doc patch :-)
Thoughts?
-Daniel
Hmm.
I admit, I thought DRIVER_ACCEL was the same as DRIVER_RENDER, except
that DRIVER_ACCEL dropped the "legacy" dual node setup and also avoided
"legacy" userspace.
qaic is multi-user. I thought habana was the same, at least for
inference. Oded, am I wrong?
So, if DRIVER_ACCEL is for single-user (training?), and multi-user ends
up in DRIVER_RENDER, that would seem to mean qaic ends up using
DRIVER_RENDER and not DRIVER_ACCEL. Then qaic ends up under
/dev/dri with both a card node (never used) and a render node. That
would seem to mean that the "legacy" userspace would open qaic nodes by
default - something I understood Oded was trying to avoid.
Is there really a use case for DRIVER_ACCEL to support single-user? I
wonder why we can't default to multi-user, and if a particular
user/driver has a single-user use case, it enforces that in a
driver-specific manner?
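As a rough illustration of what enforcing single-user in the driver could
look like, here is a userspace model of an open/release pair that admits one
user at a time. This is not kernel code and not how any existing driver does
it; struct model_dev, model_open() and model_release() are hypothetical
names, and a real driver would do the equivalent in its own open and release
paths:

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical per-device state: one flag that marks the device busy. */
struct model_dev {
	atomic_flag in_use;
};

/* First opener wins; everyone else gets -EBUSY until release. */
static int model_open(struct model_dev *dev)
{
	if (atomic_flag_test_and_set(&dev->in_use))
		return -EBUSY;
	return 0;
}

/* Called when the single user closes the device. */
static void model_release(struct model_dev *dev)
{
	atomic_flag_clear(&dev->in_use);
}
```

With something along these lines the node type would not need to encode
single-user vs multi-user; a second open() simply fails while the first user
holds the device.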
-Jeff