On Thu, Jan 05, 2023 at 07:38:26PM +0200, Oded Gabbay wrote: > On Thu, Jan 5, 2023 at 6:25 PM Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> wrote: > > > > On 1/5/2023 5:57 AM, Daniel Vetter wrote: > > > On Thu, Dec 08, 2022 at 12:07:27PM +0100, Jacek Lawrynowicz wrote: > > >> +static const struct drm_driver driver = { > > >> + .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL, > > > > > > So I was wondering whether this is a bright idea, and whether we shouldn't > > > just go ahead and infuse more meaning into accel vs render nodes. > > > > > > The uapi relevant part of render nodes is that they're multi-user safe, at > > > least as much as feasible. Every new open() gives you a new private > > > accelerator. This also has implications on how userspace drivers iterate > > > them, they just open them all in turn and check whether it's the right > > > one - because userspace apis allow applications to enumerate them all. > > > Which also means that any devicie initialization at open() time is a > > > really bad idea. > > > > > > A lot of the compute accelerators otoh (well habanalabs) are single user, > > > init can be done at open() time because you only open this when you > > > actually know you're going to use it. > > > > > > So given this, shouldn't multi-user inference engines be more like render > > > drivers, and less like accel? So DRIVER_RENDER, but still under > > > drivers/accel. > > > > > > This way that entire separate /dev node would actually become meaningful > > > beyond just the basic bikeshed: > > > - render nodes are multi user, safe to iterate and open() just for > > > iteration > > > - accel nodes are single user, you really should not ever open them unless > > > you want to use them > > > > > > Of course would need a doc patch :-) > > > > > > Thoughts? > > > -Daniel > > > > Hmm. > > > > I admit, I thought DRIVER_ACCEL was the same as DRIVER_RENDER, except > > that DRIVER_ACCEL dropped the "legacy" dual node setup and also avoided > > "legacy" userspace. > > > > qaic is multi-user. I thought habana was the same, at-least for > > inference. Oded, am I wrong? > Habana's devices support a single user at a time acquiring the device > and working on it. > Both for training and inference. > > > > So, if DRIVER_ACCEL is for single-user (training?), and multi-user ends > > up in DRIVER_RENDER, that would seem to mean qaic ends up using > > DRIVER_RENDER and not DRIVER_ACCEL. Then qaic ends up over under > > /dev/dri with both a card node (never used) and a render node. That > > would seem to mean that the "legacy" userspace would open qaic nodes by > > default - something I understood Oded was trying to avoid. > > > > If there really a usecase for DRIVER_ACCEL to support single-user? I > > wonder why we can't default to multi-user, and if a particular > > user/driver has a single-user usecase, it enforces that in a driver > > specific manner? > > > > -Jeff > > Honestly, Daniel, I don't like this suggestion. I don't understand why > you make a connection between render/accel to single/multi user. > > As Jeff has said, one of the goals was to expose accelerator devices > to userspace with new device char nodes so we won't be bothered by > legacy userspace graphics software. This is something we all agreed on > and I don't see why we should change it now, even if you think it's > bike-shedding (which I disagree with). > > But in any case, creating a new device char nodes had nothing to do > with whether the device supports single or multi user. I can > definitely see in the future training devices that support multiple > users. > > The common drm/accel ioctls should of course not be limited to a > single user, and I agree with Jeff here, if a specific driver has such > a limitation (e.g. Habana), then that driver should handle it on its > own. > Maybe if there will be multiple drivers with such a limitation, we can > make that "handling" to be common code. > > Bottom line, I prefer we keep things as we all agreed upon in LPC. The problem is going to happen as soon as you have cross-vendor userspace. Which I'm kinda hoping is at least still the aspiration. Because with cross-vendor userspace you generally iterate & open all devices before you select the one you're going to use. And so we do kinda need a distinction, or we need that the single-user drivers also guarantee that open() is cheap. With just habana and vpu there's no problem because you'll never see these in the same machine. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch