On Fri, 6 Jan 2023 at 21:44, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote: > > On Fri, Jan 6, 2023 at 12:45 PM Daniel Vetter <daniel@xxxxxxxx> wrote: > > > > On Fri, 6 Jan 2023 at 10:56, Stanislaw Gruszka > > <stanislaw.gruszka@xxxxxxxxxxxxxxx> wrote: > > > > > > On Fri, Jan 06, 2023 at 10:28:15AM +0100, Daniel Vetter wrote: > > > > On Thu, Jan 05, 2023 at 07:38:26PM +0200, Oded Gabbay wrote: > > > > > On Thu, Jan 5, 2023 at 6:25 PM Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> wrote: > > > > > > > > > > > > On 1/5/2023 5:57 AM, Daniel Vetter wrote: > > > > > > > On Thu, Dec 08, 2022 at 12:07:27PM +0100, Jacek Lawrynowicz wrote: > > > > > > >> +static const struct drm_driver driver = { > > > > > > >> + .driver_features = DRIVER_GEM | DRIVER_COMPUTE_ACCEL, > > > > > > > > > > > > > > So I was wondering whether this is a bright idea, and whether we shouldn't > > > > > > > just go ahead and infuse more meaning into accel vs render nodes. > > > > > > > > > > > > > > The uapi relevant part of render nodes is that they're multi-user safe, at > > > > > > > least as much as feasible. Every new open() gives you a new private > > > > > > > accelerator. This also has implications on how userspace drivers iterate > > > > > > > them, they just open them all in turn and check whether it's the right > > > > > > > one - because userspace apis allow applications to enumerate them all. > > > > > > > Which also means that any devicie initialization at open() time is a > > > > > > > really bad idea. > > > > > > > > > > > > > > A lot of the compute accelerators otoh (well habanalabs) are single user, > > > > > > > init can be done at open() time because you only open this when you > > > > > > > actually know you're going to use it. > > > > > > > > > > > > > > So given this, shouldn't multi-user inference engines be more like render > > > > > > > drivers, and less like accel? So DRIVER_RENDER, but still under > > > > > > > drivers/accel. > > > > > > > > > > > > > > This way that entire separate /dev node would actually become meaningful > > > > > > > beyond just the basic bikeshed: > > > > > > > - render nodes are multi user, safe to iterate and open() just for > > > > > > > iteration > > > > > > > - accel nodes are single user, you really should not ever open them unless > > > > > > > you want to use them > > > > > > > > > > > > > > Of course would need a doc patch :-) > > > > > > > > > > > > > > Thoughts? > > > > > > > -Daniel > > > > > > > > > > > > Hmm. > > > > > > > > > > > > I admit, I thought DRIVER_ACCEL was the same as DRIVER_RENDER, except > > > > > > that DRIVER_ACCEL dropped the "legacy" dual node setup and also avoided > > > > > > "legacy" userspace. > > > > > > > > > > > > qaic is multi-user. I thought habana was the same, at-least for > > > > > > inference. Oded, am I wrong? > > > > > Habana's devices support a single user at a time acquiring the device > > > > > and working on it. > > > > > Both for training and inference. > > > > > > > > > > > > So, if DRIVER_ACCEL is for single-user (training?), and multi-user ends > > > > > > up in DRIVER_RENDER, that would seem to mean qaic ends up using > > > > > > DRIVER_RENDER and not DRIVER_ACCEL. Then qaic ends up over under > > > > > > /dev/dri with both a card node (never used) and a render node. That > > > > > > would seem to mean that the "legacy" userspace would open qaic nodes by > > > > > > default - something I understood Oded was trying to avoid. > > > > > > > > > > > > If there really a usecase for DRIVER_ACCEL to support single-user? I > > > > > > wonder why we can't default to multi-user, and if a particular > > > > > > user/driver has a single-user usecase, it enforces that in a driver > > > > > > specific manner? > > > > > > > > > > > > -Jeff > > > > > > > > > > Honestly, Daniel, I don't like this suggestion. I don't understand why > > > > > you make a connection between render/accel to single/multi user. > > > > > > > > > > As Jeff has said, one of the goals was to expose accelerator devices > > > > > to userspace with new device char nodes so we won't be bothered by > > > > > legacy userspace graphics software. This is something we all agreed on > > > > > and I don't see why we should change it now, even if you think it's > > > > > bike-shedding (which I disagree with). > > > > > > > > > > But in any case, creating a new device char nodes had nothing to do > > > > > with whether the device supports single or multi user. I can > > > > > definitely see in the future training devices that support multiple > > > > > users. > > > > > > > > > > The common drm/accel ioctls should of course not be limited to a > > > > > single user, and I agree with Jeff here, if a specific driver has such > > > > > a limitation (e.g. Habana), then that driver should handle it on its > > > > > own. > > > > > Maybe if there will be multiple drivers with such a limitation, we can > > > > > make that "handling" to be common code. > > > > > > > > > > Bottom line, I prefer we keep things as we all agreed upon in LPC. > > > > > > > > The problem is going to happen as soon as you have cross-vendor userspace. > > > > Which I'm kinda hoping is at least still the aspiration. Because with > > > > cross-vendor userspace you generally iterate & open all devices before you > > > > select the one you're going to use. And so we do kinda need a distinction, > > > > or we need that the single-user drivers also guarantee that open() is > > > > cheap. > > > > > > FWIW we had good support in ivpu for probe open's in form of lazy context > > > allocation. It was removed recently due to review feedback that this is > > > unnecessary, but we can add it back. > > > > Yeah once you have more than 1 multi-user accel chip in the system you > > need to do that. Which is really the reason why I think smashing > > multi-user client accel things into render is good, it forces drivers > > to suck less. > I still don't think it is a good idea, nor that it was the original > intention of this endeavour. > I don't want to repeat my previous email, so I'll just say again I > don't agree with the connection you make between accel devices and > single users. > Today Habana supports single-user, tomorrow it might change. But that > doesn't mean I will want to present my device as a GPU because I don't > want some app to try and render stuff on my device. > > Moreover, accel devices can actually be specific IPs inside a more > general ASIC (e.g. GPU) which supports multi-user. In that case, one > might want to have an accel driver that supports multi-user on those > IPs but exposes /dev/accel and not render nodes. > Yeah I'm with Oded here, I don't think we want to have single/multi thing at all on the render/accel boundary. For single users as long as open doesn't block and it correctly returns fail, then multi user devices should be fine with contexts etc. Dave.