Re: New subsystem for acceleration devices

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Thu, 4 Aug 2022 13:00:26 +0100

On 04/08/2022 00:54, Dave Airlie wrote:
On Thu, 4 Aug 2022 at 06:21, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:

On Wed, Aug 3, 2022 at 10:04 PM Dave Airlie <airlied@xxxxxxxxx> wrote:

On Sun, 31 Jul 2022 at 22:04, Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote:

Hi,
Greg and I talked a couple of months ago about preparing a new accel
subsystem for compute/acceleration devices that are not GPUs and I
think your drivers that you are now trying to upstream fit it as well.

We've had some submissions for not-GPUs to the drm subsystem recently.

Intel GNA, Intel VPU, NVDLA, rpmsg AI processor unit.

Why is creating a new subsystem at this time necessary?

Are we just creating a subsystem to avoid the open source userspace
consumer rules? Or do we have some concrete reasoning behind it?

Dave.

Hi Dave.
The reason it happened now is because I saw two drivers, which are
doing h/w acceleration for AI, trying to be accepted to the misc
subsystem.
Add to that the fact I talked with Greg a couple of months ago about
doing a subsystem for any compute accelerators, which he was positive
about, I thought it is a good opportunity to finally do it.

I also honestly think that I can contribute much to these drivers from
my experience with the habana driver (which is now deployed en masse at
AWS) and contribute code from the habana driver to a common framework
for AI drivers.

Why not port the habana driver to drm now instead? I don't get why it
wouldn't make sense?

Stepping up to create a new subsystem is great, but we need rules
around what belongs where, we can't just spawn new subsystems when we
have no clear guidelines on where drivers should land.

What are the rules for a new accel subsystem? Do we have to now
retarget the 3 drivers that are queued up to use drm for accelerators,
because 2 drivers don't?

Aren't there three on the "don't prefer drm" side as well? Habana,
Toshiba and Samsung? Just so the numbers argument is not misrepresented. 
Perhaps a poll like a) prefer DRM, b) prefer a new subsystem, c) don't 
care in principle; is in order?

More to the point, code sharing is a very compelling argument if it can 
be demonstrated to be significant, aka not needing to reinvent the same 
wheel.

Perhaps one route forward could be a) to consider renaming DRM to
something more appropriate, removing rendering from the name and
replacing it with accelerators, co-processors, I don't know... Although I
am not sure renaming the codebase, character device node names and
userspace headers is all that feasible. I thought I'd mention it
nevertheless; maybe it gives someone an idea of how it could be done.

And b) allow the userspace rules to be considered per driver, or per 
class (is it a gpu or not should be a question that can be answered). 
Shouldn't be a blocker if it still matches the rules present elsewhere 
in the kernel.

Those two would remove the two most contentious points as far as I
understood the thread.

Regards,

Tvrtko

There's a lot more to figure out here than merge some structures and
it will be fine.

I think the one area where I can see a divide, and where a new subsystem
might fit, is accelerators that are single-user, one-shot type things like
media drivers (though maybe they could be just media drivers).

I think anything that does command offloading to firmware or queues
belongs in drm, because that is pretty much what the framework does. I
think it might make sense to enhance some parts of drm to fit things
in better, but that shouldn't block things getting started.

I'm considering whether we should add an accelerator staging area to drm
and land the 2-3 submissions we have and try and steer things towards
commonality that way instead of holding them out of tree.

I'd like to offload things from Greg by just not having people submit
misc drivers at all for things that should go elsewhere.

Dave.

Regarding the open source userspace rules in drm - yes, I think your
rules are too limiting for the relatively young AI scene, and I saw at
the 2021 kernel summit that other people from the kernel community
think that as well.
But that's not the main reason, or even a reason at all for doing
this. After all, at least for habana, we open-sourced our compiler and
a runtime library. And Greg also asked those two drivers if they have
matching open-sourced user-space code.

And a final reason is that I thought this can also help in somewhat
reducing the workload on Greg. I saw in the last kernel summit there
was a concern about bringing more people to be kernel maintainers so I
thought this is a step in the right direction.

Oded