On Thu, Aug 4, 2022 at 5:50 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Thu, Aug 04, 2022 at 10:43:42AM +0300, Oded Gabbay wrote:
>
> > After all, memory management services, or common device chars handling
> > I can get from other subsystems (e.g. rdma) as well. I'm sure I could
> > model my uAPI to be rdma uAPI compliant (I can define proprietary uAPI
> > there as well), but this doesn't mean I belong there, right ?
>
> You sure can, but there is still an expectation, eg in RDMA, that your
> device has a similarity to the established standards (like roce in
> habana's case) that RDMA is geared to support.
>
> I think the most important thing to establish a new subsystem is
> to actually identify what commonalities it is supposed to be
> providing. Usually this is driven by some standards body, but the
> AI/ML space hasn't gone in that direction at all yet.

I agree. In the AI world no standard exists, and I don't see anything
on the horizon. There are the AI frameworks/compilers, which are 30,000
feet above us, and there is CUDA, which is closed-source and I have no
idea what it does inside.

>
> We don't need a "subsystem" to have a bunch of drivers expose chardevs
> with completely unique ioctls.

I totally agree with this sentence, and this is *exactly* why I
personally don't want to use DRM: when I look at the long list of
common IOCTLs in drm.h, I don't find anything that I can use. Each one
is either not relevant at all to my h/w or is something that our h/w
implemented differently.

This is in contrast to rdma where, as you said, we have the ibverbs
API. So when you asked that we write an IBverbs driver, I understood
the reasoning. There is a common user-space library which talks to the
rdma drivers, all the rdma applications use that library, and once I
write a (somewhat) standard driver, I can hopefully enjoy all that.

>
> The flip is true of DRM - DRM is pretty general. I bet I could
> implement an RDMA device under DRM - but that doesn't mean it should
> be done.
>
> My biggest concern is that this subsystem not turn into a back door
> for a bunch of abuse of kernel APIs going forward. Though things are

How do you suggest we make sure that won't happen?

> better now, we still see this in DRM where expediency or performance
> justifies hacky shortcuts instead of good in-kernel architecture. At
> least DRM has reliable and experienced review these days.

Definitely. DRM has some parts that are really well written. For
example, the whole char device handling with sysfs/debugfs and the
managed resources code. This is something I would gladly either use or
copy-paste into the hw accel subsystem. And of course, more pairs of
eyes looking at the code will usually produce better code.

I think it is clear from my previous email what I intended to provide:
a clean, simple framework for devices to register with and get services
for the most basic stuff, such as device char handling and
sysfs/debugfs. Later on, add more simple stuff such as a memory manager
and uapi for memory handling. I guess someone can say all that exists
in drm, but like I said, it exists in other subsystems as well.

I want to be perfectly honest and say there is nothing special here
for AI. It's actually the opposite: it is a generic framework for
compute only. Think of it as an easier path to upstream if you just
want to do compute acceleration. Maybe in the future it will be more,
but I can't predict the future.

If that's not enough for a new subsystem, fair enough, I'll withdraw
my offer.

Thanks,
Oded

>
> Jason