On Sun, Aug 7, 2022 at 9:43 AM Oded Gabbay <oded.gabbay@xxxxxxxxx> wrote: > > On Fri, Aug 5, 2022 at 3:22 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > On Thu, Aug 04, 2022 at 08:48:28PM +0300, Oded Gabbay wrote: > > > > > > The flip is true of DRM - DRM is pretty general. I bet I could > > > > implement an RDMA device under DRM - but that doesn't mean it should > > > > be done. > > > > > > > > My biggest concern is that this subsystem not turn into a back door > > > > for a bunch of abuse of kernel APIs going forward. Though things > > > > are > > > > > > How do you suggest to make sure it won't happen ? > > > > Well, if you launch the subsystem then it is your job to make sure it > > doesn't happen - or endure the complaints :) > Understood, I'll make sure there is no room for complaints. > > > > Accelerators have this nasty tendancy to become co-designed with their > > SOCs in nasty intricate ways and then there is a desire to punch > > through all the inconvenient layers to make it go faster. > > > > > > better now, we still see this in DRM where expediency or performance > > > > justifies hacky shortcuts instead of good in-kernel architecture. At > > > > least DRM has reliable and experienced review these days. > > > Definitely. DRM has some parts that are really well written. For > > > example, the whole char device handling with sysfs/debugfs and managed > > > resources code. > > > > Arguably this should all be common code in the driver core/etc - what > > value is a subsystem adding beyond that besides using it properly? > > I mainly see two things here: > > 1. If there is a subsystem which is responsible for creating and > exposing the device character files, then there should be some code > that connects between each device driver to that subsystem. > i.e. There should be functions that each driver should call from its > probe and release callback functions. > > Those functions should take care of the following: > - Create metadata for the device, the device's minor(s) and the > driver's ioctls table and driver's callback for file operations (both > are common for all the driver's devices). Save all that metadata with > proper locking. > - Create the device char files themselves and supply file operations > that will be called per each open/close/mmap/etc. > - Keep track of all these objects' lifetime in regard to the device > driver's lifetime, with proper handling for release. > - Add common handling and entries of sysfs/debugfs for these devices > with the ability for each device driver to add their own unique > entries. > > 2. I think that you underestimate (due to your experience) the "using > it properly" part... It is not so easy to do this properly for > inexperienced kernel people. If we provide all the code I mentioned > above, the device driver writer doesn't need to be aware of all these > kernel APIs. > Two more points I thought of as examples for added value to be done in common code: 1. Common code for handling dma-buf. Very similar to what was added a year ago to rdma. This can be accompanied by a common ioctl to export and import a dma-buf. 2. Common code to handle drivers that want to allow a single user at a time to run open the device char file. Oded > > > > It would be nice to at least identify something that could obviously > > be common, like some kind of enumeration and metadata kind of stuff > > (think like ethtool, devlink, rdma tool, nvemctl etc) > Definitely. I think we can have at least one ioctl that will be common > to all drivers from the start. > A kind of information retrieval ioctl. There are many information > points that I'm sure are common to most accelerators. We have an > extensive information ioctl in the habanalabs driver and most of the > information there is not habana specific imo. > > > > > I think that it is clear from my previous email what I intended to > > > provide. A clean, simple framework for devices to register with, get > > > services for the most basic stuff such as device char handling, > > > sysfs/debugfs. > > > > This should all be trivially done by any driver using the core codes, > > if you see gaps I'd ask why not improve the core code? > > > > > Later on, add more simple stuff such as memory manager > > > and uapi for memory handling. I guess someone can say all that exists > > > in drm, but like I said it exists in other subsystems as well. > > > > This makes sense to me, all accelerators need a way to set a memory > > map, but on the other hand we are doing some very nifty stuff in this > > area with iommufd and it might be very interesting to just have the > > accelerator driver link to that API instead of building yet another > > copy of pin_user_pages() code.. Especially with PASID likely becoming > > part of any accelerator toolkit. > Here I disagree with you. First of all, there are many relatively > simple accelerators, especially in edge, where PASID is really not > relevant. > Second, even for the more sophisticated PCIe/CXL-based ones, PASID is > not mandatory and I suspect that it won't be in 100% of those devices. > But definitely that should be an alternative to the "classic" way of > handling dma'able memory (pin_user_pages()). > > > > > I want to be perfectly honest and say there is nothing special here > > > for AI. It's actually the opposite, it is a generic framework for > > > compute only. Think of it as an easier path to upstream if you just > > > want to do compute acceleration. Maybe in the future it will be more, > > > but I can't predict the future. > > > > I can't either, and to be clear I'm only questioning the merit of a > > "subsystem" eg with a struct class, rigerous uAPI, etc. > > > > The idea that these kinds of accel drivers deserve specialized review > > makes sense to me, even if they remain as unorganized char > > devices. Just understand that is what you are signing up for :) > I understand. That's why I'm taking all your points very seriously. > This is not a decision that should be taken lightly and I want to be > sure most agree this is the correct way forward. > My next step is to talk to Dave about it in-depth. In his other email > he wrote some interesting ideas which I want to discuss with him. > > Maybe this is something that should be discussed in the kernel summit ? > > > > > Jason