On Sun, Sep 12, 2021 at 10:32 PM Dave Airlie <airlied@xxxxxxxxx> wrote: > > On Sun, 12 Sept 2021 at 23:55, Greg Kroah-Hartman > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > On Fri, Sep 10, 2021 at 06:10:27PM +0200, Daniel Vetter wrote: > > > Forgot to add dri-devel. > > > > > > On Fri, Sep 10, 2021 at 6:09 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote: > > > > > > > > On Fri, Sep 10, 2021 at 9:58 AM Greg Kroah-Hartman > > > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > On Fri, Sep 10, 2021 at 10:26:56AM +0300, Oded Gabbay wrote: > > > > > > Hi Greg, > > > > > > > > > > > > Following our conversations a couple of months ago, I'm happy to tell you that > > > > > > Habanalabs has open-sourced its TPC (Tensor Processing Core) LLVM compiler, > > > > > > which is a fork of the LLVM open-source project. > > > > > > > > > > > > The project can be found on Habanalabs GitHub website at: > > > > > > https://github.com/HabanaAI/tpc_llvm > > > > > > > > > > > > There is a companion guide on how to write TPC kernels at: > > > > > > https://docs.habana.ai/en/latest/TPC_User_Guide/TPC_User_Guide.html > > > > > > > > > > That's great news, thanks for pushing for this and releasing it all! > > > > > > > > Yeah this is neat. > > > > > > > > There's still the problem that we spent the past 2.5 years pissing off > > > > a lot of people for an imo questionable political project, bypassing > > > > all the technical review and expertise. Now that the political > > > > nonsense is resolved I think we need to look at at least the technical > > > > cleanup. The angered people are much harder to fix, so let's maybe > > > > ignore that (or perhaps a ks topic, no idea, I'm honestly not super > > > > motivated to rehash this entire story again). Here's what I think we > > > > should do: > > > > > > > > - move drivers/misc/habanalabs under drivers/gpu/habanalabs and > > > > review/discussions on dri-devel > > > > Wait, why move into gpu? Are we going to do that for all hardware > > accelerators that we currently have in the kernel tree? > > > > We could just mv drivers/gpu drivers/accel if that helps your mental model here. > > > These things are not GPUs in the sense of them being "do some work and > > write out to a screen", which is what I would associate with a GPU (G > > does stand for "Graphical", right?) > > Neither are a lot of the gpu drivers, it's almost like we evolved the > subsystem in 20 years, > and the name got away from us. > > As an example: > etnaviv, panfrost, lima and vgem drivers have no display interfaces at > all. Nada, they do nothing except accelerate and use dma-buf to talk > to other drivers. > > > > Yes, GPUs can do things that some accelerators can do, but they can do > > things that accelerators can not do, and the other way around as well. > > I doubt you want all of the existing gpu drivers to be only treated as > > an "accelerator driver" now, as where would the logic that has to happen > > to get the bits out to a screen live? > > Don't care, totally doesn't matter if a driver is accelerator + > display, you could write in-driver buses if you wanted to abstract > this more, since internally most GPUs are just SoCs, the display and > accelerator pieces talk to power management, irqs and dma-buf like > functionality internally in the driver, the thing is for most GPUs > there is a single PCI device to bind to, so historically nobody has > seen the value in splitting them more or adding an in-driver bus for > one set of devices. > > > And since we have a long history of accepting accelerator drivers (I see > > some in our tree since 2018 at the least), and there is no common > > userspace collation trying to make a common userspace api, why do they > > have to live in the same place? What makes them common except for the > > fact that they use the kernel as a semi-dumb pipe to send work to and > > from a different processor? > > > > Look at drivers/misc/cxl/ and drivers/misc/ocxl and drivers/misc/uacce/ > > and drivers/misc/sgi-gru and drivers/misc/bcm-vk/ even drivers/misc/mei/ > > as that is an off-load engine we talk to, right? > > > > What about the drivers/fpga/ api we have, it handles accelerators as > > well. I'm sure we have many other examples in the kernel tree as well, > > I just did a quick look and found these. > > > > All the above accelerators do things in different ways because their > > hardware is different, so they need different user/kernel apis, right? > > How are we going to unify them? Who is going to unify them? > > > > So drivers/accel/ perhaps? I would be able to get rid of loads of > > drivers/misc/ code that way :) > > > > Who is going to be the new maintainer of this subsystem? > > We already said if we could get agreement on having things follow the > rules, then they can be merged under drm trees or we'd start a new > accel tree. > > The problem is the free-for-all merge with no barriers approach that > you and I believe Olof are campaigning for, doesn't seem to create > communities, it may create consulting or training opportunities for > the Linux Foundation, but thus far I don't see any communities. > > Graphics accelerator community exists because of and has itself > refined the rules over time. I don't think our rules will necessarily > work for other groups immediately but I think other groups need to > construct acceptable merge criteria beyond the kernel, and kernel > maintainers have to take more responsibility for saying no if they > don't have time for community building. > > > > So far they have all been going into drivers/misc/ because no one else > > stepped up to do the review of them except me. I would _LOVE_ the help > > here as I end up reviewing a new one every kernel release at the least, > > but companies do not seem to be willing to fund developers to be > > maintainers these days :( > > > > And yes, I have been reviewing the fpga code as well, even though they > > do have a good maintainer, as those patches flow through my tree due to > > historical reasons. I know the fpga developers would have loved some > > help with review of those patches. > > Lack of reviewing isn't the problem here, lack of responsibility for > creating a long term mess is. You are creating long term dumping > grounds for badly thought out stuff. Saying people keeping adding more > trash to my dump and it's overloading me is just the effect of having > created the dump with no rules to follow in the first place. > > > > > > > - review the dma-buf stuff on dri-devel and then land it through > > > > standard flows, not the gregk-misc bypass > > > > Are dma-bufs somehow required to be reviewed on dri-devel? As others > > have asked in the past, they are already being used in other subsystems > > (like IB) today, did those authors also get review there? > > Yes any use of dma-buf has to be cc'ed to dri-devel and linux-media > per MAINTAINERS > Hi Dave/Daniel, Now that we opened up the user-space compiler and provided a library with which you can load compiled kernels and run them, I've re-sent the two dma-buf patches to dri-devel and linux-media (and to specific people) on Sunday evening. Can you please help review them ? They already got reviewed by Christian and Jason on previous iterations and I fixed them according to their reviews so I believe they are fundamentally correct. Thanks, Oded > > > > If so, great, if not, that feels odd to me, as I am seeing lots of > > out-of-tree drivers start to use these structures, which is why the api > > was created (to stop the roll-your-own-implementations.) Does dri-devel > > want me to have those vendors cc: you all when those get submitted? > > Yes. MAINTAINERS has matching for this, are you not advising people to > use the proper submission techniques and thus bypassing that file? > > The reason is dma-buf and later by extension dma-fence can create > really bad problems for the kernel around memory management. > > https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#indefinite-dma-fences > > When a driver is self contained and doesn't interact with other kernel > drivers nobody really has to care. However once a driver starts > interacting with other drivers in the kernel, a responsible maintainer > has to check that these new drivers aren't going to crap all over the > existing drivers and destabilise the kernel. Someone has to review the > hardware design to see if page faulting works or if preemption works > or a bunch of other gotchas. Someone has to review the userspace to > make sure it isn't doing knowingly bad things or making assumptions > based on the kernel driver doing bad things. > > The thing is we've had code merged into our in-tree i915 driver that > broke a bunch of these assumptions, and have had to spend a year > cleaning it out, now this happened post-merge and diligence had > lessened, having the expertise to spot this in new dma-buf/fence users > is why we insist on having access to way more than just the 1000 line > kernel driver submission. > > > > I will be glad to not accept any more, but as I say above, what are the > > new requirements going to be so that those companies that do want to > > submit their code know what to do? > > I'm proposing a patch for documentation that maintainers can sign up > for (it's mentioned in the ksummit thread). > > > And what exactly are we using as a definition of an accelerator? We > > have networking cards that are "accelerators" as well as crypto > > "accelerators" :) > > > > > > I expect we'll have a proper discussion what the stack should look > > > > like with the next submission (from a different vendor maybe), that > > > > ship kinda sailed with habanalabs. > > > > Who is going to define this stack? As there is no industry standard, > > why would we define this? > > Because someone has to help, saying yes isn't helping, it's enabling > back behaviour. Parenting and maintaining both involve saying No for > the future prosperity of the ecosystem. > > Dave.