Re: [PATCH v3 00/14] Driver of Intel(R) Gaussian & Neural Accelerator

Dave Airlie <airlied@xxxxxxxxx> · Tue, 18 May 2021 04:04:52 +1000

On Mon, 17 May 2021 at 19:12, Daniel Vetter <daniel@xxxxxxxx> wrote:
>
> On Mon, May 17, 2021 at 10:55 AM Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, May 17, 2021 at 10:49:09AM +0200, Daniel Vetter wrote:
> > > On Mon, May 17, 2021 at 10:00 AM Greg Kroah-Hartman
> > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, May 17, 2021 at 09:40:53AM +0200, Daniel Vetter wrote:
> > > > > On Fri, May 14, 2021 at 11:00:38AM +0200, Arnd Bergmann wrote:
> > > > > > On Fri, May 14, 2021 at 10:34 AM Greg Kroah-Hartman
> > > > > > <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > > > > On Thu, May 13, 2021 at 01:00:26PM +0200, Maciej Kwapulinski wrote:
> > > > > > > > Dear kernel maintainers,
> > > > > > > >
> > > > > > > > This submission is a kernel driver to support Intel(R) Gaussian & Neural
> > > > > > > > Accelerator (Intel(R) GNA). Intel(R) GNA is a PCI-based neural co-processor
> > > > > > > > available on multiple Intel platforms. AI developers and users can offload
> > > > > > > > continuous inference workloads to an Intel(R) GNA device in order to free
> > > > > > > > processor resources and save power. Noise reduction and speech recognition
> > > > > > > > are the examples of the workloads Intel(R) GNA deals with while its usage
> > > > > > > > is not limited to the two.
> > > > > > >
> > > > > > > How does this compare with the "nnpi" driver being proposed here:
> > > > > > >         https://lore.kernel.org/r/20210513085725.45528-1-guy.zadicario@xxxxxxxxx
> > > > > > >
> > > > > > > Please work with those developers to share code and userspace api and
> > > > > > > tools.  Having the community review two totally different apis and
> > > > > > > drivers for the same type of functionality from the same company is
> > > > > > > totally wasteful of our time and energy.
> > > > > >
> > > > > > Agreed, but I think we should go further than this and work towards a
> > > > > > subsystem across companies for machine learning and neural networks
> > > > > > accelerators for both inferencing and training.
> > > > >
> > > > > We have, it's called drivers/gpu. Feel free to rename to drivers/xpu or
> > > > > > think G as in General, not Graphics.
> > > > >
> > > > > > We have support for Intel habanalabs hardware in drivers/misc, and there are
> > > > > > countless hardware solutions out of tree that would hopefully go the same
> > > > > > way with an upstream submission and open source user space, including
> > > > > >
> > > > > > - Intel/Mobileye EyeQ
> > > > > > - Intel/Movidius Keembay
> > > > > > - Nvidia NVDLA
> > > > > > - Gyrfalcon Lightspeeur
> > > > > > - Apple Neural Engine
> > > > > > - Google TPU
> > > > > > - Arm Ethos
> > > > > >
> > > > > > plus many more that are somewhat less likely to gain fully open source
> > > > > > driver stacks.
> > > > >
> > > > > We also had this entire discussion 2 years ago with habanalabs. The
> > > > > hang-up is that drivers/gpu folks require fully open source userspace,
> > > > > including compiler and anything else you need to actually use the chip.
> > > > > Greg doesn't, he's happy if all he has is the runtime library with some
> > > > > tests.
> > >
> > > I guess we're really going to beat this horse into pulp ... oh well.
> > >
> > > > All you need is a library, what you write on top of that is always
> > > > application-specific, so how can I ask for "more"?
> > >
> > > This is like accepting a new cpu port, where all you require is that
> > > the libc port is open source, but the cpu compiler is totally fine as
> > > a blob (doable with llvm now being supported). It makes no sense at
> > > all, at least to people who have worked with accelerators like this
> > > before.
> > >
> > > We are not requiring that applications are open. We're only requiring
> > > that at least one of the compilers you need (no need to open the fully
> > > optimized one with all the magic sauce) to create any kind of
> > > applications is open, because without that you can't use the device,
> > > you can't analyze the stack, and you have no idea at all about what
> > > exactly it is you're merging. With these devices, the uapi visible in
> > > include/uapi is the smallest part of the interface exposed to
> > > userspace.
> >
> > Ok, sorry, I was not aware that the habanalabs compiler was not
> > available to all under an open source license.  All I was trying to
> > enforce was that the library to use the kernel api was open so that
> > anyone could use it.  Trying to enforce compiler requirements like this
> > might feel to be a bit of a reach as the CPU on the hardware really
> > doesn't fall under the license of the operating system running on this
> > CPU over here :)
>
> Experience says if you don't, forget about supporting your
> drivers/subsystem long-term. At best you're stuck with a per-device
> fragmented mess that vendors might or might not support. This has
> nothing to do with GPL licensing or not, but about making sure you can
> do proper engineering/support/review of the driver stack. At least in
> the GPU world we're already making it rather clear that running blobby
> userspace is fine with us (as long as it's using the exact same uapi
> as the truly open stack, no exceptions/hacks/abuse are supported).
>
> Also yes vendors don't like it. But they also don't like that they
> have to open source their kernel drivers, or runtime library. Lots of
> background chats over years, and a very clear line in the sand helps
> to get there, and also makes sure that the vendors who got here don't
> return to the old closed source ways they love so much.
>
> Anyway we had all these discussions 2 years ago, and nothing has changed
> (well, on the gpu side we managed to get ARM officially on board with
> a fully open stack paid for by them meanwhile; other discussions are
> still ongoing). I just wanted to re-iterate that if we really cared about
> having a proper accel subsystem, there are people who've been doing this
> for decades.

I think the other point worth reiterating is that most of these
devices are unobtanium for your average kernel maintainer. It's hard
to create a subsystem standard when you don't have access to a
collection of devices + the complete picture of what the stack is
doing and how it interoperates with the ecosystem at large, not just
the kernel. Kernel maintainers need to help ensure there is a viable
ecosystem beyond the kernel before merging stuff that is clearly a
large kernel + user stack architecture. I.e. misc USB drivers: merge
away. For misc small layer drivers for larger vendor-specific ecosystems
we need to tread more carefully, as long-term we do nobody any favours.

Dave.
>
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch