Hi Oded (and sorry I misspelled your name last time),

Oded Gabbay <oded.gabbay@xxxxxxxxx> writes:

> On Tue, Aug 23, 2022 at 9:24 PM Kevin Hilman <khilman@xxxxxxxxxxxx> wrote:
>>
>> Hi Obed,
>>
>> Oded Gabbay <oded.gabbay@xxxxxxxxx> writes:
>>
>> [...]
>>
>> > I want to update that I'm currently in discussions with Dave to figure
>> > out what's the best way to move forward. We are writing it down to do
>> > a proper comparison between the two paths (new accel subsystem or
>> > using drm). I guess it will take a week or so.
>>
>> Any update on the discussions with Dave? and/or are there any plans to
>> discuss this further at LPC/ksummit yet?
> Hi Kevin.
>
> We are still discussing the details, as at least the habanalabs driver
> is very complex and there are multiple parts that I need to see if and
> how they can be mapped to drm.
> Some of us will attend LPC so we will probably take advantage of that
> to talk more about this.

OK, looking forward to some more conversations at LPC.

>>
>> We (BayLibre) are upstreaming support for APUs on Mediatek SoCs, and are
>> using the DRM-based approach. I'll also be at LPC and happy to discuss
>> in person.
>>
>> For some context on my/our interest: back in Sept 2020 we initially
>> submitted an rpmsg based driver for kernel communication[1]. After
>> review comments, we rewrote that based on DRM[2] and are now using it
>> for some MTK SoCs[3] and supporting our MTK customers with it.
>>
>> Hopefully we will get the kernel interfaces sorted out soon, but next,
>> there's the userspace side of things. To that end, we're also working
>> on libAPU, a common, open userspace stack. Alex Bailon recently
>> presented a proposal earlier this year at Embedded Recipes in Paris
>> (video[4], slides[5].)
>>
>> libAPU would include abstractions of the kernel interfaces for DRM
>> (using libdrm), remoteproc/rpmsg, virtio etc. but also goes farther and
>> proposes an open firmware for the accelerator side using
>> libMetal/OpenAMP + rpmsg for communication with (most likely closed
>> source) vendor firmware. Think of this like sound open firmware (SOF[6]),
>> but for accelerators.
>
> I think your device and the habana device are very different in
> nature, and it is part of what Dave and I discussed, whether these two
> classes of devices can live together. I guess they can live together
> in the kernel, but in the userspace, not so much imo.

Yeah, for now I think focusing on how to handle both classes of devices
in the kernel is the most important.

> The first class is the edge inference devices (usually as part of some
> SoC). I think your description of the APU on MTK SoC is a classic
> example of such a device.

Correct.

> You usually have some firmware you load, you give it a graph and
> pointers for input and output and then you just execute the graph
> again and again to perform inference and just replace the inputs.
>
> The second class is the data-center, training accelerators, which
> habana's gaudi device is classified as such. These devices usually
> have a number of different compute engines, a fabric for scaling out,
> on-device memory, internal MMUs and RAS monitoring requirements. Those
> devices are usually operated via command queues, either through their
> kernel driver or directly from user-space. They have multiple APIs for
> memory management, RAS, scaling-out and command-submissions.

OK, I see.

>>
>> We've been using this successfully for Mediatek SoCs (which have a
>> Cadence VP6 APU) and have submitted/published the code, including the
>> OpenAMP[7] and libmetal[8] parts in addition to the kernel parts already
>> mentioned.
> What's the difference between libmetal and other open-source low-level
> runtime drivers, such as oneAPI level-zero ?

TBH, I'd never heard of oneAPI before, so I'm assuming it's mainly
focused in the data center. libmetal/openAMP are widely used in the
consumer, industrial embedded space, and heavily used by FPGAs in many
market segments.

> Currently we have our own runtime driver which is tightly coupled with
> our h/w. For example, the method the userspace "talks" to the
> data-plane firmware is very proprietary as it is hard-wired into the
> architecture of the entire ASIC and how it performs deep-learning
> training. Therefore, I don't see how this can be shared with other
> vendors. Not because of secrecy but because it is simply not relevant
> to any other ASIC.

OK, makes sense.

Thanks for clarifying your use case in more detail.

Kevin