On Thu, Dec 17, 2020 at 9:40 AM Jiaying Liang <wendy.liang@xxxxxxxxxx> wrote:
>
> On 12/15/20 7:23 AM, Alex Deucher wrote:
> > On Mon, Dec 14, 2020 at 7:24 PM Jiaying Liang<wendy.liang@xxxxxxxxxx> wrote:
> >> On 12/11/20 11:39 AM, Daniel Vetter wrote:
> >>> Hi all
> >>>
> >>> On Fri, Dec 11, 2020 at 8:03 PM Alex Deucher<alexdeucher@xxxxxxxxx> wrote:
> >>>> On Mon, Nov 30, 2020 at 3:25 AM Wendy Liang<wendy.liang@xxxxxxxxxx> wrote:
> >>>>> AI engine is the acceleration engine provided by Xilinx. These engines
> >>>>> provide high compute density for vector-based algorithms, and flexible
> >>>>> custom compute and data movement. It has core tiles for compute and
> >>>>> shim tiles to interface the FPGA fabric.
> >>>>>
> >>>>> You can check the AI engine architecture document for more hardware details:
> >>>>> https://www.xilinx.com/support/documentation/architecture-manuals/am009-versal-ai-engine.pdf
> >>>>>
> >>>>> This patch series adds a Linux kernel driver to manage the Xilinx AI
> >>>>> engine array device and AI engine partitions (groups of AI engine tiles
> >>>>> dedicated to an application).
> >>>> Hi Wendy,
> >>>>
> >>>> I think it would be good to provide an overview of how your stack
> >>>> works in general.  That would give reviewers a better handle on how
> >>>> all of this fits together.  I'd suggest including an overview in the
> >>>> cover letter and also in the commit message and/or as a comment in the
> >>>> code in one of the patches.  I'm not really an expert when it comes to
> >>>> FPGAs, but this basically looks like a pretty low level interface to
> >>>> set up the data fabric for a kernel that will run on the soft logic or
> >>>> maybe the microcontroller on the board.  It doesn't have to be super
> >>>> detailed, just a nice flow for how you might use this.  E.g.,
> >>>>
> >>>> Userspace uses ioctls X, Y, Z to configure the data fabric for the
> >>>> FPGA kernel.  The kernels can run on... .
> >>>> DMA access to system memory
> >>>> for data sets can be allocated using ioctl A.  DMA access is limited
> >>>> by... .  The user can then load the FPGA kernel on to one of the
> >>>> engines using ioctl B and finally they can kick off the whole thing
> >>>> using ioctl C.  FPGA kernels are compiled using YYY toolchain and
> >>>> use the following runtime (link to runtime) to configure the data
> >>>> fabric using ioctls X, Y, Z.
> >>> At least for drm drivers we ideally have that as a .rst file in
> >>> Documentation/. With that you can even do full svg graphs, or just dot
> >>> graphs, of the overall stack if you really want to go overboard :-)
> >>>
> >>>> It would also be good to go over the security implications of the
> >>>> design.  E.g., can the FPGA kernel(s) access the DMA engine directly,
> >>>> or is it limited to just the DMA regions set up by the ioctls?  Also,
> >>>> does the hardware and software design allow for multiple users?  If
> >>>> so, how does that work?
> >>> I've also seen indications that there's some on-chip or on-card
> >>> memory. How that's planned to be used and whether we want to manage
> >>> this (maybe even with something like ttm) would be good to understand.
> >>>
> >>> All excellent questions from Alex, just figured I'd add some more.
> >>>
> >>> Cheers, Daniel
> >> Hi Alex, Daniel,
> >>
> >> Below is an overview of the driver.
> >>
> >> AI engine kernel driver manages Xilinx AI engine device. An AI engine device
> >> contains core tiles and SHIM tiles. Core tiles are the computation tiles;
> >> the SHIM tiles are the tiles interfacing to external components.
> >>
> >>                 +--------+--------+--------+----------+
> >>                 | Core   | Core   | Core   | Core     | ...
> >>                 |        |        |        |          |
> >>                 +--------+--------+--------+----------+
> >>                 | Core   | Core   | Core   | Core     | ...
> >>                 |        |        |        |          |
> >>                 +--------+--------+--------+----------+
> >>                   ...
> >>                 +--------+--------+--------+----------+
> >>                 | SHIM   | SHIM   | SHIM   | SHIM     |
> >>                 | PL     | PL     | PL     | PL | NOC |
> >>                 +---+----+---+----+---+----+-+-----+--+
> >> AXI Streams         |        |        |      |     | AXI MM
> >>                     |        |        |      |     |
> >> Events Signals      |        |        |      |     |
> >>                     |        |        |      |     |
> >>                 +---+--------+--------+------+-+ +-+-------+
> >>                 |             FPGA             | |   NOC   |
> >>                 |                              | |         |
> >>                 +------------------------------+ +----+----+
> >>                                                       |
> >>                                                       |
> >>                                                  +----+----+
> >>                                                  |   DDR   |
> >>                                                  +---------+
> >>
> >> Each core tile contains a computing module, local memory and a DMA module.
> >> The local memory DMA module takes data from or to the AXI streams and writes
> >> it to or reads it from the local memory. The computing module can also
> >> directly get/put data from/to the AXI stream. The AIE SHIM enables AIE tiles
> >> to get/put data from/to AXI streams from FPGA, and enables an external
> >> master to access the AI engine address space through AXI MM. The SHIM NoC
> >> module has a DMA engine, which can access external memory through AXI MM
> >> and push it to internal AXI streams.
> >>
> >> At runtime, the AI engine tiles interconnection needs to be configured so
> >> that it can fetch data from external components or adjacent tiles, and the
> >> AI engine core program needs to be loaded. Then the user application can
> >> push data to the AI engine array and start/stop the AI engine core. AI
> >> engine device errors can be raised as events; the AI engine kernel driver
> >> listens to the events interrupt to monitor runtime async device errors.
> >>
> >> Instead of the application directly interacting with the AI engine kernel
> >> APIs, user applications/libraries interact with the AI engine userspace
> >> library:
> >> https://github.com/Xilinx/embeddedsw/tree/master/XilinxProcessorIPLib/drivers/aienginev2
> >> It provides cross-OS low level functional abstraction such as how to
> >> connect one stream port to another stream port, how to configure core tile
> >> local DMA.
> >>
> >> The AI engine library can be used by other runtime libraries such as the
> >> Xilinx runtime (XRT) library:
> >> https://xilinx.github.io/XRT/master/html/index.html
> >> which provides acceleration abstraction for Xilinx accelerators; it has
> >> extensions to interface to other acceleration frameworks such as OpenCL.
> >> XRT provides buffer handling abstractions for user applications to share
> >> data between application and devices.
> >>
> >> Here is an example of application runtime stack:
> >>
> >>           +----------------------------+
> >>           |        Application         |
> >>           |                            |
> >>           +----------------------------+
> >>           |            XRT             |
> >>           |                            |
> >>           +----------------------------+
> >>           |        AIE Library         |
> >>           |                            |
> >>           +----------------------------+
> >>    +----------------------------------------+
> >>    Kern   +----------------------------+
> >>           |       AIE Partition        +--+
> >>           +----------------------------+  |
> >>              +----------------------------+
> >>           +----------------------------+
> >>           |         AIE Device         |
> >>           |                            |
> >>           +----------------------------+
> >>
> >> The AI engine kernel driver provides the following user interfaces:
> >>  * AIE device driver is the root device driver to manage the partitions
> >>    of the AI engine device array. AI engine array can be partitioned into
> >>    column-wise isolated partitions. Each application can only access its
> >>    own partitions.
> >>  * AIE device driver monitors the interrupt from the AI engine device. All
> >>    AI engine tiles share the same interrupt for error events.
> >>  * AIE partition driver controls address mapping and access of the
> >>    registers/local memories of the tiles within a partition.
> >>    * It provides mmap operation to enable application to directly access
> >>      the tiles local memories for small data update such as parameter
> >>      update for performance.
> >>    * It provides mmap operation to map all the registers as readonly for
> >>      application to poll registers efficiently to check status.
> >>    * It provides ioctl for userspace to pass I/O commands to write/mask
> >>      write the registers. How to configure is defined by userspace.
> >>      Userspace will pass the I/O commands sequence to the kernel driver,
> >>      and kernel driver will validate the commands before it writes to the
> >>      registers.
> >>    * It provides ioctl to import dmabuf and ioctl to configure the DMA
> >>      module in the SHIM tile which can access memory outside AI engine
> >>      array.
> >>
> >> The buffer management is out of this driver. In the above example, user
> >> application uses Xilinx runtime (XRT); XRT is the one to manage the
> >> buffers.
> >>
> > So if I understand this correctly, this driver handles the resource
> > management for the AI engines, PLs (programmable logic), and DMA
> > streams.  I think it's important to understand that there are multiple
> > address spaces here.  Normally when we talk about DMA in the kernel we
> > are referring to devices accessing an external resource like system
> > memory on the host CPU or another device's MMIO space (e.g., another
> > PCIe device).  It would be good to clarify which address spaces the
> > DMAs in your diagram refer to.  I think the DMAs in the AI engines are
> > specifically for DMAs within the AI engine logic (e.g., between AIs in
> > a partition).  How is DMA to system memory handled?  What about
> > dedicated memory on the FPGA (e.g., HBM or DDR on the FPGA itself)?
> > Is that what you are exposing as DMA bufs?  When you allocate a
> > DMA-buf for a partition, is that partition only allowed to access
> > memory that is part of that DMA buf?  I presume there is some
> > scatter/gather table that sets up the DMA range that the partition can
> > access?  Who loads the soft logic (Is that the PL or some other IP)?
> > Is the soft logic partitioned as well?  If I had some soft logic I
> > wanted to run on the FPGA, what would the kernel driver interaction
> > sequence look like?
> > Maybe using the OpenCL soft logic would be a good
> > example.  E.g.,
>
> The AI engine driver only manages the resources within the AI
> engine array. There are two types of DMAs in the AI engine device.
> One is the AI engine tile local memory DMA, which can only access the
> local memory. The other type of DMA is in the SHIM tile. This
> DMA can access external address space such as DDR. Although it can access
> the memory on the FPGA if the user configures the platform that way, it is
> preferred to use a PL data mover to move data between FPGA memory and the
> AI engine device. The PL data mover will not be managed by the AI engine
> driver.
> One SHIM DMA has up to 16 buffer descriptors to use.
>
> Xilinx FPGA manager is the one used to program the FPGA soft logic.
> E.g. when XRT is used, if AI engine is connected to FPGA logic, the XRT
> stack is the one to manage the configuration sequence.
>
> > 1. user has soft logic blob generated by their soft logic compiler (is
> > this compiler open source?)
>
> The soft logic blob is generated by Xilinx tools which is not open
> source yet.
>
> > 2. user calls AI engine kernel driver to allocate the required
> > resources (AI engines, AI engine DMAs, doorbells of some sort?  etc.)
>
> User will call AI engine kernel driver to allocate required resources
> within the AI engine array at runtime.
> However the patches for it are not in this patch set.
>
> > 3. user calls AI engine kernel driver to allocate system memory and/or
> > FPGA memory that can be used by the soft logic blob
>
> AI engine kernel driver doesn't allocate system memory. User can use other
> kernel driver to allocate memory.
> E.g. when XRT is used, user calls XRT kernel driver (zocl) to allocate
> system memory.
> So far, the FPGA memory is usually assigned to a soft data mover when the
> platform is created. Are you considering to have the FPGA memory in the DMA
> pool of the system?
> If it is dedicated to a device, can reserved memory solve this problem?
> The AI engine kernel driver doesn't consider this yet.
>
> > 4. user calls AI engine kernel driver to load soft logic
>
> I assume you are referring to the soft logic on the FPGA side which is not
> part of the AI engine device. FPGA manager is the one to load the soft
> logic on FPGA.
>
> > 5. user interfaces with soft logic (how? presumably via some memory
> > resource allocated in 2 and 3?)
>
> I assume you are referring to the soft logic on the FPGA side (not the
> AI engine device).
> The user interface with soft logic is managed by the soft logic IP driver.
> Each soft logic has some memory mapped control registers. User can access
> those registers through the soft logic IP driver.
>
> About memory allocation, I think it is better to manage the shared memory
> out of a specific device driver. Are you looking for memory management
> which covers both the system memory and FPGA memory, and the device can
> specify which memory it prefers?

Ok, I think the picture is getting clearer. But now I'm wondering why
you have any interactions with dma-buf in this patch series here?
-Daniel

> Thanks,
>
> Wendy
>
> >
> > Thanks,
> >
> > Alex
> >
> >
> >> Best Regards,
> >>
> >> Wendy
> >>
> >>>> Thanks,
> >>>>
> >>>> Alex
> >>>>
> >>>>
> >>>>> v3:
> >>>>> * unlock AIE dev mutex after failed to gain the partition lock in
> >>>>>   error handling
> >>>>> * replace pointer with __u64 and enum with __u32 in ioctl
> >>>>>
> >>>>> v2:
> >>>>> * Fix dtschema check errors
> >>>>> * Fix test bot warning on interrupt implementation. Removed set but
> >>>>>   unused variable.
> >>>>> * Fix compilation unused function warning of firmware change in case
> >>>>>   ZynqMP firmware is not configured
> >>>>> * There are other warnings on ZynqMP firmware reported from testbot
> >>>>>   which are not introduced by this patch set.
> >>>>>   "[PATCH] firmware: xlnx-zynqmp: fix compilation warning" is submitted
> >>>>>   for those fixes.
> >>>>>
> >>>>>
> >>>>> Izhar Ameer Shaikh (1):
> >>>>>   firmware: xilinx: Add IOCTL support for AIE ISR Clear
> >>>>>
> >>>>> Nishad Saraf (2):
> >>>>>   misc: xilinx-ai-engine: Add support to request device management
> >>>>>     services
> >>>>>   misc: xilinx-ai-engine: Add support for servicing error interrupts
> >>>>>
> >>>>> Wendy Liang (6):
> >>>>>   dt-binding: soc: xilinx: ai-engine: Add AI engine binding
> >>>>>   misc: Add Xilinx AI engine device driver
> >>>>>   misc: xilinx-ai-engine: Implement AI engine cleanup sequence
> >>>>>   misc: xilinx-ai-engine: expose AI engine tile memories to userspace
> >>>>>   misc: xilinx-ai-engine: add setting shim dma bd operation
> >>>>>   misc: xilinx-ai-engine: add request and release tiles
> >>>>>
> >>>>>  .../bindings/soc/xilinx/xlnx,ai-engine.yaml        | 126 ++++
> >>>>>  MAINTAINERS                                        |   8 +
> >>>>>  drivers/firmware/xilinx/zynqmp.c                   |  14 +
> >>>>>  drivers/misc/Kconfig                               |  12 +
> >>>>>  drivers/misc/Makefile                              |   1 +
> >>>>>  drivers/misc/xilinx-ai-engine/Makefile             |  16 +
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-aie.c      | 608 +++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-clock.c    | 245 ++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-dev.c      | 496 ++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-dma.c      | 481 +++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-internal.h | 519 ++++++++++++++++
> >>>>>  .../misc/xilinx-ai-engine/ai-engine-interrupt.c    | 659 +++++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-mem.c      | 275 +++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-part.c     | 635 ++++++++++++++++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-res.c      | 219 +++++++
> >>>>>  drivers/misc/xilinx-ai-engine/ai-engine-reset.c    | 159 +++++
> >>>>>  include/linux/firmware/xlnx-zynqmp.h               |   8 +
> >>>>>  include/uapi/linux/xlnx-ai-engine.h                | 238 ++++++++
> >>>>>  18 files changed,
> >>>>>  4719 insertions(+)
> >>>>>  create mode 100644 Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/Makefile
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-aie.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-clock.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dev.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-dma.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-internal.h
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-interrupt.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-mem.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-part.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-res.c
> >>>>>  create mode 100644 drivers/misc/xilinx-ai-engine/ai-engine-reset.c
> >>>>>  create mode 100644 include/uapi/linux/xlnx-ai-engine.h
> >>>>>
> >>>>> --
> >>>>> 2.7.4
> >>>>>
> >>>>> _______________________________________________
> >>>>> dri-devel mailing list
> >>>>> dri-devel@xxxxxxxxxxxxxxxxxxxxx
> >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
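
The overview quoted in this thread says userspace passes a sequence of
write / mask-write register commands through an ioctl, and the kernel driver
validates the commands before it writes to the registers. The thread does not
say what that validation checks, so the following is only a minimal model
under assumptions: 32-bit registers, bounds and alignment checks against the
partition size, and rejection of the whole sequence if any command is bad.
The names IoCommand, first_invalid and apply_commands are hypothetical and
are not taken from the patch series or its uapi header.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

REG_BYTES = 4              # assumption: 32-bit registers
U32_MAX = 0xFFFFFFFF


class Op(Enum):
    WRITE = "write"
    MASK_WRITE = "mask_write"


@dataclass(frozen=True)
class IoCommand:
    """One register command; hypothetical layout, not the driver's uapi."""
    op: Op
    offset: int            # byte offset from the start of the partition
    value: int
    mask: int = U32_MAX    # only used by MASK_WRITE


def first_invalid(cmds: List[IoCommand], part_size: int) -> Optional[int]:
    """Return the index of the first command failing validation, else None."""
    for i, cmd in enumerate(cmds):
        in_partition = 0 <= cmd.offset <= part_size - REG_BYTES
        aligned = cmd.offset % REG_BYTES == 0
        fits_u32 = 0 <= cmd.value <= U32_MAX and 0 <= cmd.mask <= U32_MAX
        if not (in_partition and aligned and fits_u32):
            return i
    return None


def apply_commands(regs: List[int], cmds: List[IoCommand]) -> None:
    """Validate the whole sequence first, then write; regs models one partition."""
    bad = first_invalid(cmds, len(regs) * REG_BYTES)
    if bad is not None:
        raise ValueError(f"command {bad} rejected, nothing written")
    for cmd in cmds:
        idx = cmd.offset // REG_BYTES
        if cmd.op is Op.WRITE:
            regs[idx] = cmd.value
        else:
            # Mask write: replace only the bits selected by mask.
            regs[idx] = (regs[idx] & ~cmd.mask & U32_MAX) | (cmd.value & cmd.mask)
```

For example, with regs = [0] * 4, applying a WRITE of 0xFF at offset 0
followed by a MASK_WRITE of value 0xA00 with mask 0xF00 leaves regs[0] at
0xAFF, while a sequence containing an offset outside the 16-byte model
partition raises ValueError before any register is touched. Validating the
full sequence up front is a choice made for this sketch; the real driver may
check commands one at a time.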