On Tue, Dec 10, 2024 at 10:37:30AM -0800, Yidong Zhang wrote:
> AMD Versal based PCIe card, including V70, is designed for AI inference
> efficiency and is tuned for video analytics and natural language processing
> applications.
>
> The driver architecture:
>
>   +---------+  Communication +---------+  Remote  +-----+------+
>   |         |  Channel       |         |  Queue   |     |      |
>   | User PF | <============> | Mgmt PF | <=======>| FW  | FPGA |
>   +---------+                +---------+          +-----+------+
>     PL Data                    base FW
>                                APU FW
>                                PL Data (copy)
>  - PL (FPGA Program Logic)
>  - FW (Firmware)
>
> There are 2 separate drivers from the original XRT[1] design.
>  - UserPF driver
>  - MgmtPF driver
>
> The new AMD versal-pci driver will replace the MgmtPF driver for Versal
> PCIe card.
>
> The XRT[1] is already open-sourced. It includes solution of runtime for
> many different type of PCIe Based cards. It also provides utilities for
> managing and programming the devices.
>
> The AMD versal-pci stands for AMD Versal brand PCIe device management
> driver. This driver provides the following functionalities:
>
>  - module and PCI device initialization
>    this driver will attach to specific device id of V70 card;
>    the driver will initialize itself based on bar resources for
>    - communication channel:
>      a hardware message service between mgmt PF and user PF
>    - remote queue:
>      a hardware queue based ring buffer service between mgmt PF and PCIe
>      hardware firmware for programming FPGA Program Logic, loading
>      firmware and checking card healthy status.
>
>  - programming FW
>    - The base FW is downloaded onto the flash of the card.
>    - The APU FW is downloaded once after a POR (power on reset).
>    - Reloading the MgmtPF driver will not change any existing hardware.
>
>  - programming FPGA hardware binaries - PL Data
>    - using fpga framework ops to support re-programing FPGA
>    - the re-programming request will be initiated from the existing UserPF
>      driver only, and the MgmtPF driver load the matched PL Data after
>      receiving request from the communication channel. The matching PL

I don't think this is how the generic FPGA framework is meant to be
used. An FPGA region user (your UserPF driver) should not also be the
reprogramming requester. The user driver cannot cope with an unexpected
HW change. After reprogramming, the user driver may no longer match the
device, and if it is still operating on the device at that point, it
will crash.

The expected behavior is that the FPGA region removes the user devices
(thus detaching the user drivers), does the reprogramming, then
re-enumerates/rescans and matches the new devices with new drivers.
I think that's what Nava is working on.

BTW, AFAICS the expected flow is easier to implement for of-fpga-region
but harder for PCI devices. Still, I think that's the right direction
and we should try to work it out.

Thanks,
Yilun
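
P.S. To make the ordering concrete, here is a rough user-space toy model
of the flow described above. It is only a sketch: every toy_* name, and
the rule that "image N exposes one device with id N", is made up for
illustration and is not the fpga-region or PCI API. The point is just
that programming is refused while a user device still exists, and that
driver matching happens again after the rescan.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are kernel API. */
enum { TOY_MAX_DEVS = 4 };

struct toy_dev {
	int  id;	/* device id exposed by the current PL image */
	bool bound;	/* true while a user driver is attached */
};

struct toy_region {
	int            image;	/* currently loaded PL image */
	struct toy_dev devs[TOY_MAX_DEVS];
	size_t         ndevs;	/* number of live user devices */
};

/* Step 1: remove the user devices, which detaches their drivers. */
static void toy_remove_devices(struct toy_region *r)
{
	for (size_t i = 0; i < r->ndevs; i++)
		r->devs[i].bound = false;
	r->ndevs = 0;
}

/* Step 2: reprogram, but only once nothing is using the region. */
static int toy_program(struct toy_region *r, int new_image)
{
	if (r->ndevs != 0)
		return -1;	/* refuse: a user device is still present */
	r->image = new_image;
	return 0;
}

/* Step 3: re-enumerate; what is found depends on the new image. */
static void toy_rescan(struct toy_region *r)
{
	/* Assumption for illustration: image N exposes one device, id N. */
	r->devs[0].id = r->image;
	r->devs[0].bound = true;	/* a driver matching this id binds */
	r->ndevs = 1;
}

/* The region, not the user driver, sequences the whole operation. */
static int toy_region_reprogram(struct toy_region *r, int new_image)
{
	int ret;

	toy_remove_devices(r);
	ret = toy_program(r, new_image);
	if (ret)
		return ret;
	toy_rescan(r);
	return 0;
}
```

With a region that holds image 1 and one bound device, calling
toy_program(&r, 2) directly returns -1 and leaves the image untouched,
while toy_region_reprogram(&r, 2) returns 0 and leaves a single device
with id 2. Doing the same for a PCI card would need a real remove/rescan
path, which is the part I expect to be harder.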