On Sun, Jan 26, 2025 at 11:46:54AM -0800, Yidong Zhang wrote: > > > On 1/26/25 02:32, Xu Yilun wrote: > > Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. > > > > > > On Tue, Dec 10, 2024 at 10:37:30AM -0800, Yidong Zhang wrote: > > > AMD Versal based PCIe card, including V70, is designed for AI inference > > > efficiency and is tuned for video analytics and natural language processing > > > applications. > > > > > > The driver architecture: > > > > > > +---------+ Communication +---------+ Remote +-----+------+ > > > | | Channel | | Queue | | | > > > | User PF | <============> | Mgmt PF | <=======>| FW | FPGA | > > > +---------+ +---------+ +-----+------+ > > > PL Data base FW > > > APU FW > > > PL Data (copy) > > > - PL (FPGA Program Logic) > > > - FW (Firmware) > > > > > > There are 2 separate drivers from the original XRT[1] design. > > > - UserPF driver > > > - MgmtPF driver > > > > > > The new AMD versal-pci driver will replace the MgmtPF driver for Versal > > > PCIe card. > > > > > > The XRT[1] is already open-sourced. It includes solution of runtime for > > > many different type of PCIe Based cards. It also provides utilities for > > > managing and programming the devices. > > > > > > The AMD versal-pci stands for AMD Versal brand PCIe device management > > > driver. This driver provides the following functionalities: > > > > > > - module and PCI device initialization > > > this driver will attach to specific device id of V70 card; > > > the driver will initialize itself based on bar resources for > > > - communication channel: > > > a hardware message service between mgmt PF and user PF > > > - remote queue: > > > a hardware queue based ring buffer service between mgmt PF and PCIe > > > hardware firmware for programming FPGA Program Logic, loading > > > firmware and checking card healthy status. > > > > > > - programming FW > > > - The base FW is downloaded onto the flash of the card. > > > - The APU FW is downloaded once after a POR (power on reset). > > > - Reloading the MgmtPF driver will not change any existing hardware. > > > > > > - programming FPGA hardware binaries - PL Data > > > - using fpga framework ops to support re-programing FPGA > > > - the re-programming request will be initiated from the existing UserPF > > > driver only, and the MgmtPF driver load the matched PL Data after > > > receiving request from the communication channel. The matching PL > > > > I think this is not the way the FPGA generic framework should do. A FPGA > > region user (your userPF driver) should not also be the reprogram requester. > > The user driver cannot deal with the unexpected HW change if it happens. > > Maybe after reprogramming, the user driver cannot match the device > > anymore, and if user driver is still working on it, crash. > > One thing to clarify. The current design is: > > The userPF driver is the only requester. The mgmtPF has no uAPI to reprogram > the FPGA. > > > > > > The expected behavior is, the FPGA region removes user devices (thus > > detaches user drivers), does reprogramming, re-enumerates/rescans and > > matches new devices with new drivers. And I think that's what Nava is > > working on. > > > > Nava's work is different than our current design, our current design is: > > the separate userPF driver will detach all services before requesting to the > mgmtPF to program the FPGA, and after the programming is done, the userPF > will re-enumerate/rescan the matching new devices. That's not align with the Device-Driver Model, A device driver should not re-enumerate/rescan a matching device. > > The mgmtPF is a util driver which is responsible for communicating with the > mgmtPF PCIe bar resources. > > > > BTW, AFAICS the expected flow is easier to implement for of-fpga-region, > > but harder for PCI devices. But I think that's the right direction and > > should try to work it out. > > Could I recap the suggested design if I understand that correctly... > > You are thinking that the mgmtPF (aka. versal-pci) driver should have a uAPI This should be the unified fpga-region class uAPI. > to trigger the FPGA re-programing; and using Nava's callback ops to detach > the separate userPF driver; after re-programing is done, re-attch the userPF No need to detach a specific driver, remove all devices in the fpga-region. I imagine this could also be a generic flow for all PCI/e based FPGA cards. Thanks, Yilun > driver and allow the userPF driver re-enumerate all to match the new > hardware. > > I think my understanding is correct, it is doable. > > As long as we can keep our userPF driver as separate driver, the code change > won't be too big. > > > > > Thanks, > > Yilun