On 1/26/25 02:32, Xu Yilun wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Tue, Dec 10, 2024 at 10:37:30AM -0800, Yidong Zhang wrote:
AMD Versal based PCIe card, including V70, is designed for AI inference
efficiency and is tuned for video analytics and natural language processing
applications.
The driver architecture:
+---------+ Communication +---------+ Remote +-----+------+
| | Channel | | Queue | | |
| User PF | <============> | Mgmt PF | <=======>| FW | FPGA |
+---------+ +---------+ +-----+------+
PL Data base FW
APU FW
PL Data (copy)
- PL (FPGA Program Logic)
- FW (Firmware)
There are 2 separate drivers from the original XRT[1] design.
- UserPF driver
- MgmtPF driver
The new AMD versal-pci driver will replace the MgmtPF driver for Versal
PCIe card.
The XRT[1] is already open-sourced. It includes solution of runtime for
many different type of PCIe Based cards. It also provides utilities for
managing and programming the devices.
The AMD versal-pci stands for AMD Versal brand PCIe device management
driver. This driver provides the following functionalities:
- module and PCI device initialization
this driver will attach to specific device id of V70 card;
the driver will initialize itself based on bar resources for
- communication channel:
a hardware message service between mgmt PF and user PF
- remote queue:
a hardware queue based ring buffer service between mgmt PF and PCIe
hardware firmware for programming FPGA Program Logic, loading
firmware and checking card healthy status.
- programming FW
- The base FW is downloaded onto the flash of the card.
- The APU FW is downloaded once after a POR (power on reset).
- Reloading the MgmtPF driver will not change any existing hardware.
- programming FPGA hardware binaries - PL Data
- using fpga framework ops to support re-programing FPGA
- the re-programming request will be initiated from the existing UserPF
driver only, and the MgmtPF driver load the matched PL Data after
receiving request from the communication channel. The matching PL
I think this is not the way the FPGA generic framework should do. A FPGA
region user (your userPF driver) should not also be the reprogram requester.
The user driver cannot deal with the unexpected HW change if it happens.
Maybe after reprogramming, the user driver cannot match the device
anymore, and if user driver is still working on it, crash.
One thing to clarify. The current design is:
The userPF driver is the only requester. The mgmtPF has no uAPI to
reprogram the FPGA.
The expected behavior is, the FPGA region removes user devices (thus
detaches user drivers), does reprogramming, re-enumerates/rescans and
matches new devices with new drivers. And I think that's what Nava is
working on.
Nava's work is different than our current design, our current design is:
the separate userPF driver will detach all services before requesting to
the mgmtPF to program the FPGA, and after the programming is done, the
userPF will re-enumerate/rescan the matching new devices.
The mgmtPF is a util driver which is responsible for communicating with
the mgmtPF PCIe bar resources.
BTW, AFAICS the expected flow is easier to implement for of-fpga-region,
but harder for PCI devices. But I think that's the right direction and
should try to work it out.
Could I recap the suggested design if I understand that correctly...
You are thinking that the mgmtPF (aka. versal-pci) driver should have a
uAPI to trigger the FPGA re-programing; and using Nava's callback ops to
detach the separate userPF driver; after re-programing is done, re-attch
the userPF driver and allow the userPF driver re-enumerate all to match
the new hardware.
I think my understanding is correct, it is doable.
As long as we can keep our userPF driver as separate driver, the code
change won't be too big.
Thanks,
Yilun