Hi,

We use a design that incorporates a PCIe switch into an FPGA. Behind the
switch are a number of PCI-to-PCI (P2P) bridges, each with a corresponding
endpoint, as shown below.

                     +--------------+
                     | Root Complex |                     Host
                     +--------------+
============================|============================
                                                          FPGA
                      +-----------+
                      | PCIe port |
                      +-----------+
                            |
             +--------------+--------------+
             |              |              |
          +-----+        +-----+        +-----+
          | P2P |        | P2P |        | P2P |
          +-----+        +-----+        +-----+
             |              |              |
        +----------+   +----------+   +----------+
        | Endpoint |   | Endpoint |   | Endpoint |
        +----------+   +----------+   +----------+

This setup works very well, except for bulk transfers to or from individual
endpoints, because the FPGA cores often do not support any kind of bus
mastering. The FPGA cores, at least those we use, do not even natively
support PCI. These cores are interconnected using the WISHBONE interface[1].
We connect the PCIe port to the individual WISHBONE cores using a special
PCI-to-WISHBONE bridge that translates PCI accesses to WISHBONE cycles.

In order to fix the problem for bulk transfers, we've been thinking about
implementing a sort of generic PCI DMA mastering framework. This framework
consists of two parts: one or more DMA masters within the PCI hierarchy that
can access PCI endpoints as well as system RAM, and some kernel driver
infrastructure to control these DMA masters.

For FPGA cores that do not natively support DMA transfers, their driver can
then use this framework to initiate bulk transfers to or from system RAM, or
even to or from another core. The individual cores no longer need any
mastering capabilities.

In practice, setting up such transfers would look something like this: an
endpoint driver queries the PCI DMA framework, passing to it the source
(and/or target?) memory region of future DMA transfers. The framework then
looks up a matching DMA master and passes a handle to it back to the driver,
which can then use that handle to queue new transfers (a rough sketch of what
such an API could look like is appended at the end of this mail). Drivers for
DMA controllers register their masters with the framework to make the
functionality available to devices mapped within a specific memory region. In
our case the logical place for the DMA masters would be within the P2P
bridges, because they intrinsically know about the memory window behind them
already.

To avoid duplication, perhaps this could somehow be integrated with the
existing dmaengine API, though I am not sure how to arrange for the
additional restriction to specific memory windows.

This is all still a little sketchy, but I wanted to ask for comments or
opinions first, before we get into implementing something which may turn out
to be totally unusable or inherently broken to begin with.

Thierry

[1]: http://www.opencores.org/opencores,wishbone
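
To make this a little more concrete, here is a rough C sketch of the kind of
driver-facing API we have in mind. All of the names below (pci_dma_master,
pci_dma_request(), pci_dma_queue(), ...) are invented purely for
illustration; nothing like this exists yet, and it would presumably have to
be reconciled with (or replaced by) the dmaengine API mentioned above.

/*
 * Illustrative sketch only: all of these names are placeholders for the
 * proposed framework and do not correspond to any existing kernel API.
 */

#include <stddef.h>
#include <stdint.h>

struct pci_dma_master;

/* Describes one bulk transfer between two bus addresses. */
struct pci_dma_xfer {
        uint64_t src;                    /* source bus address */
        uint64_t dst;                    /* destination bus address */
        size_t len;                      /* number of bytes to transfer */
        void (*complete)(void *context); /* completion callback */
        void *context;
};

/*
 * Called by the driver of a DMA controller (in our case one embedded in a
 * P2P bridge) to make its master available for the memory window it can
 * reach.
 */
int pci_dma_register_master(struct pci_dma_master *master,
                            uint64_t window_base, uint64_t window_size);

/*
 * Called by an endpoint driver: look up a master whose window covers the
 * given region and return a handle to it, or NULL if none matches.
 */
struct pci_dma_master *pci_dma_request(uint64_t region_base,
                                       uint64_t region_size);

/* Queue a transfer on a previously obtained handle. */
int pci_dma_queue(struct pci_dma_master *master,
                  const struct pci_dma_xfer *xfer);

/* Drop the reference obtained with pci_dma_request(). */
void pci_dma_release(struct pci_dma_master *master);

In this sketch, an endpoint driver would call pci_dma_request() once for the
memory region it wants to transfer to or from and then pci_dma_queue() for
each individual transfer, while the driver for a P2P bridge would call
pci_dma_register_master() when it probes the bridge.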