Hi,

We use a design that incorporates a PCIe switch into an FPGA. Behind the
switch are a number of PCI-to-PCI (P2P) bridges, each with a corresponding
endpoint, as shown below.

                     +--------------+
                     | Root Complex |                     Host
                     +--------------+
============================|============================
                                                          FPGA
                      +-----------+
                      | PCIe port |
                      +-----------+
                            |
             +--------------+--------------+
             |              |              |
          +-----+        +-----+        +-----+
          | P2P |        | P2P |        | P2P |
          +-----+        +-----+        +-----+
             |              |              |
        +----------+   +----------+   +----------+
        | Endpoint |   | Endpoint |   | Endpoint |
        +----------+   +----------+   +----------+

This setup works very well, except for bulk transfers to or from individual
endpoints, because the FPGA cores often do not support any kind of bus
mastering. The FPGA cores, at least those we use, do not even natively
support PCI. These cores are interconnected using the WISHBONE interface[1].
We connect the PCIe port to the individual WISHBONE cores using a special
PCI-to-WISHBONE bridge that translates PCI accesses to WISHBONE cycles.

In order to fix the problem for bulk transfers, we've been thinking about
implementing a sort of generic PCI DMA mastering framework. This framework
consists of two parts: one or more DMA masters within the PCI hierarchy that
can access PCI endpoints as well as system RAM, and some kernel driver
infrastructure to control these DMA masters.

For FPGA cores that do not natively support DMA transfers, their driver can
then use this framework to initiate bulk transfers to or from system RAM, or
even to or from another core. The individual cores no longer need any
mastering capabilities.

In practice, setting up such transfers would look something like this: an
endpoint driver queries the PCI DMA framework, passing to it the source
(and/or target?) memory region of future DMA transfers. The framework then
looks up a matching DMA master and passes a handle to it back to the driver,
which can then use that handle to queue new transfers (a rough sketch of what
such an API could look like is appended at the end of this mail). Drivers for
DMA controllers register their masters with the framework to make the
functionality available to devices mapped within a specific memory region. In
our case the logical place for the DMA masters would be within the P2P
bridges, because they intrinsically know about the memory window behind them
already.

To avoid duplication, perhaps this could somehow be integrated with the
existing dmaengine API, though I am not sure how to arrange for the
additional restriction to specific memory windows.

This is all still a little sketchy, but I wanted to ask for comments or
opinions first, before we get into implementing something which may turn out
to be totally unusable or inherently broken to begin with.

Thierry

[1]: http://www.opencores.org/opencores,wishbone
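
To make this a little more concrete, here is a rough C sketch of the kind of
driver-facing API we have in mind. All of the names below (pci_dma_master,
pci_dma_request(), pci_dma_queue(), ...) are invented purely for
illustration; nothing like this exists yet, and it would presumably have to
be reconciled with (or replaced by) the dmaengine API mentioned above.

/*
 * Illustrative sketch only: all of these names are placeholders for the
 * proposed framework and do not correspond to any existing kernel API.
 */

#include <stddef.h>
#include <stdint.h>

struct pci_dma_master;

/* Describes one bulk transfer between two bus addresses. */
struct pci_dma_xfer {
        uint64_t src;                    /* source bus address */
        uint64_t dst;                    /* destination bus address */
        size_t len;                      /* number of bytes to transfer */
        void (*complete)(void *context); /* completion callback */
        void *context;
};

/*
 * Called by the driver of a DMA controller (in our case one embedded in a
 * P2P bridge) to make its master available for the memory window it can
 * reach.
 */
int pci_dma_register_master(struct pci_dma_master *master,
                            uint64_t window_base, uint64_t window_size);

/*
 * Called by an endpoint driver: look up a master whose window covers the
 * given region and return a handle to it, or NULL if none matches.
 */
struct pci_dma_master *pci_dma_request(uint64_t region_base,
                                       uint64_t region_size);

/* Queue a transfer on a previously obtained handle. */
int pci_dma_queue(struct pci_dma_master *master,
                  const struct pci_dma_xfer *xfer);

/* Drop the reference obtained with pci_dma_request(). */
void pci_dma_release(struct pci_dma_master *master);

In this sketch, an endpoint driver would call pci_dma_request() once for the
memory region it wants to transfer to or from and then pci_dma_queue() for
each individual transfer, while the driver for a P2P bridge would call
pci_dma_register_master() when it probes the bridge.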