Re: [PATCH v6 0/6] Improve PCI memory mapping API

On 10/23/24 05:47, Bjorn Helgaas wrote:
> On Tue, Oct 22, 2024 at 10:51:53AM +0900, Damien Le Moal wrote:
>> On 10/22/24 07:19, Bjorn Helgaas wrote:
>>> On Sat, Oct 12, 2024 at 08:32:40PM +0900, Damien Le Moal wrote:
>>>> This series introduces the new functions pci_epc_mem_map() and
>>>> pci_epc_mem_unmap() to improve handling of the PCI address mapping
>>>> alignment constraints of endpoint controllers in a controller
>>>> independent manner.
>>>
>>> Hi Damien, I know this is obvious to everybody except me, but who
>>> uses this mapping?  A driver running on the endpoint that does
>>> MMIO?  DMA (Memory Reads or Writes that target an endpoint BAR)?
>>> I'm still trying to wrap my head around the whole endpoint driver
>>> model.
>>
>> The mapping API is for mmio or DMA using endpoint controller memory
>> mapped to a host PCI address range. It is not for BARs. BARs setup
>> does not use the same API and has not changed with these patches.
>>
>> BARs can still be accessed on the endpoint (within the EPF driver)
>> with regular READ_ONCE()/WRITE_ONCE() once they are set. But any
>> data transfer between the PCI RC host and the EPF driver on the EP
>> host that use mmio or DMA generic channel (memory copy offload
>> channel) needs the new mapping API. DMA transfers that can be done
>> using dedicated DMA rx/tx channels associated with the endpoint
>> controller do not need to use this API as the mapping to the host
>> PCI address space is automatically handled by the DMA driver.
> 
> Sorry I'm dense.  I'm really not used to thinking in the endpoint
> point of view.  Correct me where I go off the rails:
> 
>   - This code (pci_epc_mem_map()) runs on an endpoint, basically as
>     part of some endpoint firmware, right?

Not sure what you mean by "firmware" here. pci_epc_mem_map() is intended for use
by a PCI endpoint function (EPF) driver running on the EP side, nothing else.

>   - On the endpoint's PCIe side, it receives memory read/write TLPs?

pci_epc_mem_map() does not receive anything itself. It only allocates local EP
controller memory from an outbound ATU window and maps that memory to a PCI
address region of the host (RC side). Here "map" means programming the ATU with
a correctly aligned PCI address that satisfies the EP PCI controller's alignment
constraints.

The memory read/write TLPs are generated once the EPF driver accesses the mapped
memory with memcpy_fromio()/memcpy_toio() (or uses a generic memory copy offload
DMA channel).
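
To make this concrete, here is a minimal sketch of a host memory read from an
EPF driver with this series applied. The struct pci_epc_map field names are per
my reading of the patches, so treat them as an assumption:

#include <linux/io.h>
#include <linux/pci-epc.h>
#include <linux/pci-epf.h>

/*
 * Sketch only: read 'len' bytes of host (RC) memory at 'pci_addr' into
 * 'buf'. Assumes the controller can map the full length in one go; a
 * robust driver would check the effective mapped size.
 */
static int epf_read_from_host(struct pci_epf *epf, u64 pci_addr,
			      void *buf, size_t len)
{
	struct pci_epc_map map;
	int ret;

	ret = pci_epc_mem_map(epf->epc, epf->func_no, epf->vfunc_no,
			      pci_addr, len, &map);
	if (ret)
		return ret;

	/* map.virt_addr is the local address corresponding to pci_addr */
	memcpy_fromio(buf, map.virt_addr, len);

	pci_epc_mem_unmap(epf->epc, epf->func_no, epf->vfunc_no, &map);

	return 0;
}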

>   - These TLPs would be generated elsewhere in the PCIe fabric, e.g.,
>     by a driver on the host doing MMIO to the endpoint, or possibly
>     another endpoint doing peer-to-peer DMA.

MMIO or DMA, yes.

> 
>   - Mem read/write TLPs are routed by address, and the endpoint
>     accepts them if the address matches one of its BARs.

Yes, but pci_epc_mem_map() is not for BARs on the EP side. BAR setup on EP PCI
controllers uses inbound ATU windows, whereas pci_epc_mem_map() programs
outbound ATU windows. A BAR is set up with pci_epf_alloc_space() +
pci_epc_set_bar(), as in the sketch below.
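
A minimal sketch of that BAR setup path (unchanged by this series; the
pci_epf_alloc_space() signature here is the one from recent kernels, so adjust
for your tree):

#include <linux/pci-epc.h>
#include <linux/pci-epf.h>
#include <linux/sizes.h>

/*
 * Sketch only: allocate local memory backing BAR 0 and program the
 * inbound ATU for it. This path is not touched by this series.
 */
static int epf_setup_bar0(struct pci_epf *epf,
			  const struct pci_epc_features *features)
{
	void *base;

	base = pci_epf_alloc_space(epf, SZ_64K, BAR_0, features,
				   PRIMARY_INTERFACE);
	if (!base)
		return -ENOMEM;

	return pci_epc_set_bar(epf->epc, epf->func_no, epf->vfunc_no,
			       &epf->bar[BAR_0]);
}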

> 
>   - This is a little different from a Root Complex, which would
>     basically treat reads/writes to anything outside the Root Port
>     windows as incoming DMA headed to physical memory.
> 
>   - A Root Complex would use the TLP address (the PCI bus address)
>     directly as a CPU physical memory address unless the TLP address
>     is translated by an IOMMU.
> 
>   - For the endpoint, you want the BAR to be an aperture to physical
>     memory in the address space of the SoC driving the endpoint.

Yes, and as said above, this is done with pci_epf_alloc_space() +
pci_epc_set_bar(), not pci_epc_mem_map().

>   - The SoC physical memory address may be different from the PCI but
>     address in the TLP, and pci_epc_mem_map() is the way to account
>     for this?

pci_epc_mem_map() is essentially a call to pci_epc_mem_alloc_addr() +
pci_epc_map_addr(): pci_epc_mem_alloc_addr() allocates memory from the EP PCI
controller outbound windows, and pci_epc_map_addr() programs an EP PCI
controller outbound ATU entry to map that memory to some PCI address region of
the host (RC).
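
For reference, that pre-existing two-step pattern looks like this (a minimal
sketch; the functions are from the existing EPC API, the wrapper name is mine):

#include <linux/pci-epc.h>

/*
 * Sketch only: the pre-existing two-step pattern that
 * pci_epc_mem_map() wraps.
 */
static void __iomem *epf_map_host_region(struct pci_epc *epc, u8 func_no,
					 u8 vfunc_no, u64 pci_addr,
					 size_t size, phys_addr_t *phys)
{
	void __iomem *virt;
	int ret;

	virt = pci_epc_mem_alloc_addr(epc, phys, size);
	if (!virt)
		return NULL;

	/*
	 * May fail, or silently produce a bad mapping on some
	 * controllers, if pci_addr/size violate the outbound ATU
	 * alignment constraints.
	 */
	ret = pci_epc_map_addr(epc, func_no, vfunc_no, *phys, pci_addr, size);
	if (ret) {
		pci_epc_mem_free_addr(epc, *phys, virt, size);
		return NULL;
	}

	return virt;
}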

pci_epc_mem_map() was created because the current EPC API does not hide any of
the EP PCI controller's PCI address alignment constraints for programming
outbound ATU entries. This means that when directly using
pci_epc_mem_alloc_addr() + pci_epc_map_addr(), things may or may not work (often
the latter), depending on the PCI address region (start address and size) that
is to be mapped and accessed by the endpoint function driver.

Most EP PCI controllers at least require a PCI address to be aligned to some
fixed boundary (e.g. 64K for the RK3588 SoC on the Rock5B board). Yet even this
alignment boundary/mask is not exposed through the API (epc_features->align is
for BARs and should not be used for outbound ATU programming). Worse yet, some
EP PCI controllers use the lower bits of a local memory window address range as
the lower bits of the RC PCI address range when generating TLPs (e.g. the RK3399
and the Cadence EP controller). For these EP PCI controllers, programming an
outbound ATU entry to map a RC PCI address region thus ends up depending on the
start PCI address of the region *and* the region size (as the size determines
the number of low-order address bits that change over the entire region).
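
To put numbers on the fixed-boundary case (an illustration I made up, not code
from this series):

/*
 * Illustration only: a controller with a fixed 64K outbound alignment
 * mapping 512 bytes at PCI address 0x7ffff010 must map a full 64K
 * window.
 */
u64 pci_addr     = 0x7ffff010;
size_t size      = 512;
u64 map_pci_addr = pci_addr & ~(u64)(SZ_64K - 1);  /* 0x7fff0000 */
size_t offset    = pci_addr - map_pci_addr;        /* 0xf010 */
size_t map_size  = ALIGN(offset + size, SZ_64K);   /* 0x10000 (64K) */

/* The EPF driver must then access its data at virt_base + offset. */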

The above API issues were essentially unsolvable from a regular endpoint
function driver correctly coded to be EP controller independent. I could not get
my prototype nvme endpoint function driver to work without introducing
pci_epc_mem_map() (nvme does not require any special alignment of command
buffers, so an nvme EPF driver ends up having to deal with essentially random
PCI addresses that have no particular alignment).

pci_epc_mem_map() relies on the new ->align_addr() EP controller operation to
get the start address and size to use for mapping a PCI address region to local
memory. The size obtained is used for pci_epc_mem_alloc_addr(), and the start
address and size are used with pci_epc_map_addr(). This generates a correct
mapping regardless of the PCI address and size given. And for the actual data
access by the EPF driver, pci_epc_mem_map() returns the address within the
mapping that corresponds to the PCI address that we wanted mapped.
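
On the controller side, a fixed-boundary implementation of the new operation
could look something like this (a sketch; I am assuming the ->align_addr()
signature from this series here):

#include <linux/align.h>
#include <linux/pci-epc.h>
#include <linux/sizes.h>

/*
 * Sketch only: ->align_addr() for a controller with a fixed 64K
 * outbound ATU alignment. On entry *pci_size is the requested size;
 * on return it is the size to allocate and map, and *offset is where
 * the requested PCI address falls within the mapping.
 */
static u64 epc_align_addr(struct pci_epc *epc, u64 pci_addr,
			  size_t *pci_size, size_t *offset)
{
	u64 mask = SZ_64K - 1;

	*offset = pci_addr & mask;
	*pci_size = ALIGN(*offset + *pci_size, SZ_64K);

	return pci_addr & ~mask;
}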

Et voila, problem solved :)

> 
>   - IOMMU translations from PCI to CPU physical address space are
>     pretty arbitrary and needn't be contiguous on the CPU side.
> 
>   - pci_epc_mem_map() sets up a conceptually similar PCI to CPU
>     address space translation, but it's much simpler because it
>     basically applies just a constant offset?


-- 
Damien Le Moal
Western Digital Research



