Re: [PATCH v3 9/9] PCI: endpoint: Set prefetch when allocating memory for 64-bit BARs

On Sun, Mar 17, 2024, at 12:54, Niklas Cassel wrote:
> On Fri, Mar 15, 2024 at 06:29:52PM +0100, Arnd Bergmann wrote:
>> On Fri, Mar 15, 2024, at 07:44, Manivannan Sadhasivam wrote:
>> 
>> I think there are three separate questions here when talking about
>> a scenario where a PCI master accesses memory behind a PCI endpoint:
>
> I think the question is if the PCI epf-core, which runs on the endpoint
> side, and which calls dma_alloc_coherent() to allocate backing memory for
> a BAR, can set/mark the Prefetchable bit for the BAR (if we also set/mark
> the BAR as a 64-bit BAR).
>
> The PCIe 6.0 spec, 7.5.1.2.1 Base Address Registers (Offset 10h - 24h),
> states:
> "Any device that has a range that behaves like normal memory should mark
> the range as prefetchable. A linear frame buffer in a graphics device is
> an example of a range that should be marked prefetchable."
>
> Doesn't backing memory that is allocated for a specific BAR using
> dma_alloc_coherent() on the EP side behave like normal memory from the
> host's point of view?

I'm not sure I follow this logic: If the device wants the
buffer to act like "normal memory", then it can be marked
as prefetchable and mapped into the host as write-combining,
but I think in this case you *don't* want it to be coherent
on the endpoint side either; you would instead use a streaming
mapping with explicit cache management.

Conversely, if the endpoint side requires a coherent mapping,
then I think you will want a strictly ordered (non-wc,
non-prefetchable) mapping on the host side as well.

It would be helpful to have actual endpoint function drivers
in the kernel rather than just the test drivers to see what type
of serialization you actually want for best performance on
both sides.

Can you give a specific example of an endpoint that you are
actually interested in, maybe just one that we have a host-side
device driver for in tree?

> On the host side, this means that the host driver sees the
> Prefetchable bit, and according to
> https://docs.kernel.org/driver-api/device-io.html
> the host might map the BAR using ioremap_wc().
>
> Looking specifically at drivers/misc/pci_endpoint_test.c, it maps the
> BARs using pci_ioremap_bar():
> https://elixir.bootlin.com/linux/v6.8/source/drivers/pci/pci.c#L252
> which will not map it using ioremap_wc().
> (But the code we have in the PCI epf-core must of course work with host
> side drivers other than pci_endpoint_test.c as well.)

It is to some degree architecture specific here. On powerpc
and i386 with MTRRs, any prefetchable BAR will be mapped as
write-combining IIRC, but on arm and arm64 it only depends on
whether the host side driver uses ioremap() or ioremap_wc().

>> - The local CPU on the endpoint side may access the same buffer as
>>   the endpoint device. On low-end SoCs the DMA from the PCI
>>   endpoint is not coherent with the CPU caches, so the CPU may
>
> I don't follow. When doing DMA *from* the endpoint, then the DMA HW
> on the EP side will read or write data to a buffer allocated on the
> host side (most likely using dma_alloc_coherent()), but what does
> that have to do with how the EP configures the BARs that it exposes?

I meant doing DMA to the memory of the endpoint side, not the
host side. DMA to the host side memory is completely separate
from this question.

     Arnd
