Re: [PATCH] PCI: tegra: Do not allocate MSI target memory

Thierry Reding <treding@xxxxxxxxxx> · Mon, 4 Mar 2019 09:37:00 +0100

On Mon, Mar 04, 2019 at 11:08:54AM +0530, Vidya Sagar wrote:
> 
> 
> On 3/2/2019 7:52 PM, Lucas Stach wrote:
> > Am Samstag, den 02.03.2019, 08:20 +0530 schrieb Vidya Sagar:
> > > On 3/1/2019 8:26 PM, Lucas Stach wrote:
> > > > Am Freitag, den 01.03.2019, 08:45 +0530 schrieb Vidya Sagar:
> > > > > On 3/1/2019 12:32 AM, Lucas Stach wrote:
> > > > > > Am Donnerstag, den 28.02.2019, 20:30 +0530 schrieb Vidya Sagar:
> > > > > > > The PCI host bridge found on Tegra SoCs doesn't require the MSI target
> > > > > > > address to be backed by physical system memory. Writes are intercepted
> > > > > > > within the controller and never make it to the memory pointed to.
> > > > > > > 
> > > > > > > Since no actual system memory is required, remove the allocation of a
> > > > > > > single page and hardcode the MSI target address with a special address
> > > > > > > on a per-SoC basis. Ideally this would be an address to an MMIO memory
> > > > > > > region (such as where the controller's register are located). However,
> > > > > > > those addresses don't work reliably across all Tegra generations. The
> > > > > > > only set of addresses that work consistently are those that point to
> > > > > > > external memory.
> > > > > > > 
> > > > > > > This is not ideal, since those addresses could technically be used for
> > > > > > > DMA and hence be confusing. However, the first page of external memory
> > > > > > > is unlikely to be used and special enough to avoid confusion.
> > > > > > So you are trading a slight memory waste of a single page against a
> > > > > > sporadic (and probably hard to debug) DMA failure if any device happens
> > > > > > to initiate DMA to the first page of physical memory? That does not
> > > > > > sound like a good deal...
> > > > > > 
> > > > > > Also why would the first page of external memory be unlikely to be
> > > > > > used?
> > > > > > 
> > > > > > Regards,
> > > > > > Lucas
> > > > > We are not wasting single page of memory here and if any device's DMA is
> > > > > trying to access it, it will still go through. Its just that we are using that
> > > > > same address for MSI (note that MSI writes don't go beyond PCIe IP as they get
> > > > > decoded at PCIe IP level itself and only an interrupt
> > > > > goes to CPU) and that might be a bit confusing as same address is used
> > > > > as normal memory as well as MSI target address. Since there can never be any
> > > > > issue with this would you suggest to remove the last paragraph from commit
> > > > > description?
> > > > How does the core distinguish between a normal DMA memory write and a
> > > > MSI? If I remember the PCIe spec correctly, there aren't any
> > > > differences between the two besides the target address.
> > > > 
> > > > So if you now set a non-reserved region of memory to decode as a MSI at
> > > > the PCIe host controller level, wouldn't this lead to normal DMA
> > > > transactions to this address being wrongfully turned into an MSI and
> > > > the write not reaching the targeted location?
> > > > 
> > > > Regards,
> > > > Lucas
> > > You are correct that core cannot distinguish between a normal DMA memory and
> > > MSI. In that case, the only way I see is to alloc memory using
> > > dma_alloc_coherent()
> > > and use the IOVA as the MSI target address. That way, a page gets
> > > reserved (in a way wasted
> > > also as the MSI writes don't really make it to RAM) and there won't be
> > > any address
> > > overlaps with normal DMA writes. I'll push a patch for it.
> > At that point it's no longer different from the current code, which
> > simply reserves a single page of memory and uses its address as the MSI
> > target address.
> But, I think there is still one issue with the current code base. Since what
> is used as
> MSI target address is the physical address equivalent of the memory page
> being reserved,
> there is a chance that some IOVA might become equal to this physical address
> there by
> making PCIe IP to raise an MSI interrupt instead of forwarding the write to
> memory.
> Do you agree with this?

Yeah, I think we need to make sure that the MSI target address is in the
same address space as other memory allocated for PCI devices. If we
continue to use __get_free_pages() that means we'd have to manually set
up an IOMMU domain for the IOVA mapping of the MSI target address. That
is pretty tedious and would probably also conflict with code in PCI
device drivers that are presumably going to allocate memory using the
DMA API and hence could get a different domain or IOVA space than the
PCI controller itself.

There's also still an issue with 32-bit vs. 64-bit addressing, if I'm
not mistaken. I think we had meant to address that separately, but it
looks like we never did. Or perhaps I'm forgetting what conclusion we
had come to regarding that?

The original proposal to use a fixed address outside of system memory
could've fixed the conflicting address issue and the 32-bit vs. 64-bit
addressing problem, but it also wouldn't have worked for IOVA addresses
because the MSI target address could easily clash with an address from
the PCI IOVA space.

> > So in conclusion this change should just be dropped.

I think we need to do something, even if this change is perhaps not the
best approach in retrospect.

Perhaps as a first step we could try to solve the 32-bit vs. 64-bit
addressing issue in a separate patch and then think about the SMMU case
some more and then resolve that separately.

I'm not exactly sure how SMMU works for PCI on Tegra, but I thought it
was using a single IOVA space (or ASID in Tegra-speak) for anything PCI.
What I don't understand is how that works in conjunction with the memory
resource assignment code. If we enable SMMU for PCI, don't all of the
BARs become invalid? I suppose if only memory accesses upwards of the
PCI controller get translated through the SMMU, then the memory resource
assignments might stay valid. But we would also have to ensure that IOVA
allocations for PCI don't intersect with the PCI address ranges from
which BARs are allocated.

I've been working on a set of patches that address a similar problem for
display. The problem that I'm seeing is that if we enable IOMMU-backed
DMA API for display, then the IOMMU translation for display controllers
gets enabled very early on, a long time before the driver gets a chance
to program the hardware. The result is that if the bootloader has left
the display controller enabled, it will now read from memory that has no
IOVA mapping (the bootloader usually doesn't set up the SMMU). This is a
bit of a bug without DMA/IOMMU as well because the display controller is
reading memory that the kernel will at some point claim and write to.

There are two things we need to fix that: one is to make sure that the
kernel doesn't write to the memory currently being scanned out and the
other is to properly set up the SMMU before it is enabled, but without
requiring the kernel driver to be loaded.

The first I solved using reserved-memory regions. See the documentation
in Documentation/devicetree/bindings/reserved-memory/ for background on
what these are. The idea is to use a reserved memory region to prevent
the kernel from handing the framebuffer memory to the buddy allocator.
The second problem can be fixed by implementing "reserved regions"
support in the SMMU driver. I have an implementation for that ready and
it works fine in conjunction with some bootloader patches that make sure
the correct reserved memory regions are set up when the kernel boots.

While not exactly the same, I think the issue for PCI is conceptually
similar. It doesn't make much sense to add a reserved region for the PCI
apertures because they aren't part of system memory in the first place.
I'm not sure it would hurt, but it's possible that the reserved memory
code is not prepared to deal with memory regions outside of system
memory. However, I think what we want to do is establish reserved
regions in the PCI ASID that represents the PCI apertures. That way we
avoid ever programming PCI devices with DMA addresses that would point
back into PCI memory.

Thierry
Attachment:
signature.asc

Description: PGP signature