On 2024-05-31 11:02 am, Andre Przywara wrote:
On Fri, 31 May 2024 09:37:02 +0100
Robin Murphy <robin.murphy@xxxxxxx> wrote:
Hi Robin,
On 2024-05-31 12:37 am, Andre Przywara wrote:
The Allwinner IOMMU is a strict 32-bit device, with the page table root
pointer as well as both level's page tables and also the target addresses
all required to be below 4GB.
The Allwinner H6 SoC only supports 32-bit worth of physical addresses
anyway, so this isn't a problem so far, but the H616 and later SoCs extend
the PA space beyond 32 bit to accommodate more DRAM.
To make sure we stay within the 32-bit PA range required by the IOMMU,
force the memory for the page tables to come from below 4GB. by using
allocations with the DMA32 flag.
Uh-oh... what about the output addresses in sun50i_mk_pte()? Limiting
its own accesses is OK, but if the IOMMU isn't capable of *mapping* any
valid PA for its clients, we can't easily support that.
Right, that's indeed a problem. I was hoping that the DMA32 address limit
would somehow be enforced by the IOMMU master devices, so they would never
issue addresses above 4GB to the IOMMU in the first place.
Would this work if all those devices use a 32-bit DMA mask? Some of those
devices might have that limit anyways, but those video devices are not
my expertise, so I don't know much details.
It's fine to have a 32-bit *input* to the IOMMU - that only affects the
IOVA allocation, wherein iommu-dma already considers both the IOMMU
domain geometry and the client device's DMA mask to ensure it gives back
a usable DMA address. Indeed, plumbing 32-bit devices into a system with
a >32-bit PA space is one of the common reasons to have an IOMMU, but it
assumes that IOMMU translations are capable of targeting the entire
larger PA range.
IIUC, atm the incoming PA would be masked down to 32-bit, I guess we should have a
WARN_ONCE() there when this happens?
The 32-bit limit would only affect boards with exactly 4GB of DRAM (the
DRAM controller limit), and it only affects the last GB then, so using
DMA32 wouldn't be a terrible limitation, I think.
The problem is when a client driver does, say, a dma_map_single() of
some kmalloced buffer which it doesn't control, and which already
happens to be at a >32-bit PA; iommu-dma can't make that work for you.
At best we'd hope the iommu_map() returns an error and terminally fails
the whole DMA mapping operation, at worst the IOMMU driver silently
truncates the PA, maps the wrong memory, and all hell breaks loose :(
Thanks,
Robin.
TBH, I picked this up from Jernej, so have to refer to him for further
details.
Cheers,
Andre
P.S. I agree that a 32-bit only IOMMU sounds somewhat stup^Wweird, but
that's what we have. Maybe we would use it just for the VE only then, where
it's really helpful to provide the illusion of large physically contiguous
buffers.
Thanks,
Robin.
Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
---
drivers/iommu/sun50i-iommu.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index dd3f07384624c..c3244db5ac02f 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -682,7 +682,8 @@ sun50i_iommu_domain_alloc_paging(struct device *dev)
if (!sun50i_domain)
return NULL;
- sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL, get_order(DT_SIZE));
+ sun50i_domain->dt = iommu_alloc_pages(GFP_KERNEL | GFP_DMA32,
+ get_order(DT_SIZE));
if (!sun50i_domain->dt)
goto err_free_domain;
@@ -997,7 +998,7 @@ static int sun50i_iommu_probe(struct platform_device *pdev)
iommu->pt_pool = kmem_cache_create(dev_name(&pdev->dev),
PT_SIZE, PT_SIZE,
- SLAB_HWCACHE_ALIGN,
+ SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA32,
NULL);
if (!iommu->pt_pool)
return -ENOMEM;