On 2022-09-09 16:16, Janne Grunau wrote:
On 2022-09-09 11:56:32 +0100, Robin Murphy wrote:
On 2022-09-05 18:08, Thierry Reding wrote:
From: Thierry Reding <treding@xxxxxxxxxx>
This is an implementation that IOMMU drivers can use to obtain reserved
memory regions from a device tree node. It uses the reserved-memory DT
bindings to find the regions associated with a given device. If these
regions are marked accordingly, identity mappings will be created for
them in the IOMMU domain that the devices will be attached to.
Cc: Frank Rowand <frowand.list@xxxxxxxxx>
Cc: devicetree@xxxxxxxxxxxxxxx
Reviewed-by: Rob Herring <robh@xxxxxxxxxx>
Signed-off-by: Thierry Reding <treding@xxxxxxxxxx>
---
Changes in v8:
- cleanup set-but-unused variables
Changes in v6:
- remove reference to now unused dt-bindings/reserved-memory.h include
Changes in v5:
- update for new "iommu-addresses" device tree bindings
Changes in v4:
- fix build failure on !CONFIG_OF_ADDRESS
Changes in v3:
- change "active" property to identity mapping flag that is part of the
memory region specifier (as defined by #memory-region-cells) to allow
per-reference flags to be used
Changes in v2:
- use "active" property to determine whether direct mappings are needed
drivers/iommu/of_iommu.c | 85 ++++++++++++++++++++++++++++++++++++++++
include/linux/of_iommu.h | 8 ++++
2 files changed, 93 insertions(+)
diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..6617096ad15f 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/msi.h>
#include <linux/of.h>
+#include <linux/of_address.h>
#include <linux/of_iommu.h>
#include <linux/of_pci.h>
#include <linux/pci.h>
@@ -172,3 +173,87 @@ const struct iommu_ops *of_iommu_configure(struct device *dev,
return ops;
}
+
+/**
+ * of_iommu_get_resv_regions - reserved region driver helper for device tree
+ * @dev: device for which to get reserved regions
+ * @list: reserved region list
+ *
+ * IOMMU drivers can use this to implement their .get_resv_regions() callback
+ * for memory regions attached to a device tree node. See the reserved-memory
+ * device tree bindings on how to use these:
+ *
+ * Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+ */
+void of_iommu_get_resv_regions(struct device *dev, struct list_head *list)
+{
+#if IS_ENABLED(CONFIG_OF_ADDRESS)
+ struct of_phandle_iterator it;
+ int err;
+
+ of_for_each_phandle(&it, err, dev->of_node, "memory-region", NULL, 0) {
+ struct iommu_resv_region *region;
+ struct resource res;
+ const __be32 *maps;
+ int size;
+
+ memset(&res, 0, sizeof(res));
+
+ /*
+ * The "reg" property is optional and can be omitted by reserved-memory regions
+ * that represent reservations in the IOVA space, which are regions that should
+ * not be mapped.
+ */
+ if (of_find_property(it.node, "reg", NULL)) {
+ err = of_address_to_resource(it.node, 0, &res);
+ if (err < 0) {
+ dev_err(dev, "failed to parse memory region %pOF: %d\n",
+ it.node, err);
+ continue;
+ }
+ }
+
+ maps = of_get_property(it.node, "iommu-addresses", &size);
+ if (maps) {
Nit: "if (!maps) continue;" and save some indentation.
+ const __be32 *end = maps + size / sizeof(__be32);
+ struct device_node *np;
+ u32 phandle;
+ int na, ns;
+
+ while (maps < end) {
+ phys_addr_t start;
+ size_t length;
+
+ phandle = be32_to_cpup(maps++);
+ np = of_find_node_by_phandle(phandle);
+ na = of_n_addr_cells(np);
+ ns = of_n_size_cells(np);
+
+ start = of_translate_dma_address(np, maps);
+ length = of_read_number(maps + na, ns);
Nit: these could go inside the if condition.
+
+ if (np == dev->of_node) {
+ int prot = IOMMU_READ | IOMMU_WRITE;
Would it be reasonable to infer IOMMU_CACHE here from "dma-coherent"?
Hmm, good point, it really depends on what the device wants - even if it
is coherent, we don't necessarily know how it intends to use any
particular reservation; allowing MSI writes or similar to allocate in a
system cache wouldn't go too well, for instance.
Empirically, making the wrong assumption in this area can lead to people
preferring to spend a year being unfairly rude on Twitter instead of
providing timely productive feedback :(
+ enum iommu_resv_type type;
+
+ /*
+ * IOMMU regions without an associated physical region
+ * cannot be mapped and are simply reservations.
+ */
+ if (res.end > res.start)
+ type = IOMMU_RESV_DIRECT_RELAXABLE;
There may be reservations that have a PA but are expected to live beyond
boot-time handover, like device firmware or a shared-memory communication
buffer which the kernel driver can't reconfigure, or some kind of black hole
that needs a PA because it's also "no-map" for the CPUs. Those are not
relaxable. Might it be reasonable to expect to infer this from the
compatible, or should we have an additional explicit flag to distinguish
ephemeral boot-time mappings from permanent ones?
From which compatible? The device's? That's not possible for the display
processor on Apple silicon machines. They will carry mappings for their
firmware's data and heap and the boot framebuffer. Those are, however, not
direct mappings, so this doesn't apply directly to
IOMMU_RESV_DIRECT_RELAXABLE. There are probably other ways we could
identify the boot framebuffer, but so far I'm not even convinced that we
want to reuse it.
I mean the compatibles of the reserved-memory nodes themselves.
Semantically that should be the perfect way to identify their individual
purposes, but in practice it does mean that we might end up with a big
match table to say that e.g. "apple,framebuffer" is relaxable and wants
IOMMU_CACHE, and so on, and old kernels still have to have some default
behaviour for new things they don't understand. It's not *too* far off
the general situation for drivers, so if we expect these to be fairly
rare, maybe that's OK? Particularly if it might be feasible to have some
semi-generic compatibles to encapsulate the most common behaviours? I'm
inclined to defer to Rob and Frank on this one.
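
As a rough illustration of the match-table idea above, here is a minimal
userspace sketch, not code from the patch. The SKETCH_PROT_* bits, struct
resv_policy, resv_policy_lookup() and the "example,device-firmware"
compatible are all hypothetical names made up for this example; only
"apple,framebuffer" comes from the discussion. The non-relaxable,
non-cached default for unknown compatibles is an assumption about what a
conservative fallback might look like, not something the thread has
agreed on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel's IOMMU_* prot bits; values are illustrative only. */
#define SKETCH_PROT_READ  (1u << 0)
#define SKETCH_PROT_WRITE (1u << 1)
#define SKETCH_PROT_CACHE (1u << 2)

/* Hypothetical per-compatible policy for a reserved-memory node. */
struct resv_policy {
	const char *compatible;	/* compatible of the reserved-memory node */
	bool relaxable;		/* true: mapping only matters until handover */
	unsigned int prot;	/* protection bits to request for the mapping */
};

static const struct resv_policy resv_policies[] = {
	/* Example from the thread: relaxable and wants cacheable access. */
	{ "apple,framebuffer", true,
	  SKETCH_PROT_READ | SKETCH_PROT_WRITE | SKETCH_PROT_CACHE },
	/* Hypothetical entry: firmware region that must stay mapped. */
	{ "example,device-firmware", false,
	  SKETCH_PROT_READ | SKETCH_PROT_WRITE },
};

/* Assumed fallback for compatibles this table does not know about. */
static const struct resv_policy resv_default = {
	NULL, false, SKETCH_PROT_READ | SKETCH_PROT_WRITE,
};

/* Return the policy for a compatible string, or the default if unknown. */
static const struct resv_policy *resv_policy_lookup(const char *compatible)
{
	size_t i;

	if (compatible) {
		for (i = 0; i < sizeof(resv_policies) / sizeof(resv_policies[0]); i++) {
			if (!strcmp(resv_policies[i].compatible, compatible))
				return &resv_policies[i];
		}
	}

	return &resv_default;
}
```

A caller would look up the reserved-memory node's compatible once per
region and derive both the region type and the prot flags from the
result. Semi-generic compatibles, if they were defined, would simply be
additional rows in the table.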
Furthermore, we should only use IOMMU_RESV_DIRECT (in either form) if start
and length actually match res here; if not then we should warn that we're
reserving the IOVA space but not actually honouring the specified mapping
(we'd need a new resv_region type for arbitrary translations).
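
The check suggested above could look roughly like the following
userspace sketch. It is not the patch's code: struct sketch_range,
enum sketch_resv_type and sketch_classify() are hypothetical names, and
the explicit has_phys flag is an assumption standing in for the patch's
"res.end > res.start" test. The range uses an inclusive end, as struct
resource does in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the kernel's enum iommu_resv_type values. */
enum sketch_resv_type {
	SKETCH_RESV_RESERVED,		/* IOVA range is only reserved */
	SKETCH_RESV_DIRECT_RELAXABLE,	/* IOVA range is identity-mapped */
};

/* Physical range of the reserved-memory node, with an inclusive end. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
	bool has_phys;	/* false if the node had no "reg" property */
};

/*
 * Pick a region type for one "iommu-addresses" entry. A direct type is
 * only chosen when the IOVA range is exactly the physical range. If a
 * physical range exists but differs, *mismatch is set so the caller can
 * warn that the requested translation is not being honoured.
 */
static enum sketch_resv_type sketch_classify(const struct sketch_range *phys,
					     uint64_t iova_start,
					     uint64_t iova_len,
					     bool *mismatch)
{
	*mismatch = false;

	/* No physical backing: a plain reservation in IOVA space. */
	if (!phys->has_phys)
		return SKETCH_RESV_RESERVED;

	/* Compare (len - 1) with (end - start) to avoid overflow on + 1. */
	if (iova_len != 0 && iova_start == phys->start &&
	    iova_len - 1 == phys->end - phys->start)
		return SKETCH_RESV_DIRECT_RELAXABLE;

	*mismatch = true;
	return SKETCH_RESV_RESERVED;
}
```

In the helper itself the mismatch case would presumably become a
dev_warn() before falling back to a plain reservation.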
This would be needed for Asahi's DCP driver. So far I have worked around
this problem by parsing reserved-memory directly from apple-dart.
Probably best to postpone the extension until it's needed.
Indeed, as long as the binding is good from the outset, we can fill in
support for the more specialised cases as and when.
Thanks,
Robin.