Re: RMRR device on non-Intel platform

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023-04-21 13:45, Jason Gunthorpe wrote:
On Fri, Apr 21, 2023 at 01:29:46PM +0100, Robin Murphy wrote:

Can you clarify why something other than IOMMU_RESV_SW_MSI would be
needed?

We need iommufd to setup a 1:1 map for the reserved space.

So, of the reserved spaces we have these:

	/* Memory regions which must be mapped 1:1 at all times */
	IOMMU_RESV_DIRECT,

            Block iommufd

	/*
	 * Memory regions which are advertised to be 1:1 but are
	 * commonly considered relaxable in some conditions,
	 * for instance in device assignment use case (USB, Graphics)
	 */
	IOMMU_RESV_DIRECT_RELAXABLE,

            iommufd ignores this one

	/* Arbitrary "never map this or give it to a device" address ranges */
	IOMMU_RESV_RESERVED,

	   iommufd prevents using this IOVA range

	/* Hardware MSI region (untranslated) */
	IOMMU_RESV_MSI,

	   iommufd treats this the same as IOMMU_RESV_RESERVED

	/* Software-managed MSI translation window */
	IOMMU_RESV_SW_MSI,

	   iommufd treats this the same as IOMMU_RESV_RESERVED, also
	   it passes the start to iommu_get_msi_cookie() which
	   eventually maps something, but not 1:1.

I don't think it is a compatible change for IOMMU_RESV_SW_MSI to also
mean 1:1 map?

Bleh, my bad, the nested MSI stuff is right on the limit of the amount of horribleness I can hold in my head at once even at the best of times, let alone when my head is still half-full of PMUs.

I think a slightly more considered and slightly less wrong version of that idea is to mark it as IOMMU_RESV_MSI, and special-case direct-mapping those on Arm (I believe it would technically be benign to do on x86 too, but might annoy people with its pointlessness). However...

On baremetal we have no idea what the platform put under that
hardcoded address?

On VM we don't use the iommu_get_msi_cookie() flow because the GIC in
the VM pretends it doesn't have an ITS page?  (did I get that right?)

No, that can't be right - PCIe devices have to support MSI or MSI-X, and many of them won't support INTx at all, so if the guest wants to use interrupts in general it must surely need to believe it has some kind of MSI controller, which for practical purposes in this context means an ITS. That was the next thing I started wondering after the above - if the aim is to direct-map the host's SW_MSI region to effectively pass through the S2 MSI cookie, but you have the same Linux SMMU driver in the guest, isn't that guest driver still going to add a conflicting SW_MSI region for the same IOVA and confuse things?

Ideally for nesting, the VMM would just tell us the IPA of where it's going to claim the given device's associated MSI doorbell is, we map that to the real underlying address at S2, then the guest can use its S1 cookie as normal if it wants to, and the host doesn't have to rewrite addresses either way. The set_dev_data thing starts to look tempting for this too... Given that the nesting usage model inherently constrains the VMM's options for emulating the IOMMU, would it be unreasonable to make our lives a lot easier with some similar constraints around interrupts, and just not attempt to support the full gamut of "emulate any kind of IRQ with any other kind of IRQ" irqfd hilarity?

MSI regions already represent "safe" direct mappings, either as an inherent
property of the hardware, or with an actual mapping maintained by software.
Also RELAXABLE is meant to imply that it is only needed until a driver takes
over the device, which at face value doesn't make much sense for interrupts.

I used "relxable" to suggest it is safe for userspace.

I know, but the subtlety is the reason *why* it's safe for userspace. Namely that a VFIO driver has bound and reset (or at least taken control of) the device, and thus it is assumed to no longer be doing whatever the boot firmware left it doing, therefore the reserved region is assumed to no longer be relevant, and from then on the requirement to preserve it can be relaxed.

We'll still need to set this when the default domain type is identity too -
see the diff I posted (the other parts below I merely implied).

Right, I missed that!

I suggest like this to avoid the double loop:

@@ -1037,9 +1037,6 @@ static int iommu_create_device_direct_mappings(struct iom>
         unsigned long pg_size;
         int ret = 0;
- if (!iommu_is_dma_domain(domain))
-               return 0;
-
         BUG_ON(!domain->pgsize_bitmap);
pg_size = 1UL << __ffs(domain->pgsize_bitmap);

But then you realise that you also need to juggle this around since identity domains aren't required to have a valid pgsize_bitmap either, give up on the idea and go straight to writing a dedicated loop as the clearer and tidier option because hey this is hardly a fast path anyway. At least, you do if you're me :)

Cheers,
Robin.

@@ -1052,13 +1049,18 @@ static int iommu_create_device_direct_mappings(struct i>
                 dma_addr_t start, end, addr;
                 size_t map_size = 0;
- start = ALIGN(entry->start, pg_size);
-               end   = ALIGN(entry->start + entry->length, pg_size);
-
                 if (entry->type != IOMMU_RESV_DIRECT &&
                     entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
                         continue;
+ if (entry->type == IOMMU_RESV_DIRECT)
+                       dev->iommu->requires_direct = 1;
+
+               if (!iommu_is_dma_domain(domain))
+                       continue;
+
+               start = ALIGN(entry->start, pg_size);
+               end   = ALIGN(entry->start + entry->length, pg_size);
                 for (addr = start; addr <= end; addr += pg_size) {
                         phys_addr_t phys_addr;

Jason



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux