Re: [RFC PATCH v2 12/22] iommufd: Allow mapping from guest_memfd

On 19/2/25 01:16, Jason Gunthorpe wrote:
> On Tue, Feb 18, 2025 at 10:09:59PM +1100, Alexey Kardashevskiy wrote:
>> CoCo VMs get their private memory allocated from guest_memfd
>> ("gmemfd"), which is a KVM facility similar to memfd.
>> At the moment gmemfds cannot be mmap()ed, so the usual GUP API does
>> not work on these as expected.
>>
>> Use the existing IOMMU_IOAS_MAP_FILE API to allow mapping from
>> fd + offset. Detect the gmemfd case in pfn_reader_user_pin() and
>> use a simplified mapping path.
>>
>> The long term plan is to ditch this workaround and follow
>> the usual memfd path.
>
> How is that possible though?

dunno, things evolve over years and converge somehow :)
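
FWIW the userspace side of this is just the existing map-from-file ioctl with the gmemfd as the source, roughly like the sketch below. This is illustration only: the ioas_id/offset/length/iova values are made up, and the struct field names are from memory, so check include/uapi/linux/iommufd.h rather than trusting this.

#include <sys/ioctl.h>
#include <linux/iommufd.h>

/* Sketch only: ioas_id comes from a prior IOMMU_IOAS_ALLOC and gmemfd from
 * KVM_CREATE_GUEST_MEMFD; the offset/length/iova below are arbitrary. */
static int map_gmemfd(int iommufd, __u32 ioas_id, int gmemfd)
{
	struct iommu_ioas_map_file map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE |
			 IOMMU_IOAS_MAP_FIXED_IOVA,
		.ioas_id = ioas_id,
		.fd = gmemfd,
		.start = 0,			/* offset into the gmemfd */
		.length = 2UL << 20,		/* 2MB, for example */
		.iova = 0x100000000UL,		/* where to map it in the IOAS */
	};

	return ioctl(iommufd, IOMMU_IOAS_MAP_FILE, &map);
}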

>> +static struct folio *guest_memfd_get_pfn(struct file *file, unsigned long index,
>> +					 unsigned long *pfn, int *max_order)
>> +{
>> +	struct folio *folio;
>> +	int ret = 0;
>> +
>> +	folio = filemap_grab_folio(file_inode(file)->i_mapping, index);
>> +
>> +	if (IS_ERR(folio))
>> +		return folio;
>> +
>> +	if (folio_test_hwpoison(folio)) {
>> +		folio_unlock(folio);
>> +		folio_put(folio);
>> +		return ERR_PTR(-EHWPOISON);
>> +	}
>> +
>> +	*pfn = folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
>> +	if (!max_order)
>> +		goto unlock_exit;
>> +
>> +	/* Refs for unpin_user_page_range_dirty_lock->gup_put_folio(FOLL_PIN) */
>> +	ret = folio_add_pins(folio, 1);
>> +	folio_put(folio); /* Drop ref from filemap_grab_folio */
>> +
>> +unlock_exit:
>> +	folio_unlock(folio);
>> +	if (ret)
>> +		folio = ERR_PTR(ret);
>> +
>> +	return folio;
>> +}
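
(For context, the caller side in the pfn_reader path then looks roughly like the sketch below. This is not the actual hunk: pages->file, pages->file_start and user->upages are stand-in names for whatever the real structures use.)

/* Rough sketch of how pfn_reader_user_pin() could branch off for a
 * gmemfd-backed IOAS_MAP_FILE mapping instead of going through GUP. */
static int pfn_reader_user_pin_gmemfd(struct pfn_reader_user *user,
				      struct iopt_pages *pages,
				      unsigned long start_index,
				      unsigned long last_index)
{
	unsigned long index;

	for (index = start_index; index <= last_index; index++) {
		struct folio *folio;
		unsigned long pfn;
		int max_order;

		/* A non-NULL max_order makes the helper above take the
		 * FOLL_PIN-style ref that the normal unpin path expects */
		folio = guest_memfd_get_pfn(pages->file,
					    pages->file_start + index,
					    &pfn, &max_order);
		if (IS_ERR(folio))
			return PTR_ERR(folio);

		user->upages[index - start_index] = pfn_to_page(pfn);
	}

	return 0;
}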

> Connecting iommufd to guestmemfd through the FD is broadly the right
> idea, but I'm not sure this matches the design of guestmemfd regarding
> pinnability. IIRC they were adamant that the pages would not be
> pinned..

uff I thought it was about "not mapped" rather than "not pinned".

> folio_add_pins() just prevents the folio from being freed, it doesn't
> prevent the guestmemfd code from messing with the filemap.
>
> You should separate this from the rest of the series and discuss it
> directly with the guestmemfd maintainers.

Alright, thanks for the suggestion.

> As I understood it the requirement here is to have some kind of
> invalidation callback so that iommufd can drop mappings,

Since shared<->private conversion is an ioctl() on the kvm/gmemfd side, it ends up being an ioctl() for iommufd too. Oh well.
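
If such a callback does grow, I would expect its shape to be something like the following (completely hypothetical, nothing like this exists today; all of these names are invented for illustration):

/* A hook gmemfd would invoke around a private<->shared conversion (or
 * truncation) so that iommufd can unpin and unmap the affected range
 * before the folios change state. Offsets are gmemfd page indexes. */
struct guest_memfd_invalidate_ops {
	void (*invalidate_begin)(void *data, pgoff_t start, pgoff_t end);
	void (*invalidate_end)(void *data, pgoff_t start, pgoff_t end);
};

/* Invented registration helper, one per consumer of the gmemfd */
int guest_memfd_register_invalidate(struct file *gmemfd,
				    const struct guest_memfd_invalidate_ops *ops,
				    void *data);

and then the conversion ioctl() would call invalidate_begin() before flipping the state and invalidate_end() after, similar in spirit to how mmu_notifiers work for CPU mappings.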

> but I don't
> really know and AFAIK AMD is special in wanting private pages mapped
> to the hypervisor iommu..

With in-place conversion, we could map the entire guest once in the HV IOMMU and control the C-bit via the guest's IOMMU table (when available). Thanks,


--
Alexey




