Re: [RFC PATCH v5 05/15] KVM: guest_memfd: Folio mappability states and functions that manage their transition

Fuad Tabba <tabba@xxxxxxxxxx> writes:

This question should not block merging of this series since performance
can be improved in a separate series:

> <snip>
>
> +
> +/*
> + * Marks the range [start, end) as mappable by both the host and the guest.
> + * Usually called when guest shares memory with the host.
> + */
> +static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> +	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
> +	pgoff_t i;
> +	int r = 0;
> +
> +	filemap_invalidate_lock(inode->i_mapping);
> +	for (i = start; i < end; i++) {

Were any alternative data structures considered, or does anyone have
suggestions for alternatives? Doing xa_store() in a loop here will take
a long time for large ranges.

I looked into the following:

Option 1: (preferred) Maple trees

Maple tree has a nice API, though it would be better if it could combine
adjacent ranges that store the same value.

I will have to dig into performance, but I'm assuming that even large
ranges are stored in a few nodes, so this should be faster than iterating
over individual indices in an xarray.

void explore_maple_tree(void)
{
	DEFINE_MTREE(mt);

	mt_init_flags(&mt, MT_FLAGS_LOCK_EXTERN | MT_FLAGS_USE_RCU);

	/* Later stores override the overlapping parts of earlier ranges. */
	mtree_store_range(&mt, 0, 16, xa_mk_value(0x20), GFP_KERNEL);
	mtree_store_range(&mt, 8, 24, xa_mk_value(0x32), GFP_KERNEL);
	mtree_store_range(&mt, 5, 10, xa_mk_value(0x32), GFP_KERNEL);

	{
		void *entry;
		MA_STATE(mas, &mt, 0, 0);

		/* Walk the tree, printing each stored range and its value. */
		mas_for_each(&mas, entry, ULONG_MAX) {
			pr_err("[%lu, %lu]: 0x%lx\n", mas.index, mas.last, xa_to_value(entry));
		}
	}

	mtree_destroy(&mt);
}

stdout:

[0, 4]: 0x20
[5, 10]: 0x32
[11, 24]: 0x32
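
For gmem_set_mappable() specifically, a single mtree_store_range() call
could then replace the per-index loop. A rough, untested sketch, assuming
mappable_offsets were converted from an xarray to a struct maple_tree
(which is not what this patch does):

static int gmem_set_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
{
	struct maple_tree *mappable_offsets =
		&kvm_gmem_private(inode)->mappable_offsets;
	void *xval = xa_mk_value(KVM_GMEM_ALL_MAPPABLE);
	int r;

	filemap_invalidate_lock(inode->i_mapping);
	/* mtree_store_range() takes an inclusive last index, hence end - 1. */
	r = mtree_store_range(mappable_offsets, start, end - 1, xval,
			      GFP_KERNEL);
	filemap_invalidate_unlock(inode->i_mapping);

	return r;
}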

Option 2: Multi-index xarray

The API is more complex than the maple tree's, and IIUC multi-index
xarrays are not generalizable to arbitrary ranges, so a range can't be,
for example, 8 1G pages plus one 4K page. The size of the range has to
be a power of 2 greater than 4K.

Using multi-index xarrays would mean computing the order at which to
store each multi-index entry. The order can be derived from the size of
the range to be added, but that is an additional source of errors.
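
For reference, this is roughly what storing a single multi-index entry
covering 2^order pages looks like with the advanced XArray API.
gmem_store_range_entry() is a made-up helper, not something from this
patch, and the caller would still have to split the range into naturally
aligned power-of-2 chunks and compute the order for each:

static int gmem_store_range_entry(struct xarray *xa, pgoff_t index,
				  unsigned int order, void *entry)
{
	XA_STATE(xas, xa, index);

	/*
	 * Make the entry span 2^order consecutive indices; @index must be
	 * aligned to 1 << order. Needs CONFIG_XARRAY_MULTI for order > 0.
	 */
	xas_set_order(&xas, index, order);

	do {
		xas_lock(&xas);
		xas_store(&xas, entry);
		xas_unlock(&xas);
	} while (xas_nomem(&xas, GFP_KERNEL));

	return xas_error(&xas);
}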

Option 3: Interval tree, which is built on top of red-black trees

The API operates at a lower level: interval trees are defined via a
macro (INTERVAL_TREE_DEFINE), the user has to deal with the tree nodes
directly, and functions to override sub-ranges within larger ranges
have to be defined separately.
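
To illustrate the "dealing with nodes directly" part, here is a rough
sketch using the generic interval tree from <linux/interval_tree.h> and
kzalloc(). The struct and helper names are made up, and the splitting
logic needed to override a sub-range of an existing node is omitted:

struct gmem_mappable_range {
	struct interval_tree_node it;	/* it.start / it.last are inclusive */
	unsigned long mappability;	/* e.g. KVM_GMEM_ALL_MAPPABLE */
};

static int gmem_insert_range(struct rb_root_cached *root, pgoff_t start,
			     pgoff_t last, unsigned long mappability)
{
	struct gmem_mappable_range *r = kzalloc(sizeof(*r), GFP_KERNEL);

	if (!r)
		return -ENOMEM;

	r->it.start = start;
	r->it.last = last;
	r->mappability = mappability;
	/* The caller owns the node; freeing it on removal is also manual. */
	interval_tree_insert(&r->it, root);

	return 0;
}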

> +		r = xa_err(xa_store(mappable_offsets, i, xval, GFP_KERNEL));
> +		if (r)
> +			break;
> +	}
> +	filemap_invalidate_unlock(inode->i_mapping);
> +
> +	return r;
> +}
> +
> +/*
> + * Marks the range [start, end) as not mappable by the host. If the host doesn't
> + * have any references to a particular folio, then that folio is marked as
> + * mappable by the guest.
> + *
> + * However, if the host still has references to the folio, then the folio is
> + * marked as not mappable by anyone. Marking it as not mappable allows all
> + * references from the host to be drained, and ensures that the hypervisor
> + * does not transition the folio to private, since the host might still
> + * access it.
> + *
> + * Usually called when guest unshares memory with the host.
> + */
> +static int gmem_clear_mappable(struct inode *inode, pgoff_t start, pgoff_t end)
> +{
> +	struct xarray *mappable_offsets = &kvm_gmem_private(inode)->mappable_offsets;
> +	void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_MAPPABLE);
> +	void *xval_none = xa_mk_value(KVM_GMEM_NONE_MAPPABLE);
> +	pgoff_t i;
> +	int r = 0;
> +
> +	filemap_invalidate_lock(inode->i_mapping);
> +	for (i = start; i < end; i++) {
> +		struct folio *folio;
> +		int refcount = 0;
> +
> +		folio = filemap_lock_folio(inode->i_mapping, i);
> +		if (!IS_ERR(folio)) {
> +			refcount = folio_ref_count(folio);
> +		} else {
> +			r = PTR_ERR(folio);
> +			if (WARN_ON_ONCE(r != -ENOENT))
> +				break;
> +
> +			folio = NULL;
> +		}
> +
> +		/* +1 references are expected because of filemap_lock_folio(). */
> +		if (folio && refcount > folio_nr_pages(folio) + 1) {
> +			/*
> +			 * Outstanding references, the folio cannot be faulted
> +			 * in by anyone until they're dropped.
> +			 */
> +			r = xa_err(xa_store(mappable_offsets, i, xval_none, GFP_KERNEL));
> +		} else {
> +			/*
> +			 * No outstanding references. Transition the folio to
> +			 * guest mappable immediately.
> +			 */
> +			r = xa_err(xa_store(mappable_offsets, i, xval_guest, GFP_KERNEL));
> +		}
> +
> +		if (folio) {
> +			folio_unlock(folio);
> +			folio_put(folio);
> +		}
> +
> +		if (WARN_ON_ONCE(r))
> +			break;
> +	}
> +	filemap_invalidate_unlock(inode->i_mapping);
> +
> +	return r;
> +}
> +
>
> <snip>



