Re: [PATCH V2 vfio 05/11] vfio: Add an IOVA bitmap support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 14 Jul 2022 11:12:45 +0300
Yishai Hadas <yishaih@xxxxxxxxxx> wrote:

> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
> 
> The new facility adds a bunch of wrappers that abstract how an IOVA
> range is represented in a bitmap that is granulated by a given
> page_size. So it translates all the lifting of dealing with user
> pointers into its corresponding kernel addresses backing said user
> memory into doing finally the bitmap ops to change various bits.
> 
> The formula for the bitmap is:
> 
>    data[(iova / page_size) / 64] & (1ULL << (iova % 64))
> 
> Where 64 is the number of bits in a unsigned long (depending on arch)
> 
> An example usage of these helpers for a given @iova, @page_size, @length
> and __user @data:
> 
> 	iova_bitmap_init(&iter.dirty, iova, __ffs(page_size));
> 	ret = iova_bitmap_iter_init(&iter, iova, length, data);

Why are these separate functions given this use case?

> 	if (ret)
> 		return -ENOMEM;
> 
> 	for (; !iova_bitmap_iter_done(&iter);
> 	     iova_bitmap_iter_advance(&iter)) {
> 		ret = iova_bitmap_iter_get(&iter);
> 		if (ret)
> 			break;
> 		if (dirty)
> 			iova_bitmap_set(iova_bitmap_iova(&iter),
> 					iova_bitmap_iova_length(&iter),
> 					&iter.dirty);
> 
> 		iova_bitmap_iter_put(&iter);
> 
> 		if (ret)
> 			break;

This break is unreachable.

> 	}
> 
> 	iova_bitmap_iter_free(&iter);
> 
> The facility is intended to be used for user bitmaps representing
> dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci).
> 
> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
> Signed-off-by: Yishai Hadas <yishaih@xxxxxxxxxx>
> ---
>  drivers/vfio/Makefile       |   6 +-
>  drivers/vfio/iova_bitmap.c  | 164 ++++++++++++++++++++++++++++++++++++
>  include/linux/iova_bitmap.h |  46 ++++++++++
>  3 files changed, 214 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/vfio/iova_bitmap.c
>  create mode 100644 include/linux/iova_bitmap.h
> 
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index 1a32357592e3..1d6cad32d366 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,9 +1,11 @@
>  # SPDX-License-Identifier: GPL-2.0
>  vfio_virqfd-y := virqfd.o
>  
> -vfio-y += vfio_main.o
> -
>  obj-$(CONFIG_VFIO) += vfio.o
> +
> +vfio-y := vfio_main.o \
> +          iova_bitmap.o \
> +
>  obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> diff --git a/drivers/vfio/iova_bitmap.c b/drivers/vfio/iova_bitmap.c
> new file mode 100644
> index 000000000000..9ad1533a6aec
> --- /dev/null
> +++ b/drivers/vfio/iova_bitmap.c
> @@ -0,0 +1,164 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, Oracle and/or its affiliates.
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#include <linux/iova_bitmap.h>
> +
> +static unsigned long iova_bitmap_iova_to_index(struct iova_bitmap_iter *iter,
> +					       unsigned long iova_length)

If we have an iova-to-index function, why do we pass it a length?  That
seems to be conflating the use cases where the caller is trying to
determine the last index for a range with the actual implementation of
this helper.

> +{
> +	unsigned long pgsize = 1 << iter->dirty.pgshift;
> +
> +	return DIV_ROUND_UP(iova_length, BITS_PER_TYPE(*iter->data) * pgsize);

ROUND_UP here doesn't make sense to me and is not symmetric with the
below index-to-iova function.  For example an iova of 0x1000 give me an
index of 1, but index of 1 gives me an iova of 0x40000.  Does this code
work??

> +}
> +
> +static unsigned long iova_bitmap_index_to_iova(struct iova_bitmap_iter *iter,
> +					       unsigned long index)
> +{
> +	unsigned long pgshift = iter->dirty.pgshift;
> +
> +	return (index * sizeof(*iter->data) * BITS_PER_BYTE) << pgshift;
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Isn't that BITS_PER_TYPE(*iter->data), just as in the previous function?

> +}
> +
> +static unsigned long iova_bitmap_iter_left(struct iova_bitmap_iter *iter)

I think this is trying to find "remaining" whereas "left" can be
confused with a direction.

> +{
> +	unsigned long left = iter->count - iter->offset;
> +
> +	left = min_t(unsigned long, left,
> +		     (iter->dirty.npages << PAGE_SHIFT) / sizeof(*iter->data));

Ugh, dirty.npages is zero'd on bitmap init, allocated on get and left
with stale data on put.  This really needs some documentation/theory of
operation.

> +
> +	return left;
> +}
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */

This is all true and familiar, but what's it doing here?  The type1
code this comes from uses this to justify some #defines that are used
to sanitize input.  I see no such enforcement in this code.  The only
comment in this whole patch and it seems irrelevant.

> +int iova_bitmap_iter_init(struct iova_bitmap_iter *iter,
> +			  unsigned long iova, unsigned long length,
> +			  u64 __user *data)
> +{
> +	struct iova_bitmap *dirty = &iter->dirty;
> +
> +	iter->data = data;
> +	iter->offset = 0;
> +	iter->count = iova_bitmap_iova_to_index(iter, length);

If this works, it's because the DIV_ROUND_UP above accounted for what
should have been and index-to-count fixup here, ie. add one.

> +	iter->iova = iova;
> +	iter->length = length;
> +	dirty->pages = (struct page **)__get_free_page(GFP_KERNEL);
> +
> +	return !dirty->pages ? -ENOMEM : 0;
> +}
> +
> +void iova_bitmap_iter_free(struct iova_bitmap_iter *iter)
> +{
> +	struct iova_bitmap *dirty = &iter->dirty;
> +
> +	if (dirty->pages) {
> +		free_page((unsigned long)dirty->pages);
> +		dirty->pages = NULL;
> +	}
> +}
> +
> +bool iova_bitmap_iter_done(struct iova_bitmap_iter *iter)
> +{
> +	return iter->offset >= iter->count;
> +}
> +
> +unsigned long iova_bitmap_length(struct iova_bitmap_iter *iter)
> +{
> +	unsigned long max_iova = iter->dirty.iova + iter->length;
> +	unsigned long left = iova_bitmap_iter_left(iter);
> +	unsigned long iova = iova_bitmap_iova(iter);
> +
> +	left = iova_bitmap_index_to_iova(iter, left);

@left is first used for number of indexes and then for an iova range :(

> +	if (iova + left > max_iova)
> +		left -= ((iova + left) - max_iova);
> +
> +	return left;
> +}

IIUC, this is returning the iova free space in the bitmap, not the
length of the bitmap??

> +
> +unsigned long iova_bitmap_iova(struct iova_bitmap_iter *iter)
> +{
> +	unsigned long skip = iter->offset;
> +
> +	return iter->iova + iova_bitmap_index_to_iova(iter, skip);
> +}

It would help if this were defined above it's usage above.

> +
> +void iova_bitmap_iter_advance(struct iova_bitmap_iter *iter)
> +{
> +	unsigned long length = iova_bitmap_length(iter);
> +
> +	iter->offset += iova_bitmap_iova_to_index(iter, length);

Again, fudging an index count based on bogus index value.

> +}
> +
> +void iova_bitmap_iter_put(struct iova_bitmap_iter *iter)
> +{
> +	struct iova_bitmap *dirty = &iter->dirty;
> +
> +	if (dirty->npages)
> +		unpin_user_pages(dirty->pages, dirty->npages);

dirty->npages = 0;?

> +}
> +
> +int iova_bitmap_iter_get(struct iova_bitmap_iter *iter)
> +{
> +	struct iova_bitmap *dirty = &iter->dirty;
> +	unsigned long npages;
> +	u64 __user *addr;
> +	long ret;
> +
> +	npages = DIV_ROUND_UP((iter->count - iter->offset) *
> +			      sizeof(*iter->data), PAGE_SIZE);
> +	npages = min(npages,  PAGE_SIZE / sizeof(struct page *));
> +	addr = iter->data + iter->offset;
> +	ret = pin_user_pages_fast((unsigned long)addr, npages,
> +				  FOLL_WRITE, dirty->pages);
> +	if (ret <= 0)
> +		return ret;
> +
> +	dirty->npages = (unsigned long)ret;
> +	dirty->iova = iova_bitmap_iova(iter);
> +	dirty->start_offset = offset_in_page(addr);
> +	return 0;
> +}
> +
> +void iova_bitmap_init(struct iova_bitmap *bitmap,
> +		      unsigned long base, unsigned long pgshift)
> +{
> +	memset(bitmap, 0, sizeof(*bitmap));
> +	bitmap->iova = base;
> +	bitmap->pgshift = pgshift;
> +}
> +
> +unsigned int iova_bitmap_set(struct iova_bitmap *dirty,
> +			     unsigned long iova,
> +			     unsigned long length)
> +{
> +	unsigned long nbits, offset, start_offset, idx, size, *kaddr;
> +
> +	nbits = max(1UL, length >> dirty->pgshift);
> +	offset = (iova - dirty->iova) >> dirty->pgshift;
> +	idx = offset / (PAGE_SIZE * BITS_PER_BYTE);
> +	offset = offset % (PAGE_SIZE * BITS_PER_BYTE);
> +	start_offset = dirty->start_offset;
> +
> +	while (nbits > 0) {
> +		kaddr = kmap_local_page(dirty->pages[idx]) + start_offset;
> +		size = min(PAGE_SIZE * BITS_PER_BYTE - offset, nbits);
> +		bitmap_set(kaddr, offset, size);
> +		kunmap_local(kaddr - start_offset);
> +		start_offset = offset = 0;
> +		nbits -= size;
> +		idx++;
> +	}
> +
> +	return nbits;
> +}
> +EXPORT_SYMBOL_GPL(iova_bitmap_set);
> +
> diff --git a/include/linux/iova_bitmap.h b/include/linux/iova_bitmap.h
> new file mode 100644
> index 000000000000..c474c351634a
> --- /dev/null
> +++ b/include/linux/iova_bitmap.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2022, Oracle and/or its affiliates.
> + * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
> + */
> +
> +#ifndef _IOVA_BITMAP_H_
> +#define _IOVA_BITMAP_H_
> +
> +#include <linux/highmem.h>
> +#include <linux/mm.h>
> +#include <linux/uio.h>
> +
> +struct iova_bitmap {
> +	unsigned long iova;
> +	unsigned long pgshift;
> +	unsigned long start_offset;
> +	unsigned long npages;
> +	struct page **pages;
> +};
> +
> +struct iova_bitmap_iter {
> +	struct iova_bitmap dirty;
> +	u64 __user *data;
> +	size_t offset;
> +	size_t count;
> +	unsigned long iova;
> +	unsigned long length;
> +};
> +
> +int iova_bitmap_iter_init(struct iova_bitmap_iter *iter, unsigned long iova,
> +			  unsigned long length, u64 __user *data);
> +void iova_bitmap_iter_free(struct iova_bitmap_iter *iter);
> +bool iova_bitmap_iter_done(struct iova_bitmap_iter *iter);
> +unsigned long iova_bitmap_length(struct iova_bitmap_iter *iter);
> +unsigned long iova_bitmap_iova(struct iova_bitmap_iter *iter);
> +void iova_bitmap_iter_advance(struct iova_bitmap_iter *iter);
> +int iova_bitmap_iter_get(struct iova_bitmap_iter *iter);
> +void iova_bitmap_iter_put(struct iova_bitmap_iter *iter);
> +void iova_bitmap_init(struct iova_bitmap *bitmap,
> +		      unsigned long base, unsigned long pgshift);
> +unsigned int iova_bitmap_set(struct iova_bitmap *dirty,
> +			     unsigned long iova,
> +			     unsigned long length);
> +
> +#endif

No relevant comments, no theory of operation.  I found this really
difficult to review and the page handling is still not clear to me.
I'm not willing to take on maintainership of this code under
drivers/vfio/ as is.  Thanks,

Alex




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux