Re: [PATCH v4 09/17] iommufd: Data structure to provide IOVA to PFN mapping

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, November 8, 2022 8:49 AM
> > 
> > +
> > +/*
> > + * Automatically find a block of IOVA that is not being used and not reserved.
> > + * Does not return a 0 IOVA even if it is valid.
> 
> what is the problem with 0? should this be documented in uAPI?

0 is commonly used as an errant value for uninitialized things. We
don't automatically map it into a process mm because it can cause
security problems if we don't trap a bogus 0/NULL pointer reference.

The same logic applies here too, the allocator should not return 0 to
reserve it as an unmapped IOVA page to catch bugs.

I don't think it needs to be documented

> > +	interval_tree_for_each_span(&allowed_span, &iopt->allowed_itree,
> > +				    PAGE_SIZE, ULONG_MAX - PAGE_SIZE) {
> > +		if (RB_EMPTY_ROOT(&iopt->allowed_itree.rb_root)) {
> > +			allowed_span.start_used = PAGE_SIZE;
> > +			allowed_span.last_used = ULONG_MAX - PAGE_SIZE;
> > +			allowed_span.is_hole = false;
> > +		}
> 
> statically initialize it when iopt is created?

allowed_span is a stack variable?

> > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > +					     iova_alignment, page_offset))
> > +			continue;
> > +
> > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > +					    allowed_span.start_used,
> > +					    allowed_span.last_used) {
> > +			if (!__alloc_iova_check_hole(&area_span, length,
> > +						     iova_alignment,
> > +						     page_offset))
> > +				continue;
> > +
> > +			interval_tree_for_each_span(&reserved_span,
> > +						    &iopt->reserved_itree,
> > +						    area_span.start_used,
> > +						    area_span.last_used) {
> > +				if (!__alloc_iova_check_hole(
> > +					    &reserved_span, length,
> > +					    iova_alignment, page_offset))
> > +					continue;
> 
> this could be simplified by double span.

It is subtly not compatible, the double span looks for used areas.
This is looking for a used area in the allowed_itree, a hole in the
area_itree, and a hole in the reserved_itree.

I don't think IOVA allocation should be a fast path so it is not worth
alot of effort to micro-optimize this.

> > +static int iopt_check_iova(struct io_pagetable *iopt, unsigned long iova,
> > +			   unsigned long length)
> > +{
> > +	unsigned long last;
> > +
> > +	lockdep_assert_held(&iopt->iova_rwsem);
> > +
> > +	if ((iova & (iopt->iova_alignment - 1)))
> > +		return -EINVAL;
> > +
> > +	if (check_add_overflow(iova, length - 1, &last))
> > +		return -EOVERFLOW;
> > +
> > +	/* No reserved IOVA intersects the range */
> > +	if (iopt_reserved_iter_first(iopt, iova, last))
> > +		return -ENOENT;
> 
> vfio type1 returns -EINVAL
> 
> > +
> > +	/* Check that there is not already a mapping in the range */
> > +	if (iopt_area_iter_first(iopt, iova, last))
> > +		return -EADDRINUSE;
> 
> vfio type1 returns -EEXIST

Hum I guess we can change them here, it is a bit annoying for the test suite
though.

> > +static int iopt_unmap_iova_range(struct io_pagetable *iopt, unsigned long
> > start,
> > +				 unsigned long end, unsigned long
> 
> s/end/last/
> 
> > +int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> > +		    unsigned long length, unsigned long *unmapped)
> > +{
> > +	unsigned long iova_end;
> 
> s/iova_end/iova_last/

yep
 
> > +static int iopt_calculate_iova_alignment(struct io_pagetable *iopt)
> > +{
> > +	unsigned long new_iova_alignment;
> > +	struct iommufd_access *access;
> > +	struct iommu_domain *domain;
> > +	unsigned long index;
> > +
> > +	lockdep_assert_held_write(&iopt->iova_rwsem);
> > +	lockdep_assert_held(&iopt->domains_rwsem);
> > +
> > +	if (iopt->disable_large_pages)
> > +		new_iova_alignment = PAGE_SIZE;
> > +	else
> > +		new_iova_alignment = 1;
> 
> I didn't understand why we start searching alignment from a
> smaller value when large pages is enabled. what is the
> connection here?

'disable_large_pages' is a tiny bit misnamed, what it really does is
ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
pages) and not less (what this is protecting against).

So if a domain has less than PAGE_SIZE we upgrade to
PAGE_SIZE. Otherwise we allow using the lowest possible alignment.

This allows userspace to always work in PAGE_SIZE units without fear
of problems, eg with sub-page-size units becoming weird or something.

> > +	interval_tree_remove(&area->node, &iopt->area_itree);
> > +	rc = iopt_insert_area(iopt, lhs, area->pages, start_iova,
> > +			      iopt_area_start_byte(area, start_iova),
> > +			      (new_start - 1) - start_iova + 1,
> > +			      area->iommu_prot);
> > +	if (WARN_ON(rc))
> > +		goto err_insert;
> > +
> > +	rc = iopt_insert_area(iopt, rhs, area->pages, new_start,
> > +			      iopt_area_start_byte(area, new_start),
> > +			      last_iova - new_start + 1, area->iommu_prot);
> > +	if (WARN_ON(rc))
> > +		goto err_remove_lhs;
> > +
> > +	lhs->storage_domain = area->storage_domain;
> > +	lhs->num_accesses = area->num_accesses;
> > +	lhs->pages = area->pages;
> > +	rhs->storage_domain = area->storage_domain;
> > +	rhs->num_accesses = area->num_accesses;
> 
> if an access only spans one side, is it correct to have both split sides
> keep the access number?

Er, this is acatually completely broken, woops. A removal of an access
will trigger a WARN_ON since the access_itree element is very likely
no longer correct.

Ah.. So the only use case here is unmapping and you can't unmap
something that has an access established, except in some pathalogical
case where the access does not intersect with what is being mapped.

There is no way to tell which iopt_pages_access are connected to which
areas, so without spending some memory this can't be fixed up. I think
it is not a real issue as mdev plus this ancient VFIO interface is
probably not something that exists in the real world..

+       /*
+        * Splitting is not permitted if an access exists, we don't track enough
+        * information to split existing accesses.
+        */
+       if (area->num_accesses) {
+               rc = -EINVAL;
+               goto err_unlock;
+       }
+
@@ -1041,10 +1050,8 @@ static int iopt_area_split(struct iopt_area *area, unsigned long iova)
                goto err_remove_lhs;
 
        lhs->storage_domain = area->storage_domain;
-       lhs->num_accesses = area->num_accesses;
        lhs->pages = area->pages;
        rhs->storage_domain = area->storage_domain;
-       rhs->num_accesses = area->num_accesses;
        rhs->pages = area->pages;
        kref_get(&rhs->pages->kref);
        kfree(area);

Thanks,
Jason



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]

  Powered by Linux