On Tue, Nov 15, 2022 at 03:13:56AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, November 15, 2022 2:44 AM
> >
> > On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Tuesday, November 8, 2022 8:49 AM
> > > >
> > > > +
> > > > +/*
> > > > + * Automatically find a block of IOVA that is not being used and not
> > > > + * reserved.
> > > > + * Does not return a 0 IOVA even if it is valid.
> > >
> > > what is the problem with 0? should this be documented in uAPI?
> >
> > 0 is commonly used as an errant value for uninitialized things. We
> > don't automatically map it into a process mm because it can cause
> > security problems if we don't trap a bogus 0/NULL pointer reference.
> >
> > The same logic applies here too: the allocator should not return 0,
> > reserving it as an unmapped IOVA page to catch bugs.
>
> CPU doesn't reference IOVA. Where do such bugs exist?

SW is always buggy, and SW programs the DMA address, so it could leave
a 0 behind or something during the programming.

> > I don't think it needs to be documented
>
> this again causes a subtle difference between automatic allocation
> and fixed iova. If we really think address 0 is something related
> to bugs, then why is it allowed with fixed iova?

Because fixed can do anything up to the limits of the HW. It is like
mmap, where MAP_FIXED can allocate 0 as well, but automatic allocation
will not.

> > > > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > > > +					     iova_alignment, page_offset))
> > > > +			continue;
> > > > +
> > > > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > > > +					    allowed_span.start_used,
> > > > +					    allowed_span.last_used) {
> > > > +			if (!__alloc_iova_check_hole(&area_span, length,
> > > > +						     iova_alignment,
> > > > +						     page_offset))
> > > > +				continue;
> > > > +
> > > > +			interval_tree_for_each_span(&reserved_span,
> > > > +						    &iopt->reserved_itree,
> > > > +						    area_span.start_used,
> > > > +						    area_span.last_used) {
> > > > +				if (!__alloc_iova_check_hole(
> > > > +					    &reserved_span, length,
> > > > +					    iova_alignment, page_offset))
> > > > +					continue;
> > >
> > > this could be simplified by double span.
> >
> > It is subtly not compatible; the double span looks for used areas.
> > This is looking for a used area in the allowed_itree, a hole in the
> > area_itree, and a hole in the reserved_itree.
>
> the inner two loops can be replaced by double span, since both
> are skipping used areas.

The 2nd loop is looking for a used span in allowed and the 3rd loop is
looking for a hole in reserved. To fix it we'd have to invert allowed
to work like reserved - which complicates the uAPI code.

> > > > +	if (iopt->disable_large_pages)
> > > > +		new_iova_alignment = PAGE_SIZE;
> > > > +	else
> > > > +		new_iova_alignment = 1;
> > >
> > > I didn't understand why we start searching alignment from a
> > > smaller value when large pages are enabled. what is the
> > > connection here?
> >
> > 'disable_large_pages' is a tiny bit misnamed; what it really does is
> > ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
> > pages) and not less (what this is protecting against).
> >
> > So if a domain has less than PAGE_SIZE we upgrade to
> > PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
> >
> > This allows userspace to always work in PAGE_SIZE units without fear
> > of problems, e.g. with sub-page-size units becoming weird or something.
>
> above are good stuff in a comment.
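Sketching it out, the whole alignment selection could look like the
below. This is just a sketch - only the disable_large_pages branch is
quoted from the patch; the helper name and the xarray walk over the
attached domains are illustrative:

static unsigned long iopt_calculate_iova_alignment(struct io_pagetable *iopt)
{
	unsigned long new_iova_alignment;
	struct iommu_domain *domain;
	unsigned long index;

	/*
	 * VFIO compat requires every iommu_map() to be exactly PAGE_SIZE,
	 * so never let the alignment drop below that. Otherwise start at
	 * byte granularity and let the attached domains raise it.
	 */
	if (iopt->disable_large_pages)
		new_iova_alignment = PAGE_SIZE;
	else
		new_iova_alignment = 1;

	/* Each domain's smallest supported page size bounds the alignment */
	xa_for_each(&iopt->domains, index, domain)
		new_iova_alignment = max_t(unsigned long,
					   1UL << __ffs(domain->pgsize_bitmap),
					   new_iova_alignment);

	return new_iova_alignment;
}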
This is the comment:

/*
 * This is part of the VFIO compatibility support for VFIO_TYPE1_IOMMU. That
 * mode permits splitting a mapped area up, and then one of the splits is
 * unmapped. Doing this normally would cause us to violate our invariant of
 * pairing map/unmap. Thus, to support old VFIO compatibility disable support
 * for batching consecutive PFNs. All PFNs mapped into the iommu are done in
 * PAGE_SIZE units, not larger or smaller.
 */
static int batch_iommu_map_small(struct iommu_domain *domain,
				 unsigned long iova, phys_addr_t paddr,
				 size_t size, int prot)

Thanks,
Jason
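P.S. For completeness, a sketch of how such a helper loops in
PAGE_SIZE units; only the comment and signature above are from the
patch, the error unwind below is illustrative:

static int batch_iommu_map_small(struct iommu_domain *domain,
				 unsigned long iova, phys_addr_t paddr,
				 size_t size, int prot)
{
	unsigned long start_iova = iova;
	int rc;

	/* Map exactly one PAGE_SIZE unit per iommu_map() call */
	while (size) {
		rc = iommu_map(domain, iova, paddr, PAGE_SIZE, prot);
		if (rc)
			goto err_unmap;
		iova += PAGE_SIZE;
		paddr += PAGE_SIZE;
		size -= PAGE_SIZE;
	}
	return 0;

err_unmap:
	/* Keep the map/unmap pairing invariant: undo any partial mapping */
	if (start_iova != iova)
		iommu_unmap(domain, start_iova, iova - start_iova);
	return rc;
}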