> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, November 15, 2022 2:44 AM
>
> On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Tuesday, November 8, 2022 8:49 AM
> > >
> > > +
> > > +/*
> > > + * Automatically find a block of IOVA that is not being used and not reserved.
> > > + * Does not return a 0 IOVA even if it is valid.
> >
> > what is the problem with 0? should this be documented in uAPI?
>
> 0 is commonly used as an errant value for uninitialized things. We
> don't automatically map it into a process mm because it can cause
> security problems if we don't trap a bogus 0/NULL pointer reference.
>
> The same logic applies here too, the allocator should not return 0 to
> reserve it as an unmapped IOVA page to catch bugs.

The CPU doesn't reference IOVA. Where do such bugs exist?

> I don't think it needs to be documented

This again causes a subtle difference between automatic allocation and
fixed iova. If we really think address 0 is related to bugs, then why
is it allowed with fixed iova?

> > > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > > +					     iova_alignment, page_offset))
> > > +			continue;
> > > +
> > > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > > +					    allowed_span.start_used,
> > > +					    allowed_span.last_used) {
> > > +			if (!__alloc_iova_check_hole(&area_span, length,
> > > +						     iova_alignment,
> > > +						     page_offset))
> > > +				continue;
> > > +
> > > +			interval_tree_for_each_span(&reserved_span,
> > > +						    &iopt->reserved_itree,
> > > +						    area_span.start_used,
> > > +						    area_span.last_used) {
> > > +				if (!__alloc_iova_check_hole(
> > > +					    &reserved_span, length,
> > > +					    iova_alignment, page_offset))
> > > +					continue;
> >
> > this could be simplified by double span.
>
> It is subtly not compatible, the double span looks for used areas.
> This is looking for a used area in the allowed_itree, a hole in the
> area_itree, and a hole in the reserved_itree.

The inner two loops can be replaced by the double span, since both of
them are skipping used areas.

> I don't think IOVA allocation should be a fast path so it is not worth
> a lot of effort to micro-optimize this.

But I'm not insisting on changing it now. It's a trivial point.

> > > +	if (iopt->disable_large_pages)
> > > +		new_iova_alignment = PAGE_SIZE;
> > > +	else
> > > +		new_iova_alignment = 1;
> >
> > I didn't understand why we start searching alignment from a
> > smaller value when large pages are enabled. what is the
> > connection here?
>
> 'disable_large_pages' is a tiny bit misnamed, what it really does is
> ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
> pages) and not less (what this is protecting against).
>
> So if a domain has less than PAGE_SIZE we upgrade to
> PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
>
> This allows userspace to always work in PAGE_SIZE units without fear
> of problems, eg with sub-page-size units becoming weird or something.

The above would be good material for a code comment.
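Something along these lines would capture the rule (just a sketch; the
helper name and parameters are made up for illustration, this is not the
actual iommufd code):

/*
 * Sketch only: with disable_large_pages every iommu_map() must be
 * exactly PAGE_SIZE, so the IOVA alignment is forced up to PAGE_SIZE;
 * otherwise it starts at 1 and is only raised to the smallest page
 * size the attached domains support.
 */
static unsigned long iova_alignment_sketch(bool disable_large_pages,
					   unsigned long domain_min_pgsize)
{
	unsigned long align = disable_large_pages ? PAGE_SIZE : 1;

	if (align < domain_min_pgsize)
		align = domain_min_pgsize;
	return align;
}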
> > > +	interval_tree_remove(&area->node, &iopt->area_itree);
> > > +	rc = iopt_insert_area(iopt, lhs, area->pages, start_iova,
> > > +			      iopt_area_start_byte(area, start_iova),
> > > +			      (new_start - 1) - start_iova + 1,
> > > +			      area->iommu_prot);
> > > +	if (WARN_ON(rc))
> > > +		goto err_insert;
> > > +
> > > +	rc = iopt_insert_area(iopt, rhs, area->pages, new_start,
> > > +			      iopt_area_start_byte(area, new_start),
> > > +			      last_iova - new_start + 1, area->iommu_prot);
> > > +	if (WARN_ON(rc))
> > > +		goto err_remove_lhs;
> > > +
> > > +	lhs->storage_domain = area->storage_domain;
> > > +	lhs->num_accesses = area->num_accesses;
> > > +	lhs->pages = area->pages;
> > > +	rhs->storage_domain = area->storage_domain;
> > > +	rhs->num_accesses = area->num_accesses;
> >
> > if an access only spans one side, is it correct to have both split sides
> > keep the access number?
>
> Er, this is actually completely broken, whoops. A removal of an access
> will trigger a WARN_ON since the access_itree element is very likely
> no longer correct.
>
> Ah.. So the only use case here is unmapping and you can't unmap
> something that has an access established, except in some pathological
> case where the access does not intersect with what is being mapped.
>
> There is no way to tell which iopt_pages_access are connected to which
> areas, so without spending some memory this can't be fixed up. I think
> it is not a real issue as mdev plus this ancient VFIO interface is
> probably not something that exists in the real world..
>
> +	/*
> +	 * Splitting is not permitted if an access exists, we don't track enough
> +	 * information to split existing accesses.
> +	 */
> +	if (area->num_accesses) {
> +		rc = -EINVAL;
> +		goto err_unlock;
> +	}
> +
> @@ -1041,10 +1050,8 @@ static int iopt_area_split(struct iopt_area *area, unsigned long iova)
> 		goto err_remove_lhs;
>
> 	lhs->storage_domain = area->storage_domain;
> -	lhs->num_accesses = area->num_accesses;
> 	lhs->pages = area->pages;
> 	rhs->storage_domain = area->storage_domain;
> -	rhs->num_accesses = area->num_accesses;
> 	rhs->pages = area->pages;
> 	kref_get(&rhs->pages->kref);
> 	kfree(area);

This change makes sense to me.
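For completeness, the new rule reduces to an early check before any tree
surgery. A minimal sketch (the helper name is made up; this is not the
full iopt_area_split() body):

/*
 * Sketch only: there is no record of which iopt_pages_access belongs
 * to which half of a split area, so an area with any access still
 * established simply refuses to split.
 */
static int iopt_area_split_allowed(struct iopt_area *area)
{
	if (area->num_accesses)
		return -EINVAL;	/* accesses cannot be divided between lhs/rhs */
	return 0;
}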