> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, November 15, 2022 11:05 PM
>
> On Tue, Nov 15, 2022 at 03:13:56AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Tuesday, November 15, 2022 2:44 AM
> > >
> > > On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > Sent: Tuesday, November 8, 2022 8:49 AM
> > > > >
> > > > > +
> > > > > +/*
> > > > > + * Automatically find a block of IOVA that is not being used and not reserved.
> > > > > + * Does not return a 0 IOVA even if it is valid.
> > > >
> > > > what is the problem with 0? should this be documented in uAPI?
> > >
> > > 0 is commonly used as an errant value for uninitialized things. We
> > > don't automatically map it into a process mm because it can cause
> > > security problems if we don't trap a bogus 0/NULL pointer reference.
> > >
> > > The same logic applies here too, the allocator should not return 0 to
> > > reserve it as an unmapped IOVA page to catch bugs.
> >
> > CPU doesn't reference IOVA. Where do such bugs exist?
>
> SW is always buggy and SW programs the DMA address, so it could leave
> a 0 behind or something during the programming.

Address 0 is never a bug in a DMA to an IOVA. If it is invalid, it will
already be outside the aperture or in the reserved IOVA list.

The DMA API is also an auto-IOVA scheme from the driver's p.o.v., yet
it doesn't impose any restriction on address 0.

> > > > > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > > > > +					     iova_alignment, page_offset))
> > > > > +			continue;
> > > > > +
> > > > > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > > > > +					    allowed_span.start_used,
> > > > > +					    allowed_span.last_used) {
> > > > > +			if (!__alloc_iova_check_hole(&area_span, length,
> > > > > +						     iova_alignment,
> > > > > +						     page_offset))
> > > > > +				continue;
> > > > > +
> > > > > +			interval_tree_for_each_span(&reserved_span,
> > > > > +						    &iopt->reserved_itree,
> > > > > +						    area_span.start_used,
> > > > > +						    area_span.last_used) {
> > > > > +				if (!__alloc_iova_check_hole(
> > > > > +					    &reserved_span, length,
> > > > > +					    iova_alignment, page_offset))
> > > > > +					continue;
> > > >
> > > > this could be simplified by double span.
> > >
> > > It is subtly not compatible, the double span looks for used areas.
> > > This is looking for a used area in the allowed_itree, a hole in the
> > > area_itree, and a hole in the reserved_itree.
> >
> > the inner two loops can be replaced by double span, since both
> > are skipping used areas.
>
> The 2nd loop is looking for a used on allowed and the 3rd loop is
> looking for a hole in reserved. To fix it we'd have to invert allowed
> to work like reserved - which complicates the uAPI code.

The 1st loop finds an allowed range which can hold the requested length.

The 2nd loop finds an *unused* hole in the allowed range.

The 3rd loop further looks for a hole in reserved.

The last two both try to find a hole.
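i.e. something like below (untested sketch against this patch; 'hole_span'
would be a struct interval_tree_double_span_iter from this series, and
__alloc_iova_check_dhole() is a hypothetical double-span variant of
__alloc_iova_check_hole()):

		if (!__alloc_iova_check_used(&allowed_span, length,
					     iova_alignment, page_offset))
			continue;

		/*
		 * A hole in the double span means a hole in *both*
		 * area_itree and reserved_itree, which is what the two
		 * inner loops compute today.
		 */
		interval_tree_for_each_double_span(&hole_span,
						   &iopt->area_itree,
						   &iopt->reserved_itree,
						   allowed_span.start_used,
						   allowed_span.last_used) {
			/* hypothetical double-span hole check */
			if (!__alloc_iova_check_dhole(&hole_span, length,
						      iova_alignment,
						      page_offset))
				continue;

			*iova = hole_span.start_hole;
			return 0;
		}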
> > > > > +	if (iopt->disable_large_pages)
> > > > > +		new_iova_alignment = PAGE_SIZE;
> > > > > +	else
> > > > > +		new_iova_alignment = 1;
> > > >
> > > > I didn't understand why we start searching alignment from a
> > > > smaller value when large pages is enabled. what is the
> > > > connection here?
> > >
> > > 'disable_large_pages' is a tiny bit misnamed, what it really does is
> > > ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
> > > pages) and not less (what this is protecting against).
> > >
> > > So if a domain has less than PAGE_SIZE we upgrade to
> > > PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
> > >
> > > This allows userspace to always work in PAGE_SIZE units without fear
> > > of problems, eg with sub-page-size units becoming weird or something.
> >
> > above are good stuff in a comment.
>
> This is the comment:
>
> /*
>  * This is part of the VFIO compatibility support for VFIO_TYPE1_IOMMU. That
>  * mode permits splitting a mapped area up, and then one of the splits is
>  * unmapped. Doing this normally would cause us to violate our invariant of
>  * pairing map/unmap. Thus, to support old VFIO compatibility disable support
>  * for batching consecutive PFNs. All PFNs mapped into the iommu are done in
>  * PAGE_SIZE units, not larger or smaller.
>  */
> static int batch_iommu_map_small(struct iommu_domain *domain,
> 				 unsigned long iova, phys_addr_t paddr,
> 				 size_t size, int prot)

I meant a comment in iopt_calculate_iova_alignment().
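e.g. something along these lines in iopt_calculate_iova_alignment()
(just a sketch, wording borrowed from your explanation above):

	/*
	 * disable_large_pages means every iommu_map() must be exactly
	 * PAGE_SIZE: not more (large pages) and not less (sub-page-size
	 * mappings). So never search below PAGE_SIZE alignment in that
	 * case; otherwise start from 1 and let the domains raise it to
	 * the lowest alignment they can all support.
	 */
	if (iopt->disable_large_pages)
		new_iova_alignment = PAGE_SIZE;
	else
		new_iova_alignment = 1;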