On Tue, Nov 15, 2022 at 03:13:56AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, November 15, 2022 2:44 AM
> >
> > On Mon, Nov 14, 2022 at 07:28:47AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Tuesday, November 8, 2022 8:49 AM
> > > >
> > > > +
> > > > +/*
> > > > + * Automatically find a block of IOVA that is not being used and not
> > > > + * reserved.
> > > > + * Does not return a 0 IOVA even if it is valid.
> > >
> > > what is the problem with 0? should this be documented in uAPI?
> >
> > 0 is commonly used as an errant value for uninitialized things. We
> > don't automatically map it into a process mm because it can cause
> > security problems if we don't trap a bogus 0/NULL pointer reference.
> >
> > The same logic applies here too: the allocator should not return 0,
> > reserving it as an unmapped IOVA page to catch bugs.
>
> CPU doesn't reference IOVA. Where do such bugs exist?

SW is always buggy, and SW programs the DMA address, so it could leave
a 0 behind or something during the programming.

> > I don't think it needs to be documented
>
> this again causes a subtle difference between automatic allocation
> and fixed iova. If we really think address 0 is something related
> to bugs, then why is it allowed with fixed iova?

Because fixed can do anything up to the limits of the HW. It is like
mmap, where MAP_FIXED can allocate 0 as well, but automatic allocation
will not.

> > > > +		if (!__alloc_iova_check_used(&allowed_span, length,
> > > > +					     iova_alignment, page_offset))
> > > > +			continue;
> > > > +
> > > > +		interval_tree_for_each_span(&area_span, &iopt->area_itree,
> > > > +					    allowed_span.start_used,
> > > > +					    allowed_span.last_used) {
> > > > +			if (!__alloc_iova_check_hole(&area_span, length,
> > > > +						     iova_alignment,
> > > > +						     page_offset))
> > > > +				continue;
> > > > +
> > > > +			interval_tree_for_each_span(&reserved_span,
> > > > +						    &iopt->reserved_itree,
> > > > +						    area_span.start_used,
> > > > +						    area_span.last_used) {
> > > > +				if (!__alloc_iova_check_hole(
> > > > +					    &reserved_span, length,
> > > > +					    iova_alignment, page_offset))
> > > > +					continue;
> > >
> > > this could be simplified by double span.
> >
> > It is subtly not compatible; the double span looks for used areas.
> > This is looking for a used area in the allowed_itree, a hole in the
> > area_itree, and a hole in the reserved_itree.
>
> the inner two loops can be replaced by double span, since both
> are skipping used areas.

The 2nd loop is looking for a used span in allowed and the 3rd loop is
looking for a hole in reserved. To fix it we'd have to invert allowed
to work like reserved - which complicates the uAPI code.

> > > > +	if (iopt->disable_large_pages)
> > > > +		new_iova_alignment = PAGE_SIZE;
> > > > +	else
> > > > +		new_iova_alignment = 1;
> > >
> > > I didn't understand why we start searching alignment from a
> > > smaller value when large pages are enabled. what is the
> > > connection here?
> >
> > 'disable_large_pages' is a tiny bit misnamed; what it really does is
> > ensure that every iommu_map call is exactly PAGE_SIZE, not more (large
> > pages) and not less (what this is protecting against).
> >
> > So if a domain has less than PAGE_SIZE we upgrade to
> > PAGE_SIZE. Otherwise we allow using the lowest possible alignment.
> >
> > This allows userspace to always work in PAGE_SIZE units without fear
> > of problems, e.g. with sub-page-size units becoming weird or something.
>
> above are good stuff in a comment.
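Sketching it out, the whole alignment selection could look like the
below. This is just a sketch - only the disable_large_pages branch is
quoted from the patch; the helper name and the xarray walk over the
attached domains are illustrative:

static unsigned long iopt_calculate_iova_alignment(struct io_pagetable *iopt)
{
	unsigned long new_iova_alignment;
	struct iommu_domain *domain;
	unsigned long index;

	/*
	 * VFIO compat requires every iommu_map() to be exactly PAGE_SIZE,
	 * so never let the alignment drop below that. Otherwise start at
	 * byte granularity and let the attached domains raise it.
	 */
	if (iopt->disable_large_pages)
		new_iova_alignment = PAGE_SIZE;
	else
		new_iova_alignment = 1;

	/* Each domain's smallest supported page size bounds the alignment */
	xa_for_each(&iopt->domains, index, domain)
		new_iova_alignment = max_t(unsigned long,
					   1UL << __ffs(domain->pgsize_bitmap),
					   new_iova_alignment);

	return new_iova_alignment;
}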
This is the comment:

/*
 * This is part of the VFIO compatibility support for VFIO_TYPE1_IOMMU. That
 * mode permits splitting a mapped area up, and then one of the splits is
 * unmapped. Doing this normally would cause us to violate our invariant of
 * pairing map/unmap. Thus, to support old VFIO compatibility disable support
 * for batching consecutive PFNs. All PFNs mapped into the iommu are done in
 * PAGE_SIZE units, not larger or smaller.
 */
static int batch_iommu_map_small(struct iommu_domain *domain,
				 unsigned long iova, phys_addr_t paddr,
				 size_t size, int prot)

Thanks,
Jason
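P.S. For completeness, a sketch of how such a helper loops in
PAGE_SIZE units; only the comment and signature above are from the
patch, the error unwind below is illustrative:

static int batch_iommu_map_small(struct iommu_domain *domain,
				 unsigned long iova, phys_addr_t paddr,
				 size_t size, int prot)
{
	unsigned long start_iova = iova;
	int rc;

	/* Map exactly one PAGE_SIZE unit per iommu_map() call */
	while (size) {
		rc = iommu_map(domain, iova, paddr, PAGE_SIZE, prot);
		if (rc)
			goto err_unmap;
		iova += PAGE_SIZE;
		paddr += PAGE_SIZE;
		size -= PAGE_SIZE;
	}
	return 0;

err_unmap:
	/* Keep the map/unmap pairing invariant: undo any partial mapping */
	if (start_iova != iova)
		iommu_unmap(domain, start_iova, iova - start_iova);
	return rc;
}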