On 2/14/22 14:06, Jason Gunthorpe wrote:
> On Mon, Feb 14, 2022 at 01:34:15PM +0000, Joao Martins wrote:
>
>> [*] apparently we need to write an invalid entry first, invalidate the {IO}TLB
>> and then write the new valid entry. Not sure I understood correctly that this
>> is the 'break-before-make' thingie.
>
> Doesn't that explode if the invalid entry is DMA'd to?
>
Yes, IIUC. Also, the manual has this note:

"Note: For example, to split a block into constituent granules (or to merge a
span of granules into an equivalent block), VMSA requires the region to be made
invalid, a TLB invalidate performed, then to make the region take the new
configuration.

Note: The requirement for a break-before-make sequence can cause problems for
unrelated I/O streams that might use addresses overlapping a region of interest,
because the I/O streams cannot always be conveniently stopped and might not
tolerate translation faults. It is advantageous to perform live update of a
block into smaller translations, or a set of translations into a larger block
size."

That is probably why the original SMMUv3.2 dirty tracking series requires
FEAT_BBM, as it had to do in-place atomic updates to split/collapse IO pgtables.
It is not entirely clear whether HTTU dirty access requires the same.

>>>> I wonder if we could start progressing the dirty tracking as a first initial series and
>>>> then have the split + collapse handling as a second part? That would be quite
>>>> nice to get me going! :D
>>>
>>> I think so, and I think we should. It is such a big problem space, it
>>> needs to get broken up.
>>
>> OK, cool! I'll stick with the same (slimmed down) IOMMU+VFIO interface as proposed in the
>> past except with the x86 support only[*]. And we poke holes there I guess.
>>
>> [*] I might include Intel too, albeit emulated only.
>
> Like I said, I'd prefer we not build more on the VFIO type 1 code
> until we have a conclusion for iommufd..
>
I didn't quite understand what you mean by conclusion.
If by conclusion you mean the whole thing being merged, how can the work be
broken up into pieces if we are busy-waiting on the new subsystem? Or maybe you
meant in terms of direction...

I can build on top of iommufd -- Just trying to understand how this is going to
work out.

> While returning the dirty data looks straight forward, it is hard to
> see an obvious path to enabling and controlling the system iommu the
> way vfio is now.

It seems strange to have a whole UAPI [*] meant to return dirty data to
userspace, when dirty right now means the whole pinned page set and so copying
the whole guest ... and the guest is running, so we might be racing with the
device changing guest pages with the VMM/CPU unaware of it. Even with no dwelling
of IOMMU pagetables (i.e. split/collapse IO base pages) it would still greatly
improve on the current status quo of copying the entire thing :(

Hence my thinking was that the patches /if small/ would let us see how dirty
tracking might work for iommu kAPI (and iommufd) too.

Would it be better to do more iterative steps (when possible) as opposed to
scrapping and rebuilding the VFIO type1 IOMMU handling?

	Joao

[*] VFIO_IOMMU_DIRTY_PAGES{_FLAG_START,_FLAG_STOP,_FLAG_GET_BITMAP}
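
P.S. To make the break-before-make concern from the manual note quoted above
concrete, here is a toy C model of the three steps (make invalid, invalidate the
TLB, write the new entry). It is only a sketch under my own assumptions: every
toy_* name is hypothetical, there is a single PTE and a single-entry "TLB", and
none of it is taken from the SMMU driver or any kernel API. It only shows why a
DMA that lands between the "break" and the "make" sees a translation fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not kernel or SMMU APIs. */
enum toy_pte_kind { TOY_INVALID, TOY_BLOCK, TOY_TABLE };

struct toy_pte {
	enum toy_pte_kind kind;
	uint64_t out_addr;	/* output base address of the mapping */
};

struct toy_iommu {
	struct toy_pte pte;	/* the in-memory page-table entry */
	struct toy_pte tlb;	/* cached copy used for translation */
	bool tlb_valid;
	unsigned int faults;	/* translation faults seen by "DMA" */
};

/* Translate one access, refilling the single-entry TLB on a miss. */
static bool toy_translate(struct toy_iommu *mmu)
{
	if (!mmu->tlb_valid) {
		mmu->tlb = mmu->pte;
		mmu->tlb_valid = true;
	}
	if (mmu->tlb.kind == TOY_INVALID) {
		mmu->faults++;
		mmu->tlb_valid = false;	/* faulting entries are not cached */
		return false;
	}
	return true;
}

static void toy_tlb_invalidate(struct toy_iommu *mmu)
{
	mmu->tlb_valid = false;
}

/*
 * Split a block mapping using break-before-make. dma_in_window simulates a
 * device access arriving between the "break" and the "make".
 */
static void toy_split_bbm(struct toy_iommu *mmu, bool dma_in_window)
{
	uint64_t base;

	assert(mmu->pte.kind == TOY_BLOCK);
	base = mmu->pte.out_addr;

	mmu->pte.kind = TOY_INVALID;	/* 1. break: make the region invalid */
	toy_tlb_invalidate(mmu);	/* 2. drop any cached block entry */

	if (dma_in_window)
		(void)toy_translate(mmu);	/* this access faults */

	mmu->pte.kind = TOY_TABLE;	/* 3. make: install the split mapping */
	mmu->pte.out_addr = base;
}
```

Calling toy_split_bbm() with dma_in_window set bumps the fault counter once,
while accesses before and after the split translate fine. An in-place atomic
update (what FEAT_BBM is meant to allow, as I understand it) would skip step 1
and so never expose the invalid entry.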