On 4/29/22 08:54, Tian, Kevin wrote: >> From: Joao Martins <joao.m.martins@xxxxxxxxxx> >> Sent: Friday, April 29, 2022 5:09 AM >> >> Add to iommu domain operations a set of callbacks to >> perform dirty tracking, particulary to start and stop >> tracking and finally to test and clear the dirty data. > > to be consistent with other context, s/test/read/ > /me nods >> >> Drivers are expected to dynamically change its hw protection >> domain bits to toggle the tracking and flush some form of > > 'hw protection domain bits' sounds a bit weird. what about > just using 'translation structures'? > I replace with that instead. >> control state structure that stands in the IOVA translation >> path. >> >> For reading and clearing dirty data, in all IOMMUs a transition >> from any of the PTE access bits (Access, Dirty) implies flushing >> the IOTLB to invalidate any stale data in the IOTLB as to whether >> or not the IOMMU should update the said PTEs. The iommu core APIs >> introduce a new structure for storing the dirties, albeit vendor >> IOMMUs implementing .read_and_clear_dirty() just use > > s/vendor IOMMUs/iommu drivers/ > > btw according to past history in iommu mailing list sounds like > 'vendor' is not a term welcomed in the kernel, while there are > many occurrences in this series. > Hmm, I wasn't aware actually. Will move away from using 'vendor'. > [...] >> Although, The ARM SMMUv3 case is a tad different that its x86 >> counterparts. Rather than changing *only* the IOMMU domain device entry >> to >> enable dirty tracking (and having a dedicated bit for dirtyness in IOPTE) >> ARM instead uses a dirty-bit modifier which is separately enabled, and >> changes the *existing* meaning of access bits (for ro/rw), to the point >> that marking access bit read-only but with dirty-bit-modifier enabled >> doesn't trigger an perm io page fault. >> >> In pratice this means that changing iommu context isn't enough >> and in fact mostly useless IIUC (and can be always enabled). Dirtying >> is only really enabled when the DBM pte bit is enabled (with the >> CD.HD bit as a prereq). >> >> To capture this h/w construct an iommu core API is added which enables >> dirty tracking on an IOVA range rather than a device/context entry. >> iommufd picks one or the other, and IOMMUFD core will favour >> device-context op followed by IOVA-range alternative. > > Above doesn't convince me on the necessity of introducing two ops > here. Even for ARM it can accept a per-domain op and then walk the > page table to manipulate any modifier for existing mappings. It > doesn't matter whether it sets one bit in the context entry or multiple > bits in the page table. > OK > [...] >> + > > Miss comment for this function. > ack >> +unsigned int iommu_dirty_bitmap_record(struct iommu_dirty_bitmap >> *dirty, >> + unsigned long iova, unsigned long length) >> +{ >> + unsigned long nbits, offset, start_offset, idx, size, *kaddr; >> + >> + nbits = max(1UL, length >> dirty->pgshift); >> + offset = (iova - dirty->iova) >> dirty->pgshift; >> + idx = offset / (PAGE_SIZE * BITS_PER_BYTE); >> + offset = offset % (PAGE_SIZE * BITS_PER_BYTE); >> + start_offset = dirty->start_offset; > > could you elaborate the purpose of dirty->start_offset? Why dirty->iova > doesn't start at offset 0 of the bitmap? > It is to deal with page-unaligned addresses. Like if the start of the bitmap -- and hence bitmap base IOVA for the first bit of the bitmap -- isn't page-aligned and starts in the offset of a given page. Thus start-offset is to know bit in the pinned page does dirty::iova correspond to. >> + >> + while (nbits > 0) { >> + kaddr = kmap(dirty->pages[idx]) + start_offset; >> + size = min(PAGE_SIZE * BITS_PER_BYTE - offset, nbits); >> + bitmap_set(kaddr, offset, size); >> + kunmap(dirty->pages[idx]); > > what about the overhead of kmap/kunmap when it's done for every > dirtied page (as done in patch 18)? Isn't it an overhead mainly with highmem? Otherwise it ends up being page_to_virt(...) But anyways the kmap's should be cached, and teardown when pinning the next user data. Performance analysis is also something I want to fully hash out (as mentioned in the cover letter).