Re: [PATCH RFC 01/19] iommu: Add iommu_domain ops for dirty tracking

On 4/29/22 08:54, Tian, Kevin wrote:
>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Sent: Friday, April 29, 2022 5:09 AM
>>
>> Add to iommu domain operations a set of callbacks to
>> perform dirty tracking, particularly to start and stop
>> tracking and finally to test and clear the dirty data.
> 
> to be consistent with other context, s/test/read/
> 
/me nods
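
FWIW the two callbacks end up with roughly the following shape (an
illustrative sketch of what the above describes; names/signatures here
are illustrative, not the exact hunk in this patch):

/* Sketch only: illustrative names/signatures. */
struct iommu_domain_dirty_ops_sketch {
	/* Toggle dirty tracking in the HW translation structures. */
	int (*set_dirty_tracking)(struct iommu_domain *domain, bool enable);

	/* Read the dirtied IOVAs into @dirty, clearing the HW bits. */
	int (*read_and_clear_dirty)(struct iommu_domain *domain,
				    unsigned long iova, size_t size,
				    struct iommu_dirty_bitmap *dirty);
};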

>>
>> Drivers are expected to dynamically change their hw protection
>> domain bits to toggle the tracking and flush some form of
> 
> 'hw protection domain bits' sounds a bit weird. what about
> just using 'translation structures'?
> 
I'll replace it with that instead.

>> control state structure that stands in the IOVA translation
>> path.
>>
>> For reading and clearing dirty data, in all IOMMUs a transition
>> of any of the PTE access bits (Access, Dirty) implies flushing
>> the IOTLB, to invalidate any stale cached state on whether
>> or not the IOMMU should update the said PTEs. The iommu core APIs
>> introduce a new structure for storing the dirty bits, albeit vendor
>> IOMMUs implementing .read_and_clear_dirty() just use
> 
> s/vendor IOMMUs/iommu drivers/
> 
> btw according to past history in iommu mailing list sounds like
> 'vendor' is not a term welcomed in the kernel, while there are
> many occurrences in this series.
> 
Hmm, I wasn't aware actually.

Will move away from using 'vendor'.
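
For reference, the new structure for storing the dirty bits looks
roughly like the below (a sketch reconstructed from how
iommu_dirty_bitmap_record() uses it further down, not the verbatim
definition):

struct iommu_dirty_bitmap {
	unsigned long iova;	    /* base IOVA the bitmap covers */
	unsigned long pgshift;	    /* one bit per (1UL << pgshift) bytes */
	unsigned long start_offset; /* byte offset of the bitmap in pages[0] */
	struct page **pages;	    /* pinned user pages backing the bitmap */
};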

> [...]
>> The ARM SMMUv3 case is a tad different from its x86 counterparts,
>> though. Rather than changing *only* the IOMMU domain device entry
>> to enable dirty tracking (and having a dedicated bit for dirtiness
>> in the IOPTE), ARM instead uses a dirty-bit modifier which is
>> separately enabled, and changes the *existing* meaning of the access
>> bits (for ro/rw), to the point that marking the access bit read-only
>> with the dirty-bit modifier enabled doesn't trigger a permission
>> I/O page fault.
>>
>> In practice this means that changing the iommu context isn't enough,
>> and is in fact mostly useless IIUC (it can always be left enabled).
>> Dirtying is only really enabled when the DBM pte bit is set (with
>> the CD.HD bit as a prereq).
>>
>> To capture this h/w construct, an iommu core API is added which
>> enables dirty tracking on an IOVA range rather than on a
>> device/context entry. iommufd picks one or the other, with the
>> IOMMUFD core favouring the device/context op and falling back to
>> the IOVA-range alternative.
> 
> Above doesn't convince me on the necessity of introducing two ops
> here. Even for ARM it can accept a per-domain op and then walk the
> page table to manipulate any modifier for existing mappings. It
> doesn't matter whether it sets one bit in the context entry or multiple
> bits in the page table.
> 
OK
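
i.e. keep a single per-domain op and let the SMMUv3 driver walk the
page tables underneath. Roughly (a hand-wavy sketch;
walk_leaf_ptes()/set_dbm_bit()/clear_dbm_bit() are made-up helpers,
not real SMMUv3 code):

/* Sketch: one per-domain callback; the driver hides how it's done. */
static int arm_smmu_set_dirty_tracking(struct iommu_domain *domain,
				       bool enable)
{
	/*
	 * An x86-style driver would flip a bit in the device/context
	 * entry here. SMMUv3, with CD.HD set as a prereq, would instead
	 * walk the existing mappings and toggle the DBM modifier on
	 * each leaf PTE.
	 */
	return walk_leaf_ptes(domain, enable ? set_dbm_bit : clear_dbm_bit);
}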

> [...]
>> +
> 
> Miss comment for this function.
> 
ack

>> +unsigned int iommu_dirty_bitmap_record(struct iommu_dirty_bitmap *dirty,
>> +				       unsigned long iova, unsigned long length)
>> +{
>> +	unsigned long nbits, offset, start_offset, idx, size, *kaddr;
>> +
>> +	nbits = max(1UL, length >> dirty->pgshift);
>> +	offset = (iova - dirty->iova) >> dirty->pgshift;
>> +	idx = offset / (PAGE_SIZE * BITS_PER_BYTE);
>> +	offset = offset % (PAGE_SIZE * BITS_PER_BYTE);
>> +	start_offset = dirty->start_offset;
> 
> could you elaborate the purpose of dirty->start_offset? Why dirty->iova
> doesn't start at offset 0 of the bitmap?
> 

It is to deal with page-unaligned addresses.

If the start of the bitmap -- and hence the bitmap base IOVA
corresponding to the first bit of the bitmap -- isn't page-aligned, it
begins at some offset within the first pinned page. start_offset
records that offset, i.e. which bit in the pinned page dirty::iova
corresponds to.
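
To make it concrete, at pin time it would be derived roughly like this
(a sketch, not the actual pinning code; iommu_dirty_bitmap_init_sketch()
is a made-up name, and I'm assuming start_offset is the byte offset
that record() above adds to the kmap'ed address):

/* Sketch: a non-page-aligned bitmap user VA means its first byte
 * sits offset_in_page(uva) bytes into pages[0]. */
static void iommu_dirty_bitmap_init_sketch(struct iommu_dirty_bitmap *dirty,
					   unsigned long bitmap_uva)
{
	dirty->start_offset = offset_in_page(bitmap_uva);
}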

>> +
>> +	while (nbits > 0) {
>> +		kaddr = kmap(dirty->pages[idx]) + start_offset;
>> +		size = min(PAGE_SIZE * BITS_PER_BYTE - offset, nbits);
>> +		bitmap_set(kaddr, offset, size);
>> +		kunmap(dirty->pages[idx]);
> 
> what about the overhead of kmap/kunmap when it's done for every
> dirtied page (as done in patch 18)?

Isn't it an overhead mainly with highmem? Otherwise kmap() ends up
being just page_to_virt(...).

But anyways the kmaps should be cached, and torn down when pinning the
next chunk of user data.
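
Something along the lines of the below (a sketch of the caching idea
only; the struct and helper names are made up and none of this is in
the patch -- assumes <linux/highmem.h> for kmap()/kunmap()):

struct dirty_kmap_cache {
	struct page **pages;	/* the pinned bitmap pages */
	unsigned long cur_idx;	/* which entry is currently mapped */
	void *cur_kaddr;	/* its mapping, NULL if none */
};

/* Map pages[idx], reusing the previous mapping when idx is unchanged. */
static void *dirty_kmap_get(struct dirty_kmap_cache *c, unsigned long idx)
{
	if (c->cur_kaddr && c->cur_idx == idx)
		return c->cur_kaddr;
	if (c->cur_kaddr)
		kunmap(c->pages[c->cur_idx]);
	c->cur_kaddr = kmap(c->pages[idx]);
	c->cur_idx = idx;
	return c->cur_kaddr;
}

/* Called at teardown / before pinning the next chunk of user data. */
static void dirty_kmap_put(struct dirty_kmap_cache *c)
{
	if (c->cur_kaddr) {
		kunmap(c->pages[c->cur_idx]);
		c->cur_kaddr = NULL;
	}
}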

Performance analysis is also something I want to fully hash out (as mentioned in the cover
letter).


