Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 04, 2024, David Hildenbrand wrote:
> On 04.04.24 19:31, Sean Christopherson wrote:
> > On Thu, Apr 04, 2024, David Hildenbrand wrote:
> > > On 04.04.24 00:19, Sean Christopherson wrote:
> > > > Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
> > > > to be invalidated before consuming dirty status.  Isn't the end result essentially
> > > > a sane FOLL_TOUCH?
> > > 
> > > Likely. As stated in my first mail, FOLL_TOUCH is a bit of a mess right now.
> > > 
> > > Having something that makes sure the writable PTE/PMD is dirty (or
> > > alternatively sets it dirty), paired with MMU notifiers notifying on any
> > > mkclean would be one option that would leave handling how to handle dirtying
> > > of folios completely to the core. It would behave just like a CPU writing to
> > > the page table, which would set the pte dirty.
> > > 
> > > Of course, if frequent clearing of the dirty PTE/PMD bit would be a problem
> > > (like we discussed for the accessed bit), that would not be an option. But
> > > from what I recall, only clearing the PTE/PMD dirty bit is rather rare.
> > 
> > And AFAICT, all cases already invalidate secondary MMUs anyways, so if anything
> > it would probably be a net positive, e.g. the notification could more precisely
> > say that SPTEs need to be read-only, not blasted away completely.
> 
> As discussed, I think at least madvise_free_pte_range() wouldn't do that.

I'm getting a bit turned around.  Are you talking about what madvise_free_pte_range()
would do in this future world, or what madvise_free_pte_range() does today?  Because
today, unless I'm really misreading the code, secondary MMUs are invalidated before
the dirty bit is cleared.

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
				range.start, range.end);

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	update_hiwater_rss(mm);

	mmu_notifier_invalidate_range_start(&range);
	tlb_start_vma(&tlb, vma);
	walk_page_range(vma->vm_mm, range.start, range.end,
			&madvise_free_walk_ops, &tlb);
	tlb_end_vma(&tlb, vma);
	mmu_notifier_invalidate_range_end(&range);

KVM (or any other secondary MMU) can re-establish mapping with W=1,D=0 in the
PTE, but the costly invalidation (zap+flush+fault) still happens.

> Notifiers would only get called later when actually zapping the folio.

And in case we're talking about a hypothetical future, I was thinking the above
could do MMU_NOTIFY_WRITE_PROTECT instead of MMU_NOTIFY_CLEAR.

> So at least for some time, you would have the PTE not dirty, but the SPTE
> writable or even dirty. So you'd have to set the page dirty when zapping the
> SPTE ...  and IMHO that is what we should maybe try to avoid :)




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux