On 04.04.24 00:19, Sean Christopherson wrote:
On Wed, Apr 03, 2024, David Hildenbrand wrote:
On 03.04.24 02:17, Sean Christopherson wrote:
On Tue, Apr 02, 2024, David Hildenbrand wrote:
Aha! But try_to_unmap_one() also checks that refcount==mapcount+1, i.e. will
also keep the folio if it has been GUP'd. And __remove_mapping() explicitly states
that it needs to play nice with a GUP'd page being marked dirty before the
reference is dropped.
* Must be careful with the order of the tests. When someone has
* a ref to the folio, it may be possible that they dirty it then
* drop the reference. So if the dirty flag is tested before the
* refcount here, then the following race may occur:
So while it's totally possible for KVM to get a W=1,D=0 PTE, if I'm reading the
code correctly it's safe/legal so long as KVM either (a) marks the folio dirty
while holding a reference or (b) marks the folio dirty before returning from its
mmu_notifier_invalidate_range_start() hook, *AND* obviously if KVM drops its
mappings in response to mmu_notifier_invalidate_range_start().
Yes, I agree that it should work in the context of vmscan. But (b) is
certainly a bit harder to swallow than "ordinary" (a) :)
Heh, all the more reason to switch KVM x86 from (b) => (a).
:)
As raised, if having a writable SPTE would imply having a writable+dirty
PTE, then KVM MMU code wouldn't have to worry about syncing any dirty bits
ever back to core-mm, so patch #2 would not be required. ... well, it would
be replaces by an MMU notifier that notifies about clearing the PTE dirty
bit :)
Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
to be invalidated before consuming dirty status. Isn't the end result essentially
a sane FOLL_TOUCH?
Likely. As stated in my first mail, FOLL_TOUCH is a bit of a mess right now.
Having something that makes sure the writable PTE/PMD is dirty (or
alternatively sets it dirty), paired with MMU notifiers notifying on any
mkclean would be one option that would leave handling how to handle
dirtying of folios completely to the core. It would behave just like a
CPU writing to the page table, which would set the pte dirty.
Of course, if frequent clearing of the dirty PTE/PMD bit would be a
problem (like we discussed for the accessed bit), that would not be an
option. But from what I recall, only clearing the PTE/PMD dirty bit is
rather rare.
I think this GUP / GUP-user setting of folio dirty bits is a bit of a
mess, but it seems to be working so far, so ... who am I to judge :)
--
Cheers,
David / dhildenb