Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte

David Hildenbrand <david@xxxxxxxxxx> · Wed, 27 Mar 2024 18:05:52 +0100

On 27.03.24 10:51, Ryan Roberts wrote:
On 26/03/2024 17:58, David Hildenbrand wrote:

vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
/* not dirty */

/* Now, thread 2 ends up setting the PTE dirty under PT lock. */

Ahh, this comment about thread 2 is not referring to the code immediately below
it. It all makes much more sense now. :)

Sorry :)

spin_lock(vmf->ptl);
entry = vmf->orig_pte;
if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
       ...
}
...
entry = pte_mkyoung(entry);

Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.

No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and
unconditionally does that in handle_pte_fault().

if (ptep_set_access_flags(vmf->vma, ...)
...
pte_unmap_unlock(vmf->pte, vmf->ptl);

Generic ptep_set_access_flags() will do another pte_same() check and realize
"hey, there was a change!" let's update the PTE!

set_pte_at(vma->vm_mm, address, ptep, entry);

This is called from the generic ptep_set_access_flags() in your example, right?

Yes.

would overwrite the dirty bit set by thread 2.

I'm not really sure what you are getting at... Is your concern that there is a
race where the page could become dirty in the meantime and it now gets lost? I
think that's why arm64 overrides ptep_set_access_flags(); since the hw can
update access/dirty we have to deal with the races.

My concern is that your patch can in subtle ways lead to use losing PTE dirty
bits on architectures that don't have the HW-managed dirty bit. They do exist ;)

But I think the example you give can already happen today? Thread 1 reads
orig_pte = ptep_get_lockless(). So that's already racy, if thread 2 is going to
set dirty just after the get, then thread 1 is going to set the PTE back to (a
modified version of) orig_pte. Isn't it already broken?

No, because the pte_same() check under PTL would have detected it, and we would
have backed out. And I think the problem comes to live when we convert
pte_same()->pte_same_norecency(), because we fail to protect PTE access/dirty
changes that happend under PTL from another thread.

Ahh yep. Got it. I absolutely knew that you would be correct, but I still walked
right into it!

I think one could argue that the generic ptep_set_access_flags() is not
implementing its own spec:

"
... Only sets the access flags (dirty, accessed), as well as write permission.
Furthermore, we know it always gets set to a "more permissive" setting ...
"

Surely it should be folding the access and dirty bits from *ptep into entry if
they are set?

Likely yes. Unless it's also used to clear access/dirty (don't think so, 
and would not be documented).

But the simplification made sense for now, because you previously 
checked that pte_same(), and nobody can modify it concurrently.

Regardless, I think this example proves that its fragile and subtle. I'm not
really sure how to fix it more generally/robustly. Any thoughts? If not perhaps
we are better off keeping ptep_get_lockless() around and only using
ptep_get_lockless_norecency() for the really obviously correct cases?

Maybe one of the "sources of problems" is that we have a 
ptep_get_lockless_norecency() call followed by a pte_same() check, like 
done here.

Not the source of all problems I believe, though ...

--
Cheers,

David / dhildenb