On 05/03/25 4:16 pm, David Hildenbrand wrote:
On 05.03.25 11:21, Dev Jain wrote:
In __handle_mm_fault(),
1. Why is there a barrier() for the PUD logic?
Good question. It was added in
commit a00cc7d9dd93d66a3fb83fc52aa57a4bec51c517
Author: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Date: Fri Feb 24 14:57:02 2017 -0800
mm, x86: add support for PUD-sized transparent hugepages
Maybe it was an alternative to performing a READ_ONCE(*vmf.pud).
Maybe it was done that way, because pudp_get_lockless() does not exist.
And it would likely not be required, because on architectures where
ptep_get_lockless() does some magic (see below, mostly 32bit), PUD THP
are not applicable.
Thanks for your reply David.
2. For the PMD logic, in the if block, we use *vmf.pmd, and in the
else block
we use pmdp_get_lockless(); what if someone changes the pmd just
when we
have begun processing the conditions in the if block, fail in the
if block
and then the else block operates on a different pmd value.
Shouldn't we cache
the value of the pmd and operate on a single consistent value
until we take the
lock and then finally check using pxd_same() and friends?
The pmd_none(*vmf.pmd) is fine. create_huge_pmd() must be able to deal
with races, because we are not holding any locks.
I had a mental hiccup, yes we don't need the cached value even before
the if block, as the relevant path will eventually check after taking
the lock. I was thinking of all sorts of weird races.
We only have to use pmdp_get_lockless() when we want to effectively
perform a READ_ONCE(), and make sure that we read something "reasonable"
that we can operate on, even with concurrent changes. (e.g., not read a
garbage PFN just because of some concurrent changes)
Oh I just looked at the arm64 definition and assumed ptep_get_lockless()
== READ_ONCE() :) Now this makes sense. So I am guessing that we can
still get away with a *vmf.pmd on 64-bit arches?
A separate question: When taking the create_huge_pmd() path with a read
fault and after taking the pmd lock, why shouldn't we check with
pmd_none(pmdp_get_lockless(vmf.pmd)) instead of plain *vmf.pmd...surely
now we must load the actual correct value from memory since we are
committing into mapping the huge zero folio?
And after looking somewhat more, why even is a pmd_none(*pmd) there in
set_huge_zero_folio()...it should be the responsibility of the caller to
verify this? Any caller will just assume that it got the huge zero folio
mapped so this check should be redundant.
We'll store the value in vmf.orig_pmd, on which we'll operate and try to
detect later changes using pmd_same(). So we really want something
consistent in there.
See the description above ptep_get_lockless(), why we cannot simply do a
READ_ONCE on architectures where a PTE cannot be read atomically (e.g.,
8 byte PTEs on 32bit architecture).