Re: [PATCH v3 0/3] Encapsulate PTE contents from non-arch code

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 12 Jun 2023 13:16:56 -0700

On Mon, 12 Jun 2023 16:15:42 +0100 Ryan Roberts <ryan.roberts@xxxxxxx> wrote:

> Hi All,
> 
> (Including wider audience this time since changes touch a fair few subsystems)
> 
> This is the second half of v3 of a series to improve the encapsulation of pte
> entries by disallowing non-arch code from directly dereferencing pte_t pointers.

That's basically all the [0/N] cover letter content we have here.  I
stole some words from the [3/3] changelog, so we now have:

: A series to improve the encapsulation of pte entries by disallowing
: non-arch code from directly dereferencing pte_t pointers.
: 
: This means that by default, the accesses change from a C dereference to a
: READ_ONCE().  This is technically the correct thing to do since where
: pgtables are modified by HW (for access/dirty) they are volatile and
: therefore we should always ensure READ_ONCE() semantics.
: 
: But more importantly, by always using the helper, it can be overridden by
: the architecture to fully encapsulate the contents of the pte.  Arch code
: is deliberately not converted, as the arch code knows best.  It is
: intended that arch code (arm64) will override the default with its own
: implementation that can (e.g.) hide certain bits from the core code, or
: determine young/dirty status by mixing in state from another source.
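
A minimal userspace sketch of the mechanism described above. It assumes simplified stand-ins for pte_t and READ_ONCE(); the kernel's real definitions live in include/linux/pgtable.h, include/asm-generic/rwonce.h and per-arch headers, and differ in detail. The generic ptep_get() is wrapped in #ifndef so that an architecture can provide its own version first. The override shown here hides a made-up software bit (PTE_HYPOTHETICAL_SW_BIT) from core code. It is purely illustrative and is not arm64's implementation. core_read_pte() and ptep_get_demo() are also hypothetical.

```c
#include <assert.h>

/* Simplified stand-in for an arch pte_t: a single-member struct. */
typedef struct { unsigned long pte; } pte_t;

#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t){ (x) })

/* Simplified READ_ONCE(): access through a volatile-qualified lvalue so the
 * compiler performs the load once rather than caching or repeating it.
 * Uses the GNU C __typeof__ extension, as the kernel does. */
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* --- "arch" header (hypothetical) --------------------------------------
 * Hides a made-up software-only bit from core code. Without the
 * "#define ptep_get" and the function after it, the generic fallback
 * below would be used instead. */
#define PTE_HYPOTHETICAL_SW_BIT	(1UL << 9)

#define ptep_get ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t pte = READ_ONCE(*ptep);

	return __pte(pte_val(pte) & ~PTE_HYPOTHETICAL_SW_BIT);
}

/* --- generic header (modelled on include/linux/pgtable.h) ------------- */
#ifndef ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	return READ_ONCE(*ptep);
}
#endif

/* Core (non-arch) code: always read through the helper. */
static inline unsigned long core_read_pte(pte_t *ptep)
{
	pte_t pte = ptep_get(ptep);	/* was: pte_t pte = *ptep; */

	return pte_val(pte);
}

static inline void ptep_get_demo(void)
{
	pte_t entry = __pte(0x1000UL | PTE_HYPOTHETICAL_SW_BIT | 0x1UL);

	assert(core_read_pte(&entry) == 0x1001UL);	/* bit hidden from core */
	assert(pte_val(entry) == 0x1201UL);		/* still set in the table */
}
```

Because core code only ever calls ptep_get(), the arch can change what the core code sees without touching any caller.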

> Based on earlier feedback, I split the series in 2; the first part, fixes for
> existing bugs, was already posted at [3] and merged into mm-stable. This second
> part contains the conversion from direct dereferences to instead use
> ptep_get()/ptep_get_lockless().
> 
> See the v1 cover letter at [1] for rationale for this work.
> 
> Based on feedback at v2, I've removed the new ptep_deref() helper I originally
> added, and am now using the existing ptep_get() and ptep_get_lockless() helpers.
> Testing on Ampere Altra (arm64) showed no difference in performance when using
> ptep_deref() (*pte) vs ptep_get() (READ_ONCE(*pte)).
> 
> Patches are based on mm-unstable (49e038b1919e) and a branch is available at [4]
> (Let me know if this is the wrong branch to target - I'm still not familiar with
> the details of the mm- dev process!). Note that Hugh Dickins's "mm: allow
> pte_offset_map[_lock]() to fail" (now in mm-unstable) patch set caused a number
> of conflicts which I've resolved. But due to that, you won't be able to apply
> these patches on top of Linus's tree. I have an alternate branch on top of
> v6.4-rc6 at [5].

Yep, that's all great, thanks.
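
The conversion Ryan describes is mechanical at each call site. The sketch below uses simplified userspace stand-ins again. The PTE layout, PTE_PRESENT_SKETCH and the walker functions are hypothetical. It shows the usual pattern: code that holds the PTE lock reads through ptep_get(), while lockless walkers (GUP-fast style) use ptep_get_lockless(). In the kernel's generic header, the lockless variant essentially falls back to ptep_get() unless the PTE spans two words (CONFIG_GUP_GET_PXX_LOW_HIGH). In that case it reads the two halves in a retry loop until it gets a consistent pair. That loop is omitted here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned long pte; } pte_t;	/* simplified stand-in */
#define pte_val(x)	((x).pte)
#define __pte(x)	((pte_t){ (x) })
#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical "present" bit, for illustration only. */
#define PTE_PRESENT_SKETCH	(1UL << 0)

/* Generic defaults; an architecture may #define its own before this. */
#ifndef ptep_get
static inline pte_t ptep_get(pte_t *ptep)
{
	return READ_ONCE(*ptep);
}
#endif

#ifndef ptep_get_lockless
static inline pte_t ptep_get_lockless(pte_t *ptep)
{
	/* A one-word PTE is already read atomically by ptep_get(). */
	return ptep_get(ptep);
}
#endif

/* Hypothetical walker run under the PTE lock.
 * was: if (pte_val(*pte) & PTE_PRESENT_SKETCH) */
static inline size_t count_present_locked(pte_t *pte, size_t n)
{
	size_t i, present = 0;

	for (i = 0; i < n; i++, pte++)
		if (pte_val(ptep_get(pte)) & PTE_PRESENT_SKETCH)
			present++;
	return present;
}

/* Hypothetical lockless check: the entry may change under us, so take
 * exactly one snapshot and only inspect that copy. */
static inline int pte_present_lockless(pte_t *ptep)
{
	pte_t pte = ptep_get_lockless(ptep);	/* was: pte_t pte = *ptep; */

	return (pte_val(pte) & PTE_PRESENT_SKETCH) != 0;
}

static inline void conversion_demo(void)
{
	pte_t table[4] = { __pte(0x1001UL), __pte(0x2000UL),
			   __pte(0x3001UL), __pte(0x0UL) };

	assert(count_present_locked(table, 4) == 2);
	assert(pte_present_lockless(&table[2]) == 1);
	assert(pte_present_lockless(&table[3]) == 0);
}
```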

Is there some clever trick we can do to prevent new open-coded derefs
of pte_t* from being introduced?

I suppose we could convert pte_t to a single-member struct to force a
compile error.  That struct will get passed by value to ptep_get() so
that's OK.  But this isn't viable unless/until all architectures are
converted :(

Or we rely upon Ryan to grep the tree occasionally ;)