On Fri, Jun 11, 2021 at 04:42:49PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 11, 2021 at 12:05:42PM -0700, Hugh Dickins wrote:
> > > diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
> > > index e896ebef8c24cb..0bf1fdec928e71 100644
> > > +++ b/arch/x86/include/asm/pgtable-3level.h
> > > @@ -75,7 +75,7 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
> > >  static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> > >  {
> > >  	pmdval_t ret;
> > > -	u32 *tmp = (u32 *)pmdp;
> > > +	u32 *tmp = READ_ONCE((u32 *)pmdp);
> > >
> > >  	ret = (pmdval_t) (*tmp);
> > >  	if (ret) {
> > > @@ -84,7 +84,7 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> > >  		 * or we can end up with a partial pmd.
> > >  		 */
> > >  		smp_rmb();
> > > -		ret |= ((pmdval_t)*(tmp + 1)) << 32;
> > > +		ret |= READ_ONCE((pmdval_t)*(tmp + 1)) << 32;
> > >  	}
> >
> > Maybe that. Or maybe now (since Will's changes) it can just do
> > one READ_ONCE() of the whole, then adjust its local copy.
>
> I think the smb_rmb() is critical here to ensure a PTE table pointer
> is coherent, READ_ONCE is not a substitute, unless I am miss
> understanding what Will's changes are???

Yes, I agree that the barrier is needed here for x86 PAE. I would really
have liked to enforce native-sized access in READ_ONCE(), but
unfortunately there is plenty of code out there which is resilient to a
64-bit access being split into two separate 32-bit accesses, and so I
wasn't able to go that far.

That being said, pmd_read_atomic() probably _should_ be using
READ_ONCE() because using it inconsistently can give rise to broken
codegen, e.g. if you do:

	pmdval_t x, y, z;

	x = *pmdp;		// Invalid
	y = READ_ONCE(*pmdp);	// Valid
	if (pmd_valid(y))
		z = *pmdp;	// Invalid again!

Then the compiler can allocate the same register for x and z, but will
issue an additional load for y.
If a concurrent update takes place to the pmd which transitions from
Invalid -> Valid, then it will look as though things went back in time,
because z will be stale. We actually hit this on arm64 in practice [1].

Will

[1] https://lore.kernel.org/lkml/20171003114244.430374928@xxxxxxxxxxxxxxxxxxx/