On 2021/8/31 PM6:20, Vlastimil Babka wrote:
> On 8/28/21 06:23, Qi Zheng wrote:
>> The smp_wmb() which is in the __pte_alloc() is used to
>> ensure all ptes setup is visible before the pte is made
>> visible to other CPUs by being put into page tables. We
>> only need this when the pte is actually populated, so
>> move it to pte_install(). __pte_alloc_kernel(),
>
> It's named pmd_install()?

Yes, I will update it in the next version.

>> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar
>> to this case.
>>
>> We can also defer smp_wmb() to the place where the pmd entry
>> is really populated by preallocated pte. There are two kinds
>> of user of preallocated pte, one is filemap & finish_fault(),
>> another is THP. The former does not need another smp_wmb()
>> because the smp_wmb() has been done by pte_install().
>
> Same here.

>> Fortunately, the latter also does not need another smp_wmb()
>> because there is already a smp_wmb() before populating the
>> new pte when the THP uses a preallocated pte to split a huge
>> pmd.
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
>> Reviewed-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
>> ---
>>  mm/memory.c         | 47 ++++++++++++++++++++---------------------------
>>  mm/sparse-vmemmap.c |  2 +-
>>  2 files changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ef7b1762e996..9c7534187454 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>>  	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
>>  		mm_inc_nr_ptes(mm);
>> +		/*
>> +		 * Ensure all pte setup (eg. pte page lock and page clearing) are
>> +		 * visible before the pte is made visible to other CPUs by being
>> +		 * put into page tables.
>> +		 *
>> +		 * The other side of the story is the pointer chasing in the page
>> +		 * table walking code (when walking the page table without locking;
>> +		 * ie. most of the time). Fortunately, these data accesses consist
>> +		 * of a chain of data-dependent loads, meaning most CPUs (alpha
>> +		 * being the notable exception) will already guarantee loads are
>> +		 * seen in-order. See the alpha page table accessors for the
>> +		 * smp_rmb() barriers in page table walking code.
>> +		 */
>> +		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>
> So, could it? :)

Yes, it could, but we don't have smp_wmb__after_spin_lock() now.
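
For readers following the ordering argument, here is a minimal userspace sketch in C11 of the publish pattern the comment describes. It is not kernel code and not the patch itself. All names (struct table, slot, install_table(), lookup_table(), nr_tables) are hypothetical. The release fence stands in for smp_wmb(), a small atomic_flag spinlock stands in for the page table lock, and the acquire load on the reader side is an assumption that is stronger than the data-dependent loads the kernel relies on for most CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NENTRIES 512

/* Hypothetical stand-in for a freshly allocated, zeroed pte page. */
struct table {
	unsigned long entries[NENTRIES];
};

/* Hypothetical stand-in for the pmd entry that points at the table. */
static _Atomic(struct table *) slot;
static atomic_flag slot_lock = ATOMIC_FLAG_INIT;
static int nr_tables;	/* plays the role of the mm's pte page counter */

static void slot_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&slot_lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
}

static void slot_lock_release(void)
{
	atomic_flag_clear_explicit(&slot_lock, memory_order_release);
}

/*
 * Publish a fully initialised table. Returns 1 if it was installed,
 * 0 if another caller populated the slot first (the caller then still
 * owns the table and has to free it).
 */
static int install_table(struct table *t)
{
	int installed = 0;

	assert(t != NULL);
	slot_lock_acquire();
	if (atomic_load_explicit(&slot, memory_order_relaxed) == NULL) {
		nr_tables++;
		/*
		 * Order the table initialisation before the pointer store.
		 * This is the only place the barrier is needed, because
		 * this is the only place the table becomes reachable.
		 */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&slot, t, memory_order_relaxed);
		installed = 1;
	}
	slot_lock_release();
	return installed;
}

/* Lockless reader: load the pointer, then it is safe to dereference. */
static struct table *lookup_table(void)
{
	return atomic_load_explicit(&slot, memory_order_acquire);
}
```

The point of the sketch is only that the barrier sits inside the "slot is still empty" branch: a caller that loses the race never publishes its table, so it pays no barrier cost.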