Re: [PATCH v3] mm/hugetlb: fix hugetlb vs. core-mm PT locking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peter,

-	if (huge_page_size(h) == PMD_SIZE)
+	unsigned long size = huge_page_size(h);
+
+	VM_WARN_ON(size == PAGE_SIZE);
+
+	/*
+	 * hugetlb must use the exact same PT locks as core-mm page table
+	 * walkers would. When modifying a PTE table, hugetlb must take the
+	 * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+	 * PT lock etc.
+	 *
+	 * The expectation is that any hugetlb folio smaller than a PMD is
+	 * always mapped into a single PTE table and that any hugetlb folio
+	 * smaller than a PUD (but at least as big as a PMD) is always mapped
+	 * into a single PMD table.
+	 *
+	 * If that does not hold for an architecture, then that architecture
+	 * must disable split PT locks such that all *_lockptr() functions
+	 * will give us the same result: the per-MM PT lock.
+	 *
+	 * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where
+	 * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use the MM PT lock
+	 * directly with a PMD hugetlb size, whereby core-mm would call
+	 * pmd_lockptr() instead. However, in such configurations split PMD
+	 * locks are disabled -- split locks don't make sense on a single
+	 * PGDIR page table -- and the end result is the same.
+	 */
+	if (size >= P4D_SIZE)
+		return &mm->page_table_lock;

I'd drop this so the mm lock fallback will be done below (especially in
reality the pud lock is always mm lock for now..).  Also this line reads
like there can be P4D size huge page but in reality PUD is the largest
(nopxx doesn't count).  We also same some cycles in most cases if removed.

The compiler should be smart enough to optimize most of that out. But
I agree that there is no need to be that future-proof here.

These two are interesting:

arch/powerpc/mm/hugetlbpage.c:  if (!mm_pud_folded(mm) && sz >= P4D_SIZE)
arch/riscv/mm/hugetlbpage.c:    else if (sz >= P4D_SIZE)

But I assume they are only fordward-looking. (GUP would already be
pretty broken if that would exist)


+	else if (size >= PUD_SIZE)
+		return pud_lockptr(mm, (pud_t *) pte);
+	else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE))

I thought this HIGHPTE can also be dropped? Because in HIGHPTE it should
never have lower-than-PMD huge pages or we're in trouble.  That's why I
kept one WARN_ON() in my pesudo code but only before trying to take the pte
lockptr.

Then the compiler won't optimize out the ptep_lockptr() call and we'll run
into a build error. And I think the HIGHPTE builderror serves good purpose.

In file included from <command-line>:
In function 'ptep_lockptr',
    inlined from 'huge_pte_lockptr' at ./include/linux/hugetlb.h:974:9,
    inlined from 'huge_pte_lock' at ./include/linux/hugetlb.h:1248:8,
    inlined from 'pagemap_scan_hugetlb_entry' at fs/proc/task_mmu.c:2581:8:
././include/linux/compiler_types.h:510:45: error: call to '__compiletime_assert_256' declared with attribute error: BUILD_BUG_ON failed: IS_ENABLED(CONFIG_HIGHPTE)
  510 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^
././include/linux/compiler_types.h:491:25: note: in definition of macro '__compiletime_assert'
  491 |                         prefix ## suffix();                             \
      |                         ^~~~~~
././include/linux/compiler_types.h:510:9: note: in expansion of macro '_compiletime_assert'
  510 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |         ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
   50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      |         ^~~~~~~~~~~~~~~~
./include/linux/mm.h:2874:9: note: in expansion of macro 'BUILD_BUG_ON'
 2874 |         BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));



It would be even better to have a way to know whether an arch even has
hugetlb < PMD_SIZE (config option?) and use that instead here; but I'll leave
that as an optimization.


--
Cheers,

David / dhildenb





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux