On 2024/7/26 16:04, David Hildenbrand wrote:
On 26.07.24 04:33, Baolin Wang wrote:
On 2024/7/26 02:39, David Hildenbrand wrote:
We recently made GUP's common page table walking code also walk
hugetlb VMAs without most hugetlb special-casing, preparing for the
future of having less hugetlb-specific page table walking code in the
codebase. Turns out that we missed one page table locking detail: page
table locking for hugetlb folios that are not mapped using a single
PMD/PUD.
Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb code that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would.
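Something along these lines (sketch only, not the final patch) is what is
meant; huge_page_size(), pmd_lockptr() and mm->page_table_lock exist today,
while hugetlb_ptep_lockptr() stands in for a hypothetical helper that would
return the split PTE PT lock of the PTE table containing @pte:

/*
 * Sketch: route hugetlb to the same PT locks that core-mm page table
 * walkers use, based on the hugetlb size.
 */
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
					   struct mm_struct *mm, pte_t *pte)
{
	const unsigned long size = huge_page_size(h);

	if (size >= PUD_SIZE)
		/* PUD (or larger) mappings: core-mm uses the mm-wide lock. */
		return &mm->page_table_lock;
	if (size >= PMD_SIZE)
		/*
		 * Folios mapped by a single PMD: use the split PMD PT lock,
		 * assuming @pte here really points into a PMD table.
		 */
		return pmd_lockptr(mm, (pmd_t *)pte);
	/*
	 * Folios spanning multiple PTEs (e.g., cont-PTE on arm64): use the
	 * split PTE PT lock, as pte_offset_map_lock() in GUP would.
	 * hugetlb_ptep_lockptr() is hypothetical here.
	 */
	return hugetlb_ptep_lockptr(mm, pte);
}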
Thanks for raising the issue again. I remember fixing this issue 2 years
ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D
Did we have a reproducer back then that would make my life easier?
I don't have a reproducer right now. I remember I added some ugly
hack code (adding delay() etc.) to the kernel to analyze this issue, and it
was not easy to reproduce. :(