On 2024/7/26 16:04, David Hildenbrand wrote:
On 26.07.24 04:33, Baolin Wang wrote:
On 2024/7/26 02:39, David Hildenbrand wrote:
We recently made GUP's common page table walking code also walk
hugetlb VMAs without most hugetlb special-casing, preparing for the
future of having less hugetlb-specific page table walking code in the
codebase. Turns out that we missed one page table locking detail: page
table locking for hugetlb folios that are not mapped using a single
PMD/PUD.
Assume we have a hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb code that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
Let's make huge_pte_lockptr() effectively use the same PT locks as any
core-mm page table walker would.
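Something along these lines (sketch only, not the final patch) is what is
meant; huge_page_size(), pmd_lockptr() and mm->page_table_lock exist today,
while hugetlb_ptep_lockptr() stands in for a hypothetical helper that would
return the split PTE PT lock of the PTE table containing @pte:

/*
 * Sketch: route hugetlb to the same PT locks that core-mm page table
 * walkers use, based on the hugetlb size.
 */
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
					   struct mm_struct *mm, pte_t *pte)
{
	const unsigned long size = huge_page_size(h);

	if (size >= PUD_SIZE)
		/* PUD (or larger) mappings: core-mm uses the mm-wide lock. */
		return &mm->page_table_lock;
	if (size >= PMD_SIZE)
		/*
		 * Folios mapped by a single PMD: use the split PMD PT lock,
		 * assuming @pte here really points into a PMD table.
		 */
		return pmd_lockptr(mm, (pmd_t *)pte);
	/*
	 * Folios spanning multiple PTEs (e.g., cont-PTE on arm64): use the
	 * split PTE PT lock, as pte_offset_map_lock() in GUP would.
	 * hugetlb_ptep_lockptr() is hypothetical here.
	 */
	return hugetlb_ptep_lockptr(mm, pte);
}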
Thanks for raising the issue again. I remember fixing this issue 2 years
ago in commit fac35ba763ed ("mm/hugetlb: fix races when looking up a
CONT-PTE/PMD size hugetlb page"), but it seems to be broken again.
Ah, right! We fixed it by rerouting to hugetlb code that we then removed :D
Did we have a reproducer back then that would make my life easier?
I don't have a reproducer right now. I remember I added some ugly
hack code (adding delay() etc.) to the kernel to analyze this issue, and it
was not easy to reproduce. :(