On Tue, Jan 17, 2023 at 12:28 PM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote: > > > Protect VMA from concurrent page fault handler while collapsing a huge > > > page. Page fault handler needs a stable PMD to use PTL and relies on > > > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(), > > > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will > > > not be detected by a page fault handler without proper locking. > > > > I am struggling with this changelog. Maybe because my recollection of > > the THP collapsing subtleties is weak. But aren't you just trying to say > > that the current #PF handling and THP collapsing need to be mutually > > exclusive currently so in order to keep that assumption you have mark > > the vma write locked? > > > > Also it is not really clear to me how that handles other vmas which can > > share the same thp? > > It's not about the hugepage itself, it's about how the THP collapse > operation frees page tables. > > Before this series, page tables can be walked under any one of the > mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged > unlinks and frees page tables, it must ensure that all of those either > are locked or don't exist. This series adds a fourth lock under which > page tables can be traversed, and so khugepaged must also lock out that one. > > There is a codepath in khugepaged that iterates through all mappings > of a file to zap page tables (retract_page_tables()), which locks each > visited mm with mmap_write_trylock() and now also does > vma_write_lock(). > > > I think one aspect of this patch that might cause trouble later on, if > support for non-anonymous VMAs is added, is that retract_page_tables() > now does vma_write_lock() while holding the mapping lock; the page > fault handling path would probably take the locks the other way > around, leading to a deadlock? So the vma_write_lock() in > retract_page_tables() might have to become a trylock later on. > > Related: Please add the new VMA lock to the big lock ordering comments > at the top of mm/rmap.c. (And maybe later mm/filemap.c, if/when you > add file VMA support.) Thanks for the clarifications and the warning. I'll add appropriate comments and will take this deadlocking scenario into account when later implementing support for file-backed page faults.