On Tue 17-01-23 21:28:06, Jann Horn wrote: > On Tue, Jan 17, 2023 at 4:25 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Mon 09-01-23 12:53:13, Suren Baghdasaryan wrote: > > > Protect VMA from concurrent page fault handler while collapsing a huge > > > page. Page fault handler needs a stable PMD to use PTL and relies on > > > per-VMA lock to prevent concurrent PMD changes. pmdp_collapse_flush(), > > > set_huge_pmd() and collapse_and_free_pmd() can modify a PMD, which will > > > not be detected by a page fault handler without proper locking. > > > > I am struggling with this changelog. Maybe because my recollection of > > the THP collapsing subtleties is weak. But aren't you just trying to say > > that the current #PF handling and THP collapsing need to be mutually > > exclusive currently so in order to keep that assumption you have mark > > the vma write locked? > > > > Also it is not really clear to me how that handles other vmas which can > > share the same thp? > > It's not about the hugepage itself, it's about how the THP collapse > operation frees page tables. > > Before this series, page tables can be walked under any one of the > mmap lock, the mapping lock, and the anon_vma lock; so when khugepaged > unlinks and frees page tables, it must ensure that all of those either > are locked or don't exist. This series adds a fourth lock under which > page tables can be traversed, and so khugepaged must also lock out that one. > > There is a codepath in khugepaged that iterates through all mappings > of a file to zap page tables (retract_page_tables()), which locks each > visited mm with mmap_write_trylock() and now also does > vma_write_lock(). OK, I see. This would be a great addendum to the changelog. > I think one aspect of this patch that might cause trouble later on, if > support for non-anonymous VMAs is added, is that retract_page_tables() > now does vma_write_lock() while holding the mapping lock; the page > fault handling path would probably take the locks the other way > around, leading to a deadlock? So the vma_write_lock() in > retract_page_tables() might have to become a trylock later on. This, right? #PF retract_page_tables vma_read_lock i_mmap_lock_write i_mmap_lock_read vma_write_lock I might be missing something but I have only found huge_pmd_share to be called from the #PF path. That one should be safe as it cannot be a target for THP. Not that it would matter much because such a dependency chain would be really subtle. -- Michal Hocko SUSE Labs