On Tue, 23 May 2023, Claudio Imbrenda wrote:
so if I understand the above correctly, pte_offset_map_lock will only fail if the whole page table has disappeared, and in that case, it will never reappear with zero pages, therefore we can safely skip (in that case just break). if we were to do a continue instead of a break, we would most likely fail again anyway.
Yes, that's the most likely; and you hold mmap_write_lock() there, and VM_NOHUGEPAGE on all vmas, so I think it's the only foreseeable possibility.
in that case I would still like a small change in your patch: please write a short (2~3 lines max) comment about why it's ok to do things that way
Sure. But I now see that I've disobeyed you, and gone to 4 lines (but in the comment above the function, so as not to distract from the code itself): is this good wording to you? I needed to research how they were stopped from coming in afterwards, so wanted to put something greppable in there. And, unless I'm misunderstanding, that "after THP was enabled" was always supposed to say "after THP was disabled" (because splitting a huge zero page pmd inserts a a page table full of little zero ptes). Or would you prefer the comment in the commit message instead, or down just above the pte_offset_map_lock() line? It would much better if I could find one place at the mm end, to enforce its end of the contract; but cannot think how to do that. Hugh --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2537,7 +2537,12 @@ static inline void thp_split_mm(struct mm_struct *mm) * Remove all empty zero pages from the mapping for lazy refaulting * - This must be called after mm->context.has_pgste is set, to avoid * future creation of zero pages - * - This must be called after THP was enabled + * - This must be called after THP was disabled. + * + * mm contracts with s390, that even if mm were to remove a page table, + * racing with the loop below and so causing pte_offset_map_lock() to fail, + * it will never insert a page table containing empty zero pages once + * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set. */ static int __zap_zero_pages(pmd_t *pmd, unsigned long start, unsigned long end, struct mm_walk *walk)