On 13.11.24 17:01, Vlastimil Babka wrote:
When a combination of unfortunate factors occur, we might BUG in fork():
dup_mmap()
copy_page_range()
copy_***_range()
copy_present_pte()
copy_present_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
BUG_ON(!anon_vma);
The factors are:
- source vma is VM_MIXEDMAP otherwise copy_page_range() would bail out
when !src_vma->anon_vma
- I think this was due to gpfs, but can happen in-tree as well
- is_cow_mapping() is true because VM_MAYWRITE (even though the vma
was a read-only mapping of a .so file)
- MMF_HAS_PINNED is true, thus some actual pinning has happened
- page_maybe_dma_pinned() is true as a false positive, because mapcount
and thus refcount is >1024
That makes us reach page_needs_cow_for_dma() in copy_present_page() and
evaluate it as true and attempt to CoW a file page and hit the BUG_ON()
because we never had a reason to instantiate anon_vma for the source
vma.
AFAICS this was fixed inadvertedly in 5.19 by commit fb3d824d1a46
("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and
page_try_dup_anon_rmap()") or another commit in that series. What caught
my attention is this part of the changelog:
We really only care about pins on anonymous pages, because they are prone
to getting replaced in the COW handler once mapped R/O. For !anon pages
in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about
that, at least not that I could come up with an example.
And as part of that commit, an PageAnon() test is added in
copy_present_pte().
But the code is already refactored a lot, so this is an attempt at a
minimal fix for LTS kernels by placing the PageAnon() check to
copy_present_page().
Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
---
Hi, we've seen this in our 5.14 based kernel and it involved the out of
tree gpfs module, but I believe the same thing can happen in LTS's 5.10
and 5.15 without out of tree modules as well. So I'd like your opinion
on this fix before I propose it to stable as a non-standard
version-specific fix (I don't think we'd want to backport fb3d824d1a46
with prerequisities). Thanks.
I recall seeing+discussing this exact patch already a couple years ago :D
Ah, here is the 5.15 version
https://lkml.kernel.org/r/20221028075244.3112566-1-songyuanzheng@xxxxxxxxxx
And the 5.10 version
https://lore.kernel.org/lkml/20221024094911.3054769-1-songyuanzheng@xxxxxxxxxx/
... I could have sworn they got applied.
... and in linux-5.10.y I see
commit 935a8b6202101d7f58fe9cd11287f9cec0d8dd32
Author: Yuanzheng Song <songyuanzheng@xxxxxxxxxx>
Date: Fri Oct 28 03:07:05 2022 +0000
mm/memory: add non-anonymous page check in the copy_present_page()
The vma->anon_vma of the child process may be NULL because
the entire vma does not contain anonymous pages. In this
case, a BUG will occur when the copy_present_page() passes
a copy of a non-anonymous page of that vma to the
page_add_new_anon_rmap() to set up new anonymous rmap.
Maybe you missed that the PageAnon() check is simply a couple of lines
further down in there?
--
Cheers,
David / dhildenb