On 20.07.24 07:02, Hillf Danton wrote:
On Fri, 19 Jul 2024 13:21:30 +0200 David Hildenbrand <david@xxxxxxxxxx>
On 19.07.24 00:51, syzbot wrote:
Hello,
syzbot found the following issue on:
HEAD commit: 4d145e3f830b Merge tag 'i2c-for-6.10-rc8' of git://git.ker..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11321495980000
kernel config: https://syzkaller.appspot.com/x/.config?x=6b5a15443200e31
dashboard link: https://syzkaller.appspot.com/bug?extid=ec4b7d82bb051330f15a
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=113e054e980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1366ab85980000
The reproducer involves udmabuf. I suspect it has to do with it.
But I'm curius, does the reproducer not trigger before 4d145e3f830b on
mainliny?
Viveks changes are not upstream yet, but I can only speculate that we
have some issue similar to the one we had with hugetlb: udmabuf doing
things with memfd/shmem pages that it shouldn't do, because it doesn't
"own" these pages.
"udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap" might help.
cpu1 cpu2
--- ---
evict() find folio2 in page cache
truncate_inode_folio()
truncate_cleanup_folio();
// unmap folio2 from mmA
unmap_mapping_folio(folio2);
mmap folio2 to mmB
filemap_remove_folio(folio2);
If the window exists for mapping folio to userspace while indoe is evicted,
is this report false positive?
I think what happens here is that filemap_unaccount_folio() will force
the mapcount to be logically 0 (value -1).
And if we then actually go ahead and unmap that folio from our udmabuf
page tables, we will let it go negative (and also free up the refcount
too early) resulting in all kinds of issues.
filemap_unaccount_folio() was written under the assumption that the
mapcount will only get modified when we map something via the pagecache,
not when some other code (udmabuf) looked up something from the
pagecache and then maps it to user space itself.
"Fortunately", the issue only exists with CONFIG_DEBUG_VM.
The right fix is probably to stop udmabuf from touching the mapcount
(use a PFNMAP as that patch does). Another fix would be removing that
debugging code from filemap_unaccount_folio().
I do see value in part of that debugging code. The refcount+mapcount
modifications, not so much. But the "BUG: Bad page cache in process ..."
message sounds helpful.
--
Cheers,
David / dhildenb