Re: [syzbot] [mm?] BUG: Bad page map (8)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 20.07.24 07:02, Hillf Danton wrote:
On Fri, 19 Jul 2024 13:21:30 +0200 David Hildenbrand <david@xxxxxxxxxx>
On 19.07.24 00:51, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit:    4d145e3f830b Merge tag 'i2c-for-6.10-rc8' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11321495980000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6b5a15443200e31
dashboard link: https://syzkaller.appspot.com/bug?extid=ec4b7d82bb051330f15a
compiler:       aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=113e054e980000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1366ab85980000


The reproducer involves udmabuf. I suspect it has to do with it.

But I'm curius, does the reproducer not trigger before 4d145e3f830b on
mainliny?

Viveks changes are not upstream yet, but I can only speculate that we
have some issue similar to the one we had with hugetlb: udmabuf doing
things with memfd/shmem pages that it shouldn't do, because it doesn't
"own" these pages.

"udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap" might help.

	cpu1				cpu2
	---				---
	evict()				find folio2 in page cache
	truncate_inode_folio()
	truncate_cleanup_folio();
	  // unmap folio2 from mmA
	  unmap_mapping_folio(folio2);
					mmap folio2 to mmB
	filemap_remove_folio(folio2);


If the window exists for mapping folio to userspace while indoe is evicted,
is this report false positive?

I think what happens here is that filemap_unaccount_folio() will force the mapcount to be logically 0 (value -1).

And if we then actually go ahead and unmap that folio from our udmabuf page tables, we will let it go negative (and also free up the refcount too early) resulting in all kinds of issues.

filemap_unaccount_folio() was written under the assumption that the mapcount will only get modified when we map something via the pagecache, not when some other code (udmabuf) looked up something from the pagecache and then maps it to user space itself.

"Fortunately", the issue only exists with CONFIG_DEBUG_VM.

The right fix is probably to stop udmabuf from touching the mapcount (use a PFNMAP as that patch does). Another fix would be removing that debugging code from filemap_unaccount_folio().

I do see value in part of that debugging code. The refcount+mapcount modifications, not so much. But the "BUG: Bad page cache in process ..." message sounds helpful.

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux