On Mon, Jan 22, 2024 at 5:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> I'm disappointed to have no reaction from netdev so far.  Let's see if a
> more exciting subject line evinces some interest.

Hmm, perhaps some of us were enjoying their weekend?

I also see '[RFC PATCH] filemap: add mapping_mapped check in
filemap_unaccount_folio()', and during the merge window, network
maintainers tend to prioritize their work based on tags.

If a stack trace was added, perhaps our attention would have been caught.

I don't really know what changed recently, all I know is that TCP zero
copy is for real network traffic.

Real traffic uses order-0 pages, 4K at a time.

If can_map_frag() needs to add another safety check, let's add it.

syzbot is usually quite good at bisections, was a bug origin found?

>
> On Sat, Jan 20, 2024 at 02:46:49PM +0800, zhangpeng (AS) wrote:
> > On 2024/1/19 21:40, Matthew Wilcox wrote:
> >
> > > On Fri, Jan 19, 2024 at 05:20:24PM +0800, Peng Zhang wrote:
> > > > Recently, we discovered a syzkaller issue that triggers
> > > > VM_BUG_ON_FOLIO in filemap_unaccount_folio() with CONFIG_DEBUG_VM
> > > > enabled, or bad page without CONFIG_DEBUG_VM.
> > > >
> > > > The specific scenarios are as follows:
> > > > (1) mmap: Use socket fd to create a TCP VMA.
> > > > (2) open(O_CREAT) + fallocate + sendfile: Read the ext4 file and create
> > > > the page cache. The mapping of the page cache is ext4 inode->i_mapping.
> > > > Send the ext4 page cache to the socket fd through sendfile.
> > > > (3) getsockopt TCP_ZEROCOPY_RECEIVE: Receive the ext4 page cache and use
> > > > vm_insert_pages() to insert the ext4 page cache to the TCP VMA. In this
> > > > case, mapcount changes from -1 to 0. The page cache mapping is ext4
> > > > inode->i_mapping, but the VMA of the page cache is the TCP VMA and
> > > > folio->mapping->i_mmap is empty.
> > > I think this is the bug.  We shouldn't be incrementing the mapcount
> > > in this scenario.
> > > Assuming we want to support doing this at all and
> > > we don't want to include something like ...
> > >
> > >     if (folio->mapping) {
> > >         if (folio->mapping != vma->vm_file->f_mapping)
> > >             return -EINVAL;
> > >         if (page_to_pgoff(page) != linear_page_index(vma, address))
> > >             return -EINVAL;
> > >     }
> > >
> > > But maybe there's a reason for networking needing to map pages in this
> > > scenario?
> >
> > Agreed, and I'm also curious why.
> >
> > > > (4) open(O_TRUNC): Deletes the ext4 page cache. In this case, the page
> > > > cache is still in the xarray tree of mapping->i_pages and these page
> > > > cache should also be deleted. However, folio->mapping->i_mmap is empty.
> > > > Therefore, truncate_cleanup_folio()->unmap_mapping_folio() can't unmap
> > > > i_mmap tree. In filemap_unaccount_folio(), the mapcount of the folio is
> > > > 0, causing BUG ON.
> > > >
> > > > Syz log that can be used to reproduce the issue:
> > > > r3 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > mmap(&(0x7f0000ff9000/0x4000)=nil, 0x4000, 0x0, 0x12, r3, 0x0)
> > > > r4 = socket$inet_tcp(0x2, 0x1, 0x0)
> > > > bind$inet(r4, &(0x7f0000000000)={0x2, 0x4e24, @multicast1}, 0x10)
> > > > connect$inet(r4, &(0x7f00000006c0)={0x2, 0x4e24, @empty}, 0x10)
> > > > r5 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > > fallocate(r5, 0x0, 0x0, 0x85b8)
> > > > sendfile(r4, r5, 0x0, 0x8ba0)
> > > > getsockopt$inet_tcp_TCP_ZEROCOPY_RECEIVE(r4, 0x6, 0x23,
> > > > &(0x7f00000001c0)={&(0x7f0000ffb000/0x3000)=nil, 0x3000, 0x0, 0x0, 0x0,
> > > > 0x0, 0x0, 0x0, 0x0}, &(0x7f0000000440)=0x40)
> > > > r6 = openat$dir(0xffffffffffffff9c, &(0x7f00000000c0)='./file0\x00',
> > > > 0x181e42, 0x0)
> > > >
> > > > In the current TCP zerocopy scenario, folio will be released normally.
> > > > When the process exits, if the page cache is truncated before the
> > > > process exits, BUG ON or Bad page occurs, which does not meet the
> > > > expectation.
> > > > To fix this issue, the mapping_mapped() check is added to
> > > > filemap_unaccount_folio(). In addition, to reduce the impact on
> > > > performance, no lock is added when mapping_mapped() is checked.
> > > NAK this patch, you're just preventing the assertion from firing.
> > > I think there's a deeper problem here.
> >
> > --
> > Best Regards,
> > Peng