On Wed, Feb 23, 2022 at 10:50:09PM -0500, Theodore Ts'o wrote:
> On Thu, Feb 24, 2022 at 12:48:42PM +1100, Dave Chinner wrote:
> > > Fair enough; on the other hand, we could also view this as making ext4
> > > more robust against buggy code in other subsystems, and while other
> > > file systems may be losing user data if they are actually trying to do
> > > remote memory access to file-backed memory, apparently other file
> > > systems aren't noticing and so they're not crashing.
> >
> > Oh, we've noticed them, no question about that. We've got bug
> > reports going back years for systems being crashed, triggering BUGs
> > and/or corrupting data on both XFS and ext4 filesystems due to users
> > trying to run RDMA applications with file backed pages.
>
> Is this issue causing XFS to crash? I didn't know that.

I have no idea if it crashes nowadays; go back a few years and
search for XFS BUGging out in ->invalidate_page (or was it
->release_page?) because of unexpected dirty pages. I think it could
also trigger BUGs in writeback when ->writepages tripped over a dirty
page without a delayed allocation mapping over the hole...

We were pretty aggressive about telling people reporting such issues
that they get to keep all the borken bits to themselves and to stop
wasting our time with unsolvable problems caused by their
broken-by-design RDMA applications. Hence people have largely stopped
bothering us with random filesystem crashes on systems using RDMA on
file-backed pages...

> I tried the Syzbot reproducer with XFS mounted, and it didn't trigger
> any crashes. I'm sure data was getting corrupted, but I figured I
> should bring ext4 to the XFS level of "at least we're not reliably
> killing the kernel".

Oh, well, good to know XFS didn't die a horrible death immediately.
Thanks for checking, Ted.

> On ext4, an unprivileged process can use process_vm_writev(2) to crash
> the system.
> I don't know how quickly we can get a fix into mm/gup.c,
> but if some other kernel path tries calling set_page_dirty() on a
> file-backed page without first asking permission from the file system,
> it seems to be nice if the file system doesn't BUG() --- as near as I
> can tell, xfs isn't crashing in this case, but ext4 is.

iomap is probably refusing to map holes for writepage - we've cleaned
up most of the weird edge cases to return errors, so I'm guessing
iomap is just ignoring such pages these days.

Yeah, see iomap_writepage_map():

		error = wpc->ops->map_blocks(wpc, inode, pos);
		if (error)
			break;

		if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
			continue;
		if (wpc->iomap.type == IOMAP_HOLE)
			continue;

Yeah, so if writeback maps a hole rather than converts a delalloc
region to IOMAP_MAPPED, it'll just skip over the block/page. IIRC,
they essentially become uncleanable pages, and I think eventually
inode reclaim will just toss them out of memory.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
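The following small userspace model in C is meant only as a sketch to make the two writeback behaviours discussed in this thread concrete. It is not kernel code. Every name in it (struct model_block, BLK_HOLE/BLK_DELALLOC/BLK_MAPPED, model_fs_write(), model_set_dirty_bypass(), model_writeback()) is hypothetical. Each array element stands in for one filesystem block under a page.

The model rests on a deliberately simplified assumption. A block dirtied through the filesystem's own write path gets a delayed allocation reservation first. A block dirtied by a set_page_dirty() call that bypasses the filesystem stays dirty over a hole. Writeback then has two options. It can treat that state as fatal, which is roughly the ext4 behaviour Ted reports, modelled here as returning -1 instead of calling BUG(). Or it can skip the hole the way the quoted iomap_writepage_map() loop does, which leaves the block dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are kernel APIs. */
enum blk_map { BLK_HOLE, BLK_DELALLOC, BLK_MAPPED };

struct model_block {
	bool dirty;
	enum blk_map map;
};

/* Normal write path (e.g. write(2) or a ->page_mkwrite() fault): the
 * filesystem is asked first, so a hole gets a delalloc reservation
 * before the block is marked dirty. */
void model_fs_write(struct model_block *b)
{
	assert(b != NULL);
	if (b->map == BLK_HOLE)
		b->map = BLK_DELALLOC;
	b->dirty = true;
}

/* Bypass path: a GUP user calling set_page_dirty() without asking the
 * filesystem. The block is dirty, but there is still a hole under it. */
void model_set_dirty_bypass(struct model_block *b)
{
	assert(b != NULL);
	b->dirty = true;
}

/* Writeback over nr blocks.
 * strict == true:  a dirty hole is an invariant violation; a kernel would
 *                  BUG() here, this model returns -1 instead.
 * strict == false: iomap-like; hole mappings are skipped and those blocks
 *                  simply stay dirty.
 * Returns the number of blocks written, or -1 on a strict violation. */
int model_writeback(struct model_block *blocks, size_t nr, bool strict)
{
	int written = 0;

	assert(blocks != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		struct model_block *b = &blocks[i];

		if (!b->dirty)
			continue;
		if (b->map == BLK_HOLE) {
			if (strict)
				return -1;	/* stand-in for BUG() */
			continue;		/* cf. "if (type == IOMAP_HOLE) continue;" */
		}
		b->map = BLK_MAPPED;	/* convert delalloc, then "write" it */
		b->dirty = false;
		written++;
	}
	return written;
}
```

Calling model_writeback() repeatedly in non-strict mode shows the "uncleanable" state Dave describes. The bypass-dirtied block is never written and never cleaned. In the real kernel, per his recollection, inode reclaim would eventually discard such pages. In strict mode the same input is reported as a violation instead.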