On Tue, 24 Sept 2024 at 02:28, Jan Kara <jack@xxxxxxx> wrote: > > On Mon 23-09-24 12:36:14, Linus Torvalds wrote: > > > > Do we really want to call that horrific fsnotify path for the case > > where we already have the page cached? This is a fairly critical > > fastpath, and not giving out page cache pages means that now you are > > literally violating mmap coherency. > > > > If the aim is to fill in caches on first access, then if we already > > have a page cache page, it's by definition not first access any more! > > Well, that's what actually should be happening. do_read_fault() will do > should_fault_around(vmf) -> yes -> do_fault_around() and > filemap_map_pages() will insert all pages in the page cache into the page > table page before we even get to filemap_fault() calling our fsnotify > hooks. That's the fault-around code, yes, and it will populate most pages on many filesystems, but it's still optional. Not all filesystems have a 'map_pages' function at all (from a quick grep at least ceph, erofs, ext2, ocfs2 - although I didn't actually validate that my quick grep was right). Look here - the part I disliked the most was literally commit 4f0ec01f45cd ("fsnotify: generate pre-content permission event on page fault") which at the very top of 'filemap_fault()' adds /* * If we have pre-content watchers then we need to generate events on * page fault so that we can populate any data before the fault. */ ret = __filemap_fsnotify_fault(vmf, &fpin); ... right *above* the code that then does /* * Do we have something in the page cache already? */ folio = filemap_get_folio(mapping, index); and this is all still in the fast-path for any filesystem that doesn't do map_pages. Do we care deeply about such filesystems? Perhaps not - but this all still smells to me. When the code that is supposed to ignore caches exists right *above* the code that has a big comment about "is it already in the page cache", it makes me unhappy. Linus