On Tue 11-06-24 09:55:23, Byungchul Park wrote:
> On Mon, Jun 10, 2024 at 03:23:49PM +0200, Michal Hocko wrote:
> > On Tue 04-06-24 09:34:48, Byungchul Park wrote:
> > > On Mon, Jun 03, 2024 at 06:01:05PM +0100, Matthew Wilcox wrote:
> > > > On Mon, Jun 03, 2024 at 09:37:46AM -0700, Dave Hansen wrote:
> > > > > Yeah, we'd need some equivalent of a PTE marker, but for the page cache.
> > > > > Presumably some xa_value() that means a reader has to go do a
> > > > > luf_flush() before going any farther.
> > > >
> > > > I can allocate one for that. We've got something like 1000 currently
> > > > unused values which can't be mistaken for anything else.
> > > >
> > > > > That would actually have a chance at fixing two issues: One where a new
> > > > > page cache insertion is attempted. The other where someone goes to look
> > > > > in the page cache and takes some action _because_ it is empty (I think
> > > > > NFS is doing some of this for file locks).
> > > > >
> > > > > LUF is also pretty fundamentally built on the idea that files can't
> > > > > change without LUF being aware. That model seems to work decently for
> > > > > normal old filesystems on normal old local block devices. I'm worried
> > > > > about NFS, and I don't know how seriously folks take FUSE, but it
> > > > > obviously can't work well for FUSE.
> > > >
> > > > I'm more concerned with:
> > > >
> > > >  - page goes back to buddy
> > > >  - page is allocated to slab
> > >
> > > At this point, the needed tlb flush will be performed in prep_new_page().
> >
> > But that does mean that an unaware caller would get an additional
> > overhead of the flushing, right? I think it would be just a matter of
> > time before somebody can turn that into a side channel attack, not to
> > mention unexpected latencies introduced.
>
> Nope. The pending tlb flush performed in prep_new_page() is the one
> that would've been done already with the vanilla kernel. It's not an
> additional tlb flush but a subset of all the skipped ones.

But those skipped ones could have happened in a completely different
context (e.g. a different process or even a different security domain),
right?

> It's worth noting all the existing mm reclaim mechanisms have already
> introduced worse unexpected latencies.

Right, but reclaim, especially direct reclaim, is expected to be slow.
It is much different to see latency spikes on a system with a lot of
free memory.

> pcp for locality is already a better source of side channel attack. FYI,
> the tlb flush is rarely performed, and only if a pending tlb flush exists.

Right, but rare and hard-to-predict latencies are much worse than
consistent ones.
-- 
Michal Hocko
SUSE Labs