[fixed the subject, not sure what happened there]

FWIW I'm not sure fail-fast is always the right strategy here; in
many cases, even with some reclaim, compaction may win. Just not if
you're on a tight latency budget.

> I stress test and measure XFS metadata performance under sustained
> memory pressure all the time. This change has not caused any
> obvious regressions in the short time I've been testing it.

Did you test for tail latencies?

There are some relatively simple ways to trigger memory
fragmentation. The standard one is to allocate a very large
THP-backed file and then punch a lot of holes in it (see the sketch
in the P.S. below).

> I still need to do perf testing on large directory block sizes. That
> is where high-order allocations will get stressed - that's where
> xlog_kvmalloc() starts dominating the profiles as it trips over
> vmalloc scalability issues...

Yes, that's true. vmalloc has many issues, although with the recent
patches that split the rbtrees with separate locks it may now look
quite different than it did before.

> > I would in any case add a tunable for it in case people run into
> > this.
>
> No tunables. It either works or it doesn't. If we can't make
> it work reliably by default, we throw it in the dumpster, light it
> on fire and walk away.

I'm not sure there is a single definition of "reliably" here -- for
many workloads tail latencies don't matter, so it's always reliable
as long as you have good aggregate throughput. Others have very high
expectations for them. Forcing the high expectations on everyone is
probably not a good general strategy though, as there are real
trade-offs.

I can see that having lots of small tunables for every use case
might not be a good idea. Perhaps there would be a case for a single
general tunable that controls higher-order folios for everyone.

> > Tail latencies are a common concern on many IO workloads.
>
> Yes, for user data operations it's a common concern. For metadata,
> not so much - there's so many far worse long tail latencies in
> metadata operations (like waiting for journal space) that memory
> allocation latencies in the metadata IO path are largely noise....

I've seen pretty long stalls in the past.

The difference from the journal is also that the journal is local to
the file system, while memory is normally shared with everyone on
the node or system. So the scope of noisy neighbour impact can be
quite different, especially on a large machine.

-Andi
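
P.S.: To make the fragmentation trigger above concrete, here is a
minimal userspace sketch. The path and sizes are made up for
illustration, and it assumes a tmpfs mount with huge=always at
/mnt/thp: fill a large file so it is backed by huge pages, then
punch one base-page hole into every 2MB region, which splits the
huge pages and leaves physical memory fragmented.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define FILE_SIZE (64LL << 30)  /* 64GB; size this to the machine */
#define HUGE_SIZE (2LL << 20)   /* 2MB THP */
#define HOLE_SIZE (4LL << 10)   /* one base page */

int main(void)
{
	/* assumes: mount -t tmpfs -o huge=always tmpfs /mnt/thp */
	int fd = open("/mnt/thp/frag", O_CREAT | O_RDWR, 0600);
	if (fd < 0) { perror("open"); return 1; }

	/* allocate the whole file so tmpfs backs it with huge pages */
	if (fallocate(fd, 0, 0, FILE_SIZE)) { perror("fallocate"); return 1; }

	/* punch one base-page hole per 2MB region: each punch splits
	 * a huge page and frees a single isolated base page */
	for (long long off = 0; off < FILE_SIZE; off += HUGE_SIZE) {
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, HOLE_SIZE)) {
			perror("punch");
			return 1;
		}
	}
	close(fd);
	return 0;
}

If the file is sized near the machine's memory, what is left free
afterwards is mostly isolated 4k pages, so subsequent high-order
allocations have to compact or fail.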
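
Also, for reference, the pattern around xlog_kvmalloc() that we are
talking about is, roughly from memory (not the exact fs/xfs code),
the classic fail-fast-then-fallback:

#include <linux/slab.h>
#include <linux/vmalloc.h>

static inline void *kvmalloc_failfast(size_t size)
{
	gfp_t flags = GFP_KERNEL;
	void *p;

	/* fail fast: don't stall in direct reclaim/compaction on the
	 * physically contiguous attempt, and don't warn on failure */
	flags &= ~__GFP_DIRECT_RECLAIM;
	flags |= __GFP_NOWARN;

	p = kmalloc(size, flags);
	if (!p)
		p = vmalloc(size);	/* fallback; can sleep and reclaim */
	return p;
}

Every time the kmalloc side fails under fragmentation, the fallback
lands on vmalloc and its locks, which is how the scalability issues
show up in the profiles mentioned above.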
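
And to make the single-tunable idea concrete, I'm thinking of
something as small as this (entirely hypothetical, none of these
names exist today):

#include <linux/moduleparam.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* hypothetical global switch, e.g. high_order_folios=0 on the
 * kernel command line for latency-sensitive deployments */
static bool high_order_folios = true;
module_param(high_order_folios, bool, 0644);

static void *buf_alloc(size_t size)
{
	void *p = NULL;

	/* only try the fail-fast high-order path when enabled */
	if (high_order_folios)
		p = kmalloc(size, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM);
	if (!p)
		p = vmalloc(size);
	return p;
}

That way latency-sensitive setups could opt out in one place rather
than chasing per-subsystem knobs.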