> On Apr 3, 2018, at 2:12 PM, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 03, 2018 at 01:49:25PM -0700, Buddy Lumpkin wrote:
>>> Yes, very much this. If you have a single-threaded workload which is
>>> using the entirety of memory and would like to use even more, then it
>>> makes sense to use as many CPUs as necessary getting memory out of its
>>> way. If you have N CPUs and N-1 threads happily occupying themselves in
>>> their own reasonably-sized working sets with one monster process trying
>>> to use as much RAM as possible, then I'd be pretty unimpressed to see
>>> the N-1 well-behaved threads preempted by kswapd.
>>
>> The default value provides one kswapd thread per NUMA node, the same
>> it was without the patch. Also, I would point out that just because you devote
>> more threads to kswapd, doesn’t mean they are busy. If multiple kswapd threads
>> are busy, they are almost certainly doing work that would have resulted in
>> direct reclaims, which are often substantially more expensive than a couple
>> extra context switches due to preemption.
>
> [...]
>
>> In my previous response to Michal Hocko, I described
>> how I think we could scale watermarks in response to direct reclaims, and
>> launch more kswapd threads when kswapd peaks at 100% CPU usage.
>
> I think you're missing my point about the workload ... kswapd isn't
> "nice", so it will compete with the N-1 threads which are chugging along
> at 100% CPU inside their working sets.

If the memory hog is generating enough demand for multiple kswapd tasks to be busy, then it is generating enough demand to trigger direct reclaims. Since direct reclaims are 100% CPU bound, the preemptions you are concerned about are happening anyway.

> In this scenario, we _don't_
> want to kick off kswapd at all; we want the monster thread to clean up
> its own mess.

This makes direct reclaims sound like a positive thing overall and that is simply not the case.
If cleaning is the metaphor to describe direct reclaims, then it’s happening in the kitchen using a garden hose. When conditions for direct reclaims are present, they can occur in any task that is allocating on the system. They inject latency in random places and they decrease filesystem throughput.

When software engineers try to build their own cache, I usually try to talk them out of it. This rarely works, as they usually have reasons they believe make the project compelling, so I just ask that they compare their results using direct IO and a private cache to simply allowing the page cache to do its thing. I can’t make this pitch any more because direct reclaims have too much of an impact on filesystem throughput.

The only positive thing that direct reclaims provide is a means to prevent the system from crashing or deadlocking when it falls too low on memory.

> If we have idle CPUs, then yes, absolutely, lets have
> them clean up for the monster, but otherwise, I want my N-1 threads
> doing their own thing.
>
> Maybe we should renice kswapd anyway ... thoughts? We don't seem to have
> had a nice'd kswapd since 2.6.12, but maybe we played with that earlier
> and discovered it was a bad idea?
>
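
For anyone who wants to see how reclaim work splits between kswapd and allocating tasks on a given workload, one rough approach is to sample the reclaim counters in /proc/vmstat before and after a run and compare the deltas. The sketch below is only an illustration and not part of the patch: the helper names (parse_vmstat, reclaim_summary) are hypothetical, and it assumes the kernel exposes pgscan_kswapd, pgscan_direct and one or more allocstall* counters, whose exact names vary between kernel versions.

```python
# Illustrative sketch: compare background (kswapd) and direct reclaim
# activity between two snapshots of /proc/vmstat.
# Helper names are hypothetical; counter names depend on the kernel version.


def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat format) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def reclaim_summary(before, after):
    """Return counter deltas between two parsed snapshots."""

    def delta(name):
        # Missing counters are treated as zero rather than raising.
        return after.get(name, 0) - before.get(name, 0)

    # Some kernels report a single allocstall counter, others split it per
    # zone (for example allocstall_normal), so sum everything with the prefix.
    stalls = sum(
        after[name] - before.get(name, 0)
        for name in after
        if name.startswith("allocstall")
    )

    scanned_kswapd = delta("pgscan_kswapd")
    scanned_direct = delta("pgscan_direct")
    total = scanned_kswapd + scanned_direct

    return {
        "pgscan_kswapd": scanned_kswapd,
        "pgscan_direct": scanned_direct,
        "allocstall": stalls,
        # Fraction of scanned pages that were scanned by allocating tasks.
        "direct_share": scanned_direct / total if total else 0.0,
    }
```

To use it, read /proc/vmstat into a string before and after the workload, pass each through parse_vmstat, and hand the two dicts to reclaim_summary. A direct_share near zero suggests kswapd is keeping up; a growing allocstall delta suggests allocating tasks are entering direct reclaim.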