On Tue 18-08-20 06:53:27, Paul E. McKenney wrote: > On Tue, Aug 18, 2020 at 09:43:44AM +0200, Michal Hocko wrote: > > On Mon 17-08-20 15:28:03, Paul E. McKenney wrote: > > > On Mon, Aug 17, 2020 at 10:28:49AM +0200, Michal Hocko wrote: > > > > On Mon 17-08-20 00:56:55, Uladzislau Rezki wrote: > > > > > > [ . . . ] > > > > > > > > wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_8.png > > > > > > > > 1/8 of the memory in pcp lists is quite large and likely not something > > > > used very often. > > > > > > > > Both these numbers just make me think that a dedicated pool of page > > > > pre-allocated for RCU specifically might be a better solution. I still > > > > haven't read through that branch of the email thread though so there > > > > might be some pretty convincing argments to not do that. > > > > > > To avoid the problematic corner cases, we would need way more dedicated > > > memory than is reasonable, as in well over one hundred pages per CPU. > > > Sure, we could choose a smaller number, but then we are failing to defend > > > against flooding, even on systems that have more than enough free memory > > > to be able to do so. It would be better to live within what is available, > > > taking the performance/robustness hit only if there isn't enough. > > > > Thomas had a good point that it doesn't really make much sense to > > optimize for flooders because that just makes them more effective. > > The point is not to make the flooders go faster, but rather for the > system to be robust in the face of flooders. Robust as in harder for > a flooder to OOM the system. Do we see this to be a practical problem? I am really confused because the initial argument was revolving around an optimization now you are suggesting that this is actually system stability measure. And I fail to see how allowing an easy way to deplete pcp caches completely solves any of that. Please do realize that if allow that then every user who relies on pcp caches will have to take a slow(er) path and that will have performance consequences. The pool is a global and a scarce resource. That's why I've suggested a dedicated preallocated pool and use it instead of draining global pcp caches. > And reducing the number of post-grace-period cache misses makes it > easier for the callback-invocation-time memory freeing to keep up with > the flooder, thus avoiding (or at least delaying) the OOM. > > > > My current belief is that we need a combination of (1) either the > > > GFP_NOLOCK flag or Peter Zijlstra's patch and > > > > I must have missed the patch? > > If I am keeping track, this one: > > https://lore.kernel.org/lkml/20200814215206.GL3982@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ OK, I have certainly noticed that one but didn't react but my response would be similar to the dedicated gfp flag. This is less of a hack than __GFP_NOLOCK but it still exposes very internal parts of the allocator and I find that a quite problematic from the future maintenance of the allocator. The risk of an easy depletion of the pcp pool is also there of course. -- Michal Hocko SUSE Labs