On Tue, Aug 18, 2020 at 05:02:32PM +0200, Michal Hocko wrote:
> On Tue 18-08-20 06:53:27, Paul E. McKenney wrote:
> > On Tue, Aug 18, 2020 at 09:43:44AM +0200, Michal Hocko wrote:
> > > On Mon 17-08-20 15:28:03, Paul E. McKenney wrote:
> > > > On Mon, Aug 17, 2020 at 10:28:49AM +0200, Michal Hocko wrote:
> > > > > On Mon 17-08-20 00:56:55, Uladzislau Rezki wrote:
> > > >
> > > > [ . . . ]
> > > >
> > > > > > wget ftp://vps418301.ovh.net/incoming/1000000_kmalloc_kfree_rcu_proc_percpu_pagelist_fractio_is_8.png
> > > > >
> > > > > 1/8 of the memory in pcp lists is quite large and likely not something
> > > > > used very often.
> > > > >
> > > > > Both these numbers just make me think that a dedicated pool of pages
> > > > > pre-allocated for RCU specifically might be a better solution. I still
> > > > > haven't read through that branch of the email thread though so there
> > > > > might be some pretty convincing arguments to not do that.
> > > >
> > > > To avoid the problematic corner cases, we would need way more dedicated
> > > > memory than is reasonable, as in well over one hundred pages per CPU.
> > > > Sure, we could choose a smaller number, but then we are failing to defend
> > > > against flooding, even on systems that have more than enough free memory
> > > > to be able to do so. It would be better to live within what is available,
> > > > taking the performance/robustness hit only if there isn't enough.
> > >
> > > Thomas had a good point that it doesn't really make much sense to
> > > optimize for flooders because that just makes them more effective.
> >
> > The point is not to make the flooders go faster, but rather for the
> > system to be robust in the face of flooders. Robust as in harder for
> > a flooder to OOM the system.
>
> Do we see this to be a practical problem? I am really confused because
> the initial argument was revolving around an optimization; now you are
> suggesting that this is actually a system stability measure. And I fail
> to see how allowing an easy way to deplete pcp caches completely solves
> any of that. Please do realize that if we allow that then every user who
> relies on pcp caches will have to take a slow(er) path and that will
> have performance consequences. The pool is a global and a scarce
> resource. That's why I've suggested a dedicated preallocated pool and
> use it instead of draining global pcp caches.

Both the optimization and the robustness are important.

The problem with this thing is that I have to start describing it
somewhere, and I have not yet come up with a description of the whole
thing that isn't TL;DR.

> > And reducing the number of post-grace-period cache misses makes it
> > easier for the callback-invocation-time memory freeing to keep up with
> > the flooder, thus avoiding (or at least delaying) the OOM.
> >
> > > > My current belief is that we need a combination of (1) either the
> > > > GFP_NOLOCK flag or Peter Zijlstra's patch and
> > >
> > > I must have missed the patch?
> >
> > If I am keeping track, this one:
> >
> > https://lore.kernel.org/lkml/20200814215206.GL3982@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> OK, I have certainly noticed that one but didn't react, but my response
> would be similar to the dedicated gfp flag. This is less of a hack than
> __GFP_NOLOCK but it still exposes very internal parts of the allocator
> and I find that quite problematic for the future maintenance of the
> allocator. The risk of an easy depletion of the pcp pool is also there
> of course.

I had to ask. ;-)

							Thanx, Paul