On Fri, Feb 14, 2020 at 12:48:42PM -0800, Tyler Sanderson wrote:
> Regarding Wei's patch that modifies the shrinker implementation, versus this
> patch which reverts to OOM notifier:
> I am in favor of both patches. But I do want to make sure a fix gets back
> ported to 4.19 where the performance regression was first introduced.
> My concern with reverting to the OOM notifier is, as mst@ put it (in the other
> thread):
> "when linux hits OOM all kind of error paths are being hit, latent bugs start
> triggering, latency goes up drastically."
> The guest could be in a lot of pain before the OOM notifier is invoked, and it
> seems like the shrinker API might allow more fine grained control of when we
> deflate.
>
> On the other hand, I'm not totally convinced that Wei's patch is an expected
> use of the shrinker/page-cache APIs, and maybe it is fragile. Needs more
> testing and scrutiny.
>
> It seems to me like the shrinker API is the right API in the long run, perhaps
> with some fixes and modifications. But maybe reverting to OOM notifier is the
> best patch to back port?

In that case can I see some Tested-by reports pls?

> On Fri, Feb 14, 2020 at 6:19 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> >> There was a report that this results in undesired side effects when
> >> inflating the balloon to shrink the page cache. [1]
> >> "When inflating the balloon against page cache (i.e. no free memory
> >> remains) vmscan.c will both shrink page cache, but also invoke the
> >> shrinkers -- including the balloon's shrinker. So the balloon
> >> driver allocates memory which requires reclaim, vmscan gets this
> >> memory by shrinking the balloon, and then the driver adds the
> >> memory back to the balloon. Basically a busy no-op."
> >>
> >> The name "deflate on OOM" makes it pretty clear when deflation should
> >> happen - after other approaches to reclaim memory failed, not while
> >> reclaiming.
> >> This allows to minimize the footprint of a guest - memory
> >> will only be taken out of the balloon when really needed.
> >>
> >> Especially, a drop_slab() will result in the whole balloon getting
> >> deflated - undesired.
> >
> > Could you explain why some more? drop_caches shouldn't be really used in
> > any production workloads and if somebody really wants all the cache to
> > be dropped then why is balloon any different?
> >
>
> Deflation should happen when the guest is out of memory, not when
> somebody thinks it's time to reclaim some memory. That's what the
> feature promised from the beginning: Only give the guest more memory in
> case it *really* needs more memory.
>
> Deflate on oom, not deflate on reclaim/memory pressure. (that's what the
> report was all about)
>
> A priority for shrinkers might be a step into the right direction.
>
> --
> Thanks,
>
> David / dhildenb
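To make the "busy no-op" from the quoted report concrete, here is a deliberately simplified toy model that counts pages. It is not virtio-balloon or vmscan code: `Guest`, `reclaim_one`, `inflate` and `deflate_on_reclaim` are hypothetical names, and it assumes that under deflate-on-reclaim the balloon's shrinker satisfies each reclaim request whenever the balloon is non-empty (real vmscan shrinks page cache and invokes shrinkers together, so the effect there is less clean-cut).

```python
from dataclasses import dataclass


# Toy model only: page counts, not real kernel structures (hypothetical names).
@dataclass
class Guest:
    free: int     # free pages
    cache: int    # reclaimable page-cache pages
    balloon: int  # pages currently held by the balloon


def reclaim_one(g: Guest, deflate_on_reclaim: bool) -> bool:
    """Free one page; return False if nothing at all can be freed."""
    if deflate_on_reclaim and g.balloon > 0:
        # Simplifying assumption: the balloon's shrinker satisfies reclaim.
        g.balloon -= 1
    elif g.cache > 0:
        # Ordinary reclaim: drop a page-cache page.
        g.cache -= 1
    elif g.balloon > 0:
        # Deflate on OOM: the balloon gives memory back only when nothing else is left.
        g.balloon -= 1
    else:
        return False
    g.free += 1
    return True


def inflate(g: Guest, pages: int, deflate_on_reclaim: bool) -> None:
    """Try to move `pages` pages into the balloon, one allocation at a time."""
    for _ in range(pages):
        if g.free == 0 and not reclaim_one(g, deflate_on_reclaim):
            break  # nothing left to reclaim
        g.free -= 1     # the driver allocates a page...
        g.balloon += 1  # ...and adds it to the balloon


a = Guest(free=0, cache=100, balloon=10)
inflate(a, 50, deflate_on_reclaim=True)
b = Guest(free=0, cache=100, balloon=10)
inflate(b, 50, deflate_on_reclaim=False)
print(a)  # Guest(free=0, cache=100, balloon=10): balloon never grows
print(b)  # Guest(free=0, cache=50, balloon=60): page cache shrinks instead
```

In the first run every page the driver allocates is obtained by deflating the balloon, so the balloon size and page cache end where they started; in the second the balloon only competes with page cache, which is the "deflate on OOM, not on reclaim" behaviour argued for above.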