On Sun 05-07-20 11:52:32, Qian Cai wrote:
> On Sun, Jul 05, 2020 at 08:58:54PM +0800, Feng Tang wrote:
> > On Sun, Jul 05, 2020 at 08:15:03AM -0400, Qian Cai wrote:
> > >
> > >
> > > > On Jul 5, 2020, at 12:45 AM, Feng Tang <feng.tang@xxxxxxxxx> wrote:
> > > >
> > > > I did reproduce the problem, and from the debugging, this should
> > > > be the same root cause as lore.kernel.org/lkml/20200526181459.GD991@xxxxxx/
> > > > that loosing the batch cause some accuracy problem, and the solution of
> > > > adding some sync is still needed, which is discussed in
> > >
> > > Well, before taking any of those patches now to fix the regression,
> > > we will need some performance data first. If it turned out the
> > > original performance gain is no longer relevant anymore due to this
> > > regression fix on top, it is best to drop this patchset and restore
> > > that VM_WARN_ONCE, so you can retry later once you found a better
> > > way to optimize.
> >
> > The fix of adding sync only happens when the memory policy is being
> > changed to OVERCOMMIT_NEVER, which is not a frequent operation in
> > normal cases.
> >
> > For the performance improvement data both in commit log and 0day report
> > https://lore.kernel.org/lkml/20200622132548.GS5535@shao2-debian/
> > it is for the will-it-scale's mmap testcase, which will not runtime
> > change memory overcommit policy, so the data should be still valid
> > with this fix.
>
> Well, I would expect people are perfectly reasonable to use
> OVERCOMMIT_NEVER for some workloads making it more frequent operations.

Would you have any examples? Because I find this highly unlikely.
OVERCOMMIT_NEVER only works when virtual memory is not largely
overcommitted wrt real memory demand. And that tends to be more of
an exception rather than a rule. "Modern" userspace (whatever that
means) tends to be really hungry with virtual memory which is only used
very sparsely.

I would argue that either somebody is running "OVERCOMMIT_NEVER"
friendly SW and this is a permanent setting, or this is not used at all.
At least this is my experience.

So I strongly suspect that the LTP test failure is not something we
should really lose sleep over. It would be nice to find a way to flush
existing batches, but I would rather see a real workload that would
suffer from this imprecision.

On the other hand, the perf boost from larger batches with the default
overcommit setting sounds like a nice improvement to have.
-- 
Michal Hocko
SUSE Labs
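
For readers following the thread, the imprecision and the "flush existing batches" idea can be illustrated with a toy model. This is only a sketch under stated assumptions: `BatchedCounter`, its method names, and the numbers used are hypothetical and are not the kernel's percpu_counter code or the patch under discussion. It shows that a cheap read of the shared total can lag the true value by up to roughly (number of CPUs) x (batch), and that folding the per-CPU deltas back in removes the drift.

```python
class BatchedCounter:
    """Toy model of a per-CPU batched counter (hypothetical, not kernel code)."""

    def __init__(self, ncpus, batch):
        self.batch = batch            # per-CPU threshold before folding
        self.global_count = 0         # shared total, cheap to read
        self.local = [0] * ncpus      # per-CPU deltas not yet folded in

    def add(self, cpu, amount):
        # Accumulate locally; touch the shared total only once the local
        # delta reaches the batch size in either direction.
        self.local[cpu] += amount
        if abs(self.local[cpu]) >= self.batch:
            self.global_count += self.local[cpu]
            self.local[cpu] = 0

    def read_fast(self):
        # Cheap read: ignores whatever is still pending in per-CPU deltas.
        return self.global_count

    def read_exact(self):
        # Precise read: walks every per-CPU delta.
        return self.global_count + sum(self.local)

    def sync(self):
        # Flush all pending per-CPU deltas into the shared total.
        self.global_count += sum(self.local)
        self.local = [0] * len(self.local)


counter = BatchedCounter(ncpus=4, batch=64)
for cpu in range(4):
    counter.add(cpu, 63)              # each CPU stays just under the batch

drift_before = counter.read_exact() - counter.read_fast()
counter.sync()                        # e.g. done once when the policy changes
drift_after = counter.read_exact() - counter.read_fast()
print(drift_before, drift_after)
```

In this model a larger batch means fewer updates to the shared total but a larger possible drift for the cheap read, which is the trade-off being weighed above; a one-off sync only costs something at the moment it is called.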