On Wed, May 27, 2020 at 09:33:32PM +0800, Feng Tang wrote:
> Hi Qian,
>
> On Wed, May 27, 2020 at 08:05:49AM -0400, Qian Cai wrote:
> > On Wed, May 27, 2020 at 06:46:06PM +0800, Feng Tang wrote:
> > > Hi Qian,
> > >
> > > On Tue, May 26, 2020 at 10:25:39PM -0400, Qian Cai wrote:
> > > > > > > > [1] https://lkml.org/lkml/2020/3/5/57
> > > > > > >
> > > > > > > Reverting this series fixed a warning under memory pressure.
> > > > > >
> > > > > > Andrew, Stephen, can you drop this series?
> > > > > >
> > > > > > >
> > > > > > > [ 3319.257898] LTP: starting oom01
> > > > > > > [ 3319.284417] ------------[ cut here ]------------
> > > > > > > [ 3319.284439] memory commitment underflow
> > > > >
> > > > > Thanks for the catch!
> > > > >
> > > > > Could you share the info about the platform, like the number of
> > > > > CPUs and the RAM size, and the mmap test size of your test program?
> > > > > It would be great if you could point me to the test program.
> > > >
> > > > I have reproduced this on both AMD and Intel. The test is just
> > > > allocating memory and swapping.
> > > >
> > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c
> > > > https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/tunable/overcommit_memory.c
> > > >
> > > > If none of the above triggers it for you, it might be better to run
> > > > the whole LTP mm test set, which has quite a few memory stressors.
> > > >
> > > > /opt/ltp/runltp -f mm
> > >
> > > Thanks for sharing. I tried to reproduce this on 2 server platforms
> > > but could not, and they are still under testing.
> > >
> > > Meanwhile, could you try the patch below, which is based on
> > > Andi's suggestion and has some debug info? The warning is a little
> > > strange, as the condition is
> > >
> > > (percpu_counter_read(&vm_committed_as) <
> > > -(s64)vm_committed_as_batch * num_online_cpus())
> > >
> > > while for your platform (48 CPU + 128 GB RAM), the
> > > '-(s64)vm_committed_as_batch * num_online_cpus()'
> > > is an s64 value: '-32G', which makes the condition hard to satisfy,
> > > and when it is satisfied, it could be triggered by some magic in the
> > > s32/s64 operations around the percpu-counter.
> >
> > Here is the information on AMD and powerpc below affected by this. It
> > could need a bit of patience to reproduce, but our usual daily CI would
> > trigger it eventually after a few tries.
> >
> > # git clone https://github.com/cailca/linux-mm.git
> > # cd linux-mm
> > # ./compile.sh
> > # systemctl reboot
> > # ./test.sh
>
> I just downloaded it, and it failed on my desktop machine in the
> 'yum' and 'grub2' setup. The difficulty for me in reproducing this is
> that the test platforms are behind the 0day framework, and I can hardly
> set up external test suites, though I have been trying all day today :)

I tried your debug patch and it did not even compile on linux-next
(where the issue happened), and I am running out of time today. It
probably needs a large system to reproduce, as it did not happen on one
of our small s390 systems here.
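
As a sanity check on the '-32G' figure quoted above, here is a minimal
sketch of the arithmetic. It assumes 4 KiB pages and assumes the per-CPU
batch is sized as (RAM pages / CPUs / 4), capped at INT32_MAX; neither
assumption is stated in this thread, and the helper names are
hypothetical, not kernel functions.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages (the thread does not state the page size). */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: per-CPU batch in pages.
 * Assumption: batch = RAM pages / CPUs / 4, capped at INT32_MAX so it
 * fits an s32 like vm_committed_as_batch.
 */
static int32_t sketch_batch_pages(uint64_t ram_bytes, int32_t ncpus)
{
	uint64_t ram_pages = ram_bytes / SKETCH_PAGE_SIZE;
	uint64_t batch = ram_pages / (uint64_t)ncpus / 4;

	return batch > INT32_MAX ? INT32_MAX : (int32_t)batch;
}

/*
 * Mirrors the quoted expression
 *   -(s64)vm_committed_as_batch * num_online_cpus()
 * The cast to 64 bits happens before the multiply, so the product
 * cannot wrap in 32-bit arithmetic. Result is in pages.
 */
static int64_t sketch_underflow_threshold(int32_t batch, int32_t ncpus)
{
	return -(int64_t)batch * ncpus;
}
```

Under those assumptions, 48 CPUs and 128 GiB of RAM give a batch of
174762 pages and a threshold of -8388576 pages, which is just under
32 GiB worth of 4 KiB pages. The approximate counter would have to drop
below that before the warning condition holds.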