On Sat, Aug 3, 2024 at 2:31 AM Ge Yang <yangge1116@xxxxxxx> wrote:
>
> On 2024/8/3 4:18, Chris Li wrote:
> > On Thu, Aug 1, 2024 at 6:56 PM Ge Yang <yangge1116@xxxxxxx> wrote:
> >>
> >>>> I can't reproduce this problem, using tmpfs to compile linux.
> >>>> Seems you limit the memory size used to compile linux, which leads to
> >>>> OOM. May I ask why the memory size is limited to 481280kB? Do I also
> >>>> need to limit the memory size to 481280kB to test?
> >>>
> >>> Yes, you need to limit the cgroup memory size to force the swap
> >>> action. I am using memory.max = 470M.
> >>>
> >>> I believe other values e.g. 800M can trigger it as well. The reason to
> >>> limit the memory is to cause the swap action.
> >>> The goal is to intentionally overwhelm the memory load and let the
> >>> swap system do its job. The 470M is chosen to cause a lot of swap
> >>> action but not too high to cause OOM kills in normal kernels.
> >>> In other words, high enough swap pressure but not too high to bust
> >>> into OOM kill. e.g. I verify that, with your patch reverted, the
> >>> mm-stable kernel can sustain this level of swap pressure (470M)
> >>> without OOM kill.
> >>>
> >>> I borrowed the 470M magic value from Hugh and verified it works with
> >>> my test system. Hugh has a similar swap test setup which is more
> >>> complicated than mine. It is the inspiration of my swap stress test
> >>> setup.
> >>>
> >>> FYI, I am using "make -j32" on a machine with 12 cores (24
> >>> hyperthreading). My typical swap usage is about 3-5G. I set my
> >>> swapfile size to about 20G.
> >>> I am using zram or ssd as the swap backend. Hope that helps you
> >>> reproduce the problem.
> >>>
> >> Hi Chris,
> >>
> >> I tried to construct the experiment according to your suggestions above.
> >
> > Hi Ge,
> >
> > Sorry to hear that you were not able to reproduce it.
> >
> >> High swap pressure can be triggered, but OOM can't be reproduced.
> >> The specific steps are as follows:
> >> root@ubuntu-server-2204:/home/yangge# cp workspace/linux/ /dev/shm/ -rf
> >
> > I use a slightly different way to setup the tmpfs:
> >
> > Here is section of my script:
> >
> >         if ! [ -d $tmpdir ]; then
> >                 sudo mkdir -p $tmpdir
> >                 sudo mount -t tmpfs -o size=100% nodev $tmpdir
> >         fi
> >
> >         sudo mkdir -p $cgroup
> >         sudo sh -c "echo $mem > $cgroup/memory.max" || echo setup memory.max error
> >         sudo sh -c "echo 1 > $cgroup/memory.oom.group" || echo setup oom.group error
> >
> > Per run:
> >
> >         # $workdir is under $tmpdir
> >         sudo rm -rf $workdir
> >         mkdir -p $workdir
> >         cd $workdir
> >         echo "Extracting linux tree"
> >         XZ_OPT='-T0 -9 --memory=75%' tar xJf $linux_src || die "xz extract failed"
> >
> >         sudo sh -c "echo $BASHPID > $cgroup/cgroup.procs"
> >         echo "Cleaning linux tree, setup defconfig"
> >         cd $workdir/linux
> >         make -j$NR_TASK clean
> >         make defconfig > /dev/null
> >         echo Kernel compile run $i
> >         /usr/bin/time -a -o $log make --silent -j$NR_TASK || die "make failed"
> >
>
> Thanks.
>
> >> root@ubuntu-server-2204:/home/yangge# sync
> >> root@ubuntu-server-2204:/home/yangge# echo 3 > /proc/sys/vm/drop_caches
> >> root@ubuntu-server-2204:/home/yangge# cd /sys/fs/cgroup/
> >> root@ubuntu-server-2204:/sys/fs/cgroup/# mkdir kernel-build
> >> root@ubuntu-server-2204:/sys/fs/cgroup/# cd kernel-build
> >> root@ubuntu-server-2204:/sys/fs/cgroup/kernel-build# echo 470M > memory.max
> >> root@ubuntu-server-2204:/sys/fs/cgroup/kernel-build# echo $$ > cgroup.procs
> >> root@ubuntu-server-2204:/sys/fs/cgroup/kernel-build# cd /dev/shm/linux/
> >> root@ubuntu-server-2204:/dev/shm/linux# make clean && make -j24
> >
> > I am using make -j 32.
> >
> > Your step should work.
> >
> > Did you enable MGLRU in your .config file? Mine did. I attached my
> > config file here.
> >
>
> The above test didn't enable MGLRU.
>
> When MGLRU is enabled, I can reproduce OOM very soon.
> The cause of triggering OOM is being analyzed.

I think this is one of the potential side effects -- Hugh mentioned
earlier about isolate_lru_folios():
https://lore.kernel.org/linux-mm/503f0df7-91e8-07c1-c4a6-124cad9e65e7@xxxxxxxxxx/

Try this:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cfa839284b92..778bf5b7ef97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4320,7 +4320,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}

 	/* ineligible */
-	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
+	if (!folio_test_lru(folio) || zone > sc->reclaim_idx || skip_cma(folio, sc)) {
 		gen = folio_inc_gen(lruvec, folio, false);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;

> >> Please help to see which step does not meet your requirements.
> >
> > How many cores does your server have? I assume your RAM should be
> > plenty on that server.
> >
>
> My server has 64 cores (128 hyperthreading) and 160G of RAM.
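
For anyone else trying to reproduce this, two details from the thread are
easy to get wrong: the 481280kB figure is the same limit as memory.max = 470M,
and the OOM only showed up once MGLRU was enabled. The sketch below is a
rough preflight check for both. The function names mem_limit_kb and
mglru_config_state are made up for illustration and are not part of
anyone's test script here; it also assumes the .config uses the usual
CONFIG_LRU_GEN / CONFIG_LRU_GEN_ENABLED symbols.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).

# Convert a limit given in MiB to kB, e.g. to compare memory.max = 470M
# with the 481280kB figure quoted above.
mem_limit_kb() {
        echo $(( $1 * 1024 ))
}

# Report whether a kernel .config builds MGLRU (CONFIG_LRU_GEN=y) and
# whether it is turned on by default (CONFIG_LRU_GEN_ENABLED=y).
mglru_config_state() {
        config_file=$1
        if ! grep -q '^CONFIG_LRU_GEN=y$' "$config_file"; then
                echo "not built"
        elif grep -q '^CONFIG_LRU_GEN_ENABLED=y$' "$config_file"; then
                echo "built, enabled by default"
        else
                echo "built, disabled by default"
        fi
}
```

Something like "mglru_config_state .config" in the build tree should tell
you which case you are in. On a running kernel built with CONFIG_LRU_GEN,
/sys/kernel/mm/lru_gen/enabled shows the current state, so a kernel
reported as "built, disabled by default" can still be switched over for the
test without rebuilding.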