On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> > Hi Yu, I tested your patch, on my system, the OOM still exists (96
> > core and 256G RAM), test memcg is limited to 512M and 32 thread ().
> >
> > And I found the OOM seems irrelevant to either your patch or Ge's
> > patch. (it may changed the OOM chance slight though)
> >
> > After the very quick OOM (it failed to untar the linux source code),
> > checking lru_gen_full:
> > memcg    47 /build-kernel-tmpfs
> >  node     0
> >         442       1691      29405          0
> >             0      0r      0e      0p     57r    617e      0p
> >             1      0r      0e      0p      0r      4e      0p
> >             2      0r      0e      0p      0r      0e      0p
> >             3      0r      0e      0p      0r      0e      0p
> >                     0       0       0       0       0       0
> >         443       1683      57748        832
> >             0       0       0       0       0       0       0
> >             1       0       0       0       0       0       0
> >             2       0       0       0       0       0       0
> >             3       0       0       0       0       0       0
> >                     0       0       0       0       0       0
> >         444       1670      30207        133
> >             0       0       0       0       0       0       0
> >             1       0       0       0       0       0       0
> >             2       0       0       0       0       0       0
> >             3       0       0       0       0       0       0
> >                     0       0       0       0       0       0
> >         445       1662          0          0
> >             0      0R     34T       0     57R    238T       0
> >             1      0R      0T       0      0R      0T       0
> >             2      0R      0T       0      0R      0T       0
> >             3      0R      0T       0      0R     81T       0
> >                13807L    324O    867Y   2538N     63F     18A
> >
> > If I repeat the test many times, it may succeed by chance, but the
> > untar process is very slow and generates about 7000 generations.
> >
> > But if I change the untar cmdline to:
> > python -c "import sys; sys.stdout.buffer.write(open('$linux_src', mode='rb').read())" | tar zx
> >
> > Then the problem is gone, it can untar the file successfully and very fast.
> >
> > This might be a different issue reported by Chris, I'm not sure.
>
> After more testing, I think these are two problems (note I changed the
> memcg limit to 600m later so the compile test can run smoothly).
>
> 1. OOM during the untar progress (can be workarounded by the untar
> cmdline I mentioned above).

There are two different issues here. My recent test script has moved
the untar phase out of the memcg limit (mostly because I want to
multithread the untar), so the bisect I did only catches the second one.
The untar issue might not be a regression from this patch.

> 2. OOM during the compile progress (this should be the one Chris encountered).
>
> Both 1 and 2 only exist for MGLRU.
> 1 can be workarounded using the cmdline I mentioned above.
> 2 is caused by Ge's patch, and 1 is not.
>
> I can confirm Yu's patch fixed 2 on my system, but the 1 seems still a
> problem, it's not related to this patch, maybe can be discussed
> elsewhere.

I will do a test run now with Yu's patch and report back.

Chris