Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Chris et al., was that fix from Yu ever submitted? From here it looks
like fixing this regression fell through the cracks; but at the same
time I have this strange feeling that I'm missing something obvious
here and will look stupid by writing this mail... If that's the case:
sorry for the noise.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

On 04.08.24 21:11, Chris Li wrote:
> On Sun, Aug 4, 2024 at 10:51 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>> On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
>>>> Hi Yu, I tested your patch, on my system, the OOM still exists (96
>>>> core and 256G RAM), test memcg is limited to 512M and 32 thread ().
>>>>
>>>> And I found the OOM seems irrelevant to either your patch or Ge's
>>>> patch. (it may changed the OOM chance slight though)
>>>>
>>>> After the very quick OOM (it failed to untar the linux source code),
>>>> checking lru_gen_full:
>>>> memcg    47 /build-kernel-tmpfs
>>>>  node     0
>>>>     442   1691  29405      0
>>>>           0     0r     0e     0p    57r   617e     0p
>>>>           1     0r     0e     0p     0r     4e     0p
>>>>           2     0r     0e     0p     0r     0e     0p
>>>>           3     0r     0e     0p     0r     0e     0p
>>>>                  0      0      0      0      0      0
>>>>     443   1683  57748    832
>>>>           0      0      0      0      0      0      0
>>>>           1      0      0      0      0      0      0
>>>>           2      0      0      0      0      0      0
>>>>           3      0      0      0      0      0      0
>>>>                  0      0      0      0      0      0
>>>>     444   1670  30207    133
>>>>           0      0      0      0      0      0      0
>>>>           1      0      0      0      0      0      0
>>>>           2      0      0      0      0      0      0
>>>>           3      0      0      0      0      0      0
>>>>                  0      0      0      0      0      0
>>>>     445   1662      0      0
>>>>           0     0R    34T      0    57R   238T      0
>>>>           1     0R     0T      0     0R     0T      0
>>>>           2     0R     0T      0     0R     0T      0
>>>>           3     0R     0T      0     0R    81T      0
>>>>             13807L   324O   867Y  2538N    63F    18A
>>>>
>>>> If I repeat the test many times, it may succeed by chance, but the
>>>> untar process is very slow and generates about 7000 generations.
>>>>
>>>> But if I change the untar cmdline to:
>>>> python -c "import sys; sys.stdout.buffer.write(open('$linux_src',
>>>> mode='rb').read())" | tar zx
>>>>
>>>> Then the problem is gone, it can untar the file successfully and very fast.
>>>>
>>>> This might be a different issue reported by Chris, I'm not sure.
>>>
>>> After more testing, I think these are two problems (note I changed the
>>> memcg limit to 600m later so the compile test can run smoothly).
>>>
>>> 1. OOM during the untar progress (can be workarounded by the untar
>>> cmdline I mentioned above).
>>
>> There are two different issues here.
>> My recent test script has moved the untar phase out of memcg limit
>> (mostly I want to multithreading untar) so the bisect I did is only
>> catch the second one.
>> The untar issue might not be a regression from this patch.
>>
>>> 2. OOM during the compile progress (this should be the one Chris encountered).
>>>
>>> Both 1 and 2 only exist for MGLRU.
>>> 1 can be workarounded using the cmdline I mentioned above.
>>> 2 is caused by Ge's patch, and 1 is not.
>>>
>>> I can confirm Yu's patch fixed 2 on my system, but the 1 seems still a
>>> problem, it's not related to this patch, maybe can be discussed
>>> elsewhere.
>>
>> I will do a test run now with Yu's patch and report back.
>
> Confirm Yu's patch fixes the regression for me. Now it can sustain
> 470M pressure without causing OOM kill.
>
> Yu, please submit your patch. This regression has merged into Linus'
> tree already.
>
> Feel free to add:
>
> Tested-by: Chris Li <chrisl@xxxxxxxxxx>
>
> Chris
>
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke