Re: [PATCH V2] mm/gup: Clear the LRU flag of a page before adding to LRU batch

"Linux regression tracking (Thorsten Leemhuis)" <regressions@xxxxxxxxxxxxx> · Mon, 2 Sep 2024 14:53:55 +0200

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Chris et al., was that fix from Yu ever submitted? From here it looks
like fixing this regression fell through the cracks; but at the same
time I have this strange feeling that I'm missing something obvious here
and will look stupid by writing this mail... If that's the case: sorry
for the noise.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

On 04.08.24 21:11, Chris Li wrote:
> On Sun, Aug 4, 2024 at 10:51 AM Chris Li <chrisl@xxxxxxxxxx> wrote:
>> On Sun, Aug 4, 2024 at 5:22 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
>>>> Hi Yu, I tested your patch. On my system (96 cores and 256G RAM),
>>>> the OOM still exists; the test memcg is limited to 512M and 32
>>>> threads ().
>>>>
>>>> And I found the OOM seems unrelated to either your patch or Ge's
>>>> patch (it may have changed the OOM chance slightly, though).
>>>>
>>>> After the very quick OOM (it failed to untar the linux source code),
>>>> checking lru_gen_full:
>>>> memcg    47 /build-kernel-tmpfs
>>>>  node     0
>>>>         442       1691      29405           0
>>>>                      0          0r          0e          0p         57r        617e          0p
>>>>                      1          0r          0e          0p          0r          4e          0p
>>>>                      2          0r          0e          0p          0r          0e          0p
>>>>                      3          0r          0e          0p          0r          0e          0p
>>>>                                 0           0           0           0           0           0
>>>>         443       1683      57748         832
>>>>                      0          0           0           0           0           0           0
>>>>                      1          0           0           0           0           0           0
>>>>                      2          0           0           0           0           0           0
>>>>                      3          0           0           0           0           0           0
>>>>                                 0           0           0           0           0           0
>>>>         444       1670      30207         133
>>>>                      0          0           0           0           0           0           0
>>>>                      1          0           0           0           0           0           0
>>>>                      2          0           0           0           0           0           0
>>>>                      3          0           0           0           0           0           0
>>>>                                 0           0           0           0           0           0
>>>>         445       1662          0           0
>>>>                      0          0R         34T          0          57R        238T          0
>>>>                      1          0R          0T          0           0R          0T          0
>>>>                      2          0R          0T          0           0R          0T          0
>>>>                      3          0R          0T          0           0R         81T          0
>>>>                             13807L        324O        867Y       2538N         63F         18A
>>>>
>>>> If I repeat the test many times, it may succeed by chance, but the
>>>> untar process is very slow and generates about 7000 generations.
>>>>
>>>> But if I change the untar cmdline to:
>>>> python -c "import sys; sys.stdout.buffer.write(open('$linux_src',
>>>> mode='rb').read())" | tar zx
>>>>
>>>> Then the problem is gone; it can untar the file successfully and very fast.
>>>>
>>>> This might be a different issue from the one reported by Chris; I'm not sure.
>>>
>>> After more testing, I think these are two problems (note I changed the
>>> memcg limit to 600m later so the compile test can run smoothly).
>>>
>>> 1. OOM during the untar process (can be worked around with the untar
>>> cmdline I mentioned above).
>>
>> There are two different issues here.
>> My recent test script has moved the untar phase out of the memcg limit
>> (mostly because I want multithreaded untar), so the bisect I did only
>> catches the second one.
>> The untar issue might not be a regression from this patch.
>>
>>> 2. OOM during the compile process (this should be the one Chris encountered).
>>>
>>> Both 1 and 2 only exist for MGLRU.
>>> 1 can be worked around using the cmdline I mentioned above.
>>> 2 is caused by Ge's patch, and 1 is not.
>>>
>>> I can confirm Yu's patch fixed 2 on my system, but 1 still seems to be
>>> a problem. It's not related to this patch, so maybe it can be discussed
>>> elsewhere.
>>
>> I will do a test run now with Yu's patch and report back.
> 
> Confirmed: Yu's patch fixes the regression for me. Now it can sustain
> 470M pressure without causing OOM kill.
> 
> Yu, please submit your patch.  This regression has already been merged
> into Linus' tree.
> 
> Feel free to add:
> 
> Tested-by: Chris Li <chrisl@xxxxxxxxxx>
> 
> Chris
> 

--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke