On 1/11/24 12:46 AM, Michal Hocko wrote:
> On Wed 10-01-24 11:02:03, Jianfeng Wang wrote:
>> On 1/10/24 12:46 AM, Michal Hocko wrote:
>>> On Tue 09-01-24 01:15:11, Jianfeng Wang wrote:
>>>> The oom_reaper tries to reclaim additional memory owned by the oom
>>>> victim. In __oom_reap_task_mm(), it uses mmu_gather for batched page
>>>> freeing. After oom_reaper was added, the mmu_gather feature introduced
>>>> CONFIG_MMU_GATHER_NO_GATHER (in commit 952a31c9e6fa ("asm-generic/tlb:
>>>> Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y")), an option to skip
>>>> batched page freeing. If that option is set, tlb_batch_pages_flush(),
>>>> which is responsible for calling lru_add_drain(), is skipped during
>>>> tlb_finish_mmu(). Without it, pages could still be held by per-cpu
>>>> fbatches rather than being freed.
>>>>
>>>> This fix adds lru_add_drain() prior to mmu_gather. This makes the code
>>>> consistent with other cases where mmu_gather is used for freeing pages.
>>>
>>> Does this fix any actual problem or is this a pure code consistency
>>> thing? I am asking because it doesn't make much sense to me TBH. LRU
>>> cache draining is usually important when we want to ensure that cached
>>> pages are put onto the LRU to be dealt with, because otherwise the MM
>>> code wouldn't be able to deal with them. The OOM reaper doesn't
>>> necessarily run on the same CPU as the oom victim, so draining on a
>>> local CPU doesn't necessarily do anything for the victim's pages.
>>>
>>> While this patch is not harmful, I really do not see much point in
>>> adding the local draining here. Could you clarify please?
>>>
>> It targets the case described in the patch's commit message: the oom
>> killer thinks that it 'reclaims' pages while those pages are still held
>> by per-cpu fbatches with a ref count.
>>
>> I admit that the pages may sit on a different core (or cores). Given
>> that doing remote calls to all CPUs with lru_add_drain_all() is
>> expensive, this line of code can be helpful if it happens to give back
>> a few pages to the system right away without that overhead, especially
>> when oom is involved. Plus, it also makes the code consistent with
>> other places that use the mmu_gather feature to free pages in batch.
>
> I would argue that consistency is the biggest problem of this patch. It
> tries to follow a pattern that is just not really correct. First, it
> operates on a random CPU from the oom victim's perspective, and second,
> it doesn't really block any unmapping operation, and that is the main
> purpose of the reaper. Sure, it frees a lot of unmapped memory, but if
> there are a couple of pages that cannot be freed immediately because
> they are sitting on per-cpu LRU caches, then that is not a deal breaker.
> As you have noted, those pages might be sitting on any per-cpu cache.
>
> So I do not really see that as a good justification. People will follow
> that pattern even more and spread lru_add_drain to other random places.
>
> Unless you can show an actual runtime effect of this patch, I think it
> shouldn't be merged.
>

Thanks for raising your concern.

I'd call it a trade-off rather than "not really correct". Look at
unmap_region() / free_pages_and_swap_cache() written by Linus: both drain
the local CPU's LRU caches before freeing pages in batch, which indicates
that the same trade-off (i.e. draining the local CPU vs. draining all
CPUs vs. no draining at all) has been made in the same way in the past.

I don't have a specific runtime effect to provide, except that the patch
will free tens of kB of pages immediately during OOM.
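
For reference, the placement under discussion in __oom_reap_task_mm()
would look roughly like the excerpt below. This is a hand-written sketch
against my reading of mm/oom_kill.c, not the exact posted hunk; only the
lru_add_drain() line is new:

	/* excerpt from the per-VMA loop in __oom_reap_task_mm() (sketch) */
	struct mmu_gather tlb;

	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, mm,
				vma->vm_start, vma->vm_end);
	/*
	 * Proposed: drain this CPU's LRU fbatches so that the batched
	 * free below can release any pages still held there with an
	 * extra reference.
	 */
	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
		tlb_finish_mmu(&tlb);
		ret = false;
		continue;
	}
	unmap_page_range(&tlb, vma, range.start, range.end, NULL);
	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb);

The ordering (lru_add_drain() right before tlb_gather_mmu()) is the same
one unmap_region() uses today.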