On 1/10/24 12:46 AM, Michal Hocko wrote:
> On Tue 09-01-24 01:15:11, Jianfeng Wang wrote:
>> The oom_reaper tries to reclaim additional memory owned by the oom
>> victim. In __oom_reap_task_mm(), it uses mmu_gather for batched page
>> free. After oom_reaper was added, the mmu_gather feature introduced
>> CONFIG_MMU_GATHER_NO_GATHER (in commit 952a31c9e6fa ("asm-generic/tlb:
>> Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y")), an option to skip
>> batched page free. If set, tlb_batch_pages_flush(), which is
>> responsible for calling lru_add_drain(), is skipped during
>> tlb_finish_mmu(). Without it, pages could still be held by per-cpu
>> fbatches rather than be freed.
>>
>> This fix adds lru_add_drain() prior to mmu_gather. This makes the code
>> consistent with other cases where mmu_gather is used for freeing pages.
>
> Does this fix any actual problem or is this pure code consistency thing?
> I am asking because it doesn't make much sense to me TBH. LRU cache
> draining is usually important when we want to ensure that cached pages
> are put to LRU to be dealt with because otherwise the MM code wouldn't
> be able to deal with them. OOM reaper doesn't necessarily run on the
> same CPU as the oom victim so draining on a local CPU doesn't
> necessarily do anything for the victim's pages.
>
> While this patch is not harmful I really do not see much point in adding
> the local draining here. Could you clarify please?

It targets the case described in the patch's commit message: the oom
killer thinks that it "reclaims" pages while those pages are still held
by per-cpu fbatches with an extra reference count. I admit that the
victim's pages may sit on a different CPU's fbatches. Given that making
remote calls to all CPUs via lru_add_drain_all() is expensive, this line
of code can be helpful if it happens to give a few pages back to the
system right away without that overhead, which matters especially when
the system is under OOM. Plus, it also makes the code consistent with
the other places that use the mmu_gather feature to free pages in batch.
--JW

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@xxxxxxxxxx>
>> ---
>>  mm/oom_kill.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>> index 9e6071fde34a..e2fcf4f062ea 100644
>> --- a/mm/oom_kill.c
>> +++ b/mm/oom_kill.c
>> @@ -537,6 +537,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
>>  		struct mmu_notifier_range range;
>>  		struct mmu_gather tlb;
>>
>> +		lru_add_drain();
>>  		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
>>  					mm, vma->vm_start,
>>  					vma->vm_end);
>> --
>> 2.42.1
>>