On 1/11/24 12:46 AM, Michal Hocko wrote:
> On Wed 10-01-24 11:02:03, Jianfeng Wang wrote:
>> On 1/10/24 12:46 AM, Michal Hocko wrote:
>>> On Tue 09-01-24 01:15:11, Jianfeng Wang wrote:
>>>> The oom_reaper tries to reclaim additional memory owned by the oom
>>>> victim. In __oom_reap_task_mm(), it uses mmu_gather for batched page
>>>> freeing. After oom_reaper was added, the mmu_gather feature introduced
>>>> CONFIG_MMU_GATHER_NO_GATHER (in commit 952a31c9e6fa ("asm-generic/tlb:
>>>> Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y")), an option to skip
>>>> batched page freeing. If that option is set, tlb_batch_pages_flush(),
>>>> which is responsible for calling lru_add_drain(), is skipped during
>>>> tlb_finish_mmu(). Without it, pages could still be held by per-cpu
>>>> fbatches rather than being freed.
>>>>
>>>> This fix adds lru_add_drain() prior to mmu_gather. This makes the code
>>>> consistent with other cases where mmu_gather is used for freeing pages.
>>>
>>> Does this fix any actual problem or is this a pure code consistency
>>> thing? I am asking because it doesn't make much sense to me TBH. LRU
>>> cache draining is usually important when we want to ensure that cached
>>> pages are put onto the LRU to be dealt with, because otherwise the MM
>>> code wouldn't be able to deal with them. The OOM reaper doesn't
>>> necessarily run on the same CPU as the oom victim, so draining on a
>>> local CPU doesn't necessarily do anything for the victim's pages.
>>>
>>> While this patch is not harmful, I really do not see much point in
>>> adding the local draining here. Could you clarify please?
>>>
>> It targets the case described in the patch's commit message: the oom
>> killer thinks that it 'reclaims' pages while those pages are still held
>> by per-cpu fbatches with a ref count.
>>
>> I admit that the pages may sit on a different core (or cores). Given
>> that doing remote calls to all CPUs with lru_add_drain_all() is
>> expensive, this line of code can be helpful if it happens to give back
>> a few pages to the system right away without that overhead, especially
>> when oom is involved. Plus, it also makes the code consistent with
>> other places that use the mmu_gather feature to free pages in batch.
>
> I would argue that consistency is the biggest problem of this patch. It
> tries to follow a pattern that is just not really correct. First, it
> operates on a random CPU from the oom victim's perspective, and second,
> it doesn't really block any unmapping operation, and that is the main
> purpose of the reaper. Sure, it frees a lot of unmapped memory, but if
> there are a couple of pages that cannot be freed immediately because
> they are sitting on per-cpu LRU caches, then that is not a deal breaker.
> As you have noted, those pages might be sitting on any per-cpu cache.
>
> So I do not really see that as a good justification. People will follow
> that pattern even more and spread lru_add_drain to other random places.
>
> Unless you can show an actual runtime effect of this patch, I think it
> shouldn't be merged.
>

Thanks for raising your concern.

I'd call it a trade-off rather than "not really correct". Look at
unmap_region() / free_pages_and_swap_cache() written by Linus: both drain
the local CPU's LRU caches before freeing pages in batch, which indicates
that the same trade-off (i.e. draining the local CPU vs. draining all
CPUs vs. no draining at all) has been made in the same way in the past.

I don't have a specific runtime effect to provide, except that the patch
will free tens of kB of pages immediately during OOM.
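
For reference, the placement under discussion in __oom_reap_task_mm()
would look roughly like the excerpt below. This is a hand-written sketch
against my reading of mm/oom_kill.c, not the exact posted hunk; only the
lru_add_drain() line is new:

	/* excerpt from the per-VMA loop in __oom_reap_task_mm() (sketch) */
	struct mmu_gather tlb;

	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, mm,
				vma->vm_start, vma->vm_end);
	/*
	 * Proposed: drain this CPU's LRU fbatches so that the batched
	 * free below can release any pages still held there with an
	 * extra reference.
	 */
	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
		tlb_finish_mmu(&tlb);
		ret = false;
		continue;
	}
	unmap_page_range(&tlb, vma, range.start, range.end, NULL);
	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb);

The ordering (lru_add_drain() right before tlb_gather_mmu()) is the same
one unmap_region() uses today.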