On 12/14/23 9:57 AM, Tim Chen wrote:
> On Wed, 2023-12-13 at 17:03 -0800, Jianfeng Wang wrote:
>> On 12/13/23 2:57 PM, Tim Chen wrote:
>>> On Tue, 2023-12-12 at 23:28 -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>>
>>>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@xxxxxxxxxx>
>>>> ---
>>>>  mm/mmap.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>> index 1971bfffcc03..0451285dee4f 100644
>>>> --- a/mm/mmap.c
>>>> +++ b/mm/mmap.c
>>>> @@ -2330,7 +2330,9 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>>>  	struct mmu_gather tlb;
>>>>  	unsigned long mt_start = mas->index;
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> In your comment you say skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>>> is *not* set. So shouldn't this be
>>>
>>> #ifndef CONFIG_MMU_GATHER_NO_GATHER ?
>>>
>> Hi Tim,
>>
>> The mmu_gather feature is used to gather pages produced by unmap_vmas() and
>> release them in batch in tlb_finish_mmu(). The feature is *on* if
>> CONFIG_MMU_GATHER_NO_GATHER is *not* set. Note that tlb_finish_mmu() will call
>> free_pages_and_swap_cache()/lru_add_drain() only when the feature is on.
>
> Thanks for the explanation.
>
> Looking at the code, lru_add_drain() is executed for #ifndef CONFIG_MMU_GATHER_NO_GATHER
> in tlb_finish_mmu(). So the logic of your patch is fine.
>
> The #ifndef CONFIG_MMU_GATHER_NO_GATHER means the mmu_gather feature is on.
> The double negative threw me off on my first read of your commit log.
>
> Suggest that you add a comment in the code to make it easier for
> future code maintenance:
>
> /* defer lru_add_drain() to tlb_finish_mmu() for ifndef CONFIG_MMU_GATHER_NO_GATHER */
>
> Is your change of skipping the extra lru_add_drain() motivated by some performance
> reason in a workload? Wonder whether it is worth adding an extra ifdef in the code.
>
> Tim
>

Okay, great suggestion.

We observe heavy contention on the LRU lock, introduced by lru_add_drain() and
release_pages(), in a production workload, and we're trying to reduce the level
of contention. lru_add_drain() is a complex function: it first takes a local
CPU lock and then iterates through *all* folio_batches to see if there are
pages to be moved to and between the LRU lists. Any page found in these
folio_batches triggers acquiring the per-LRU spinlock, which adds to the
contention. Applying the change avoids calling lru_add_drain() unnecessarily,
removing one source of that contention (a condensed version of the call chain
we defer to is at the bottom of this mail). Together with the comment you
suggested, I believe this also improves readability by clarifying the
mmu_gather feature.
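
I'll fold your comment into v2. A rough sketch of how the unmap_region() hunk
would then read (untested; the comment wording is mine, tweaked from yours):

	/*
	 * With mmu_gather on (ifndef CONFIG_MMU_GATHER_NO_GATHER), the
	 * drain is deferred to tlb_finish_mmu(), which drains the per-CPU
	 * folio batches via free_pages_and_swap_cache().
	 */
#ifdef CONFIG_MMU_GATHER_NO_GATHER
	lru_add_drain();
#endif
	tlb_gather_mmu(&tlb, mm);

The exit_mmap() site would get the same treatment.
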
- Jianfeng

>>
>> Yes, this commit aims to skip lru_add_drain() when CONFIG_MMU_GATHER_NO_GATHER
>> is *not* set (i.e. when the mmu_gather feature is on) because it is redundant.
>>
>> If CONFIG_MMU_GATHER_NO_GATHER is set, pages will be released in unmap_vmas().
>> tlb_finish_mmu() will not call lru_add_drain(). So, it is still necessary to
>> keep the lru_add_drain() call to clear cached pages before unmap_vmas(), as
>> folio_batches hold a reference count on the pages in them.
>>
>> The same applies to the other case.
>>
>> Thanks,
>> - Jianfeng
>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	tlb_gather_mmu(&tlb, mm);
>>>>  	update_hiwater_rss(mm);
>>>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>>>> @@ -3300,7 +3302,9 @@ void exit_mmap(struct mm_struct *mm)
>>>>  		return;
>>>>  	}
>>>>
>>>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>>
>>> same question as above.
>>>
>>>>  	lru_add_drain();
>>>> +#endif
>>>>  	flush_cache_mm(mm);
>>>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>>>
>
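
P.S. Since the double negative keeps tripping people up, here is the
drain-on-free path the patch relies on, condensed from mm/mmu_gather.c and
mm/swap_state.c around v6.7 (table-free handling and the TLB flush itself are
elided for brevity):

static void tlb_flush_mmu_free(struct mmu_gather *tlb)
{
	/* ... CONFIG_MMU_GATHER_TABLE_FREE handling elided ... */
#ifndef CONFIG_MMU_GATHER_NO_GATHER
	/* mmu_gather on: free the gathered pages in batch */
	tlb_batch_pages_flush(tlb);	/* calls free_pages_and_swap_cache() */
#endif
}

void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
{
	lru_add_drain();		/* the drain the patch defers to */
	for (int i = 0; i < nr; i++)
		free_swap_cache(encoded_page_ptr(pages[i]));
	release_pages(pages, nr);
}

tlb_finish_mmu() reaches tlb_flush_mmu_free() via tlb_flush_mmu(), so with
mmu_gather on, every batch free is already preceded by a drain and the extra
call in the unmap paths buys nothing.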