Hi Kirill,

On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>> free_transhuge_page() acquires split queue lock then check
>>>> whether the THP was added to deferred list or not.
>>>>
>>>> It's safe to check whether the THP is in deferred list or not.
>>>> When code hit free_transhuge_page(), there is no one tries
>>>> to update the folio's _deferred_list.
>>>>
>>>> If folio is not in deferred_list, it's safe to check without
>>>> acquiring lock.
>>>>
>>>> If folio is in deferred_list, the other node in deferred_list
>>>> adding/deleteing doesn't impact the return value of
>>>> list_epmty(@folio->_deferred_list).
>>>
>>> Typo.
>>>
>>>>
>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>> see the 61% split_queue_lock contention:
>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]  [k] release_pages
>>>>    - 70.93% release_pages
>>>>       - 61.42% free_transhuge_page
>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>
>>>> With this patch applied, the split_queue_lock contention is less
>>>> than 1%.
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
>>>> Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>>>> ---
>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>  	unsigned long flags;
>>>>
>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>> +	/*
>>>> +	 * At this point, there is no one trying to queue the folio
>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>> +	 * being updated.
>>>> +	 *
>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>> +	 * acquiring the lock.
>>>> +	 *
>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>> +	 * acquiring the lock.
>>>> +	 */
>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>
>>> Recheck under lock?
>> In function deferred_split_scan(), there is following code block:
>>         if (folio_try_get(folio)) {
>>                 list_move(&folio->_deferred_list, &list);
>>         } else {
>>                 /* We lost race with folio_put() */
>>                 list_del_init(&folio->_deferred_list);
>>                 ds_queue->split_queue_len--;
>>         }
>>
>> I am wondering what kind of "lost race with folio_put()" can be.
>>
>> My understanding is that it's not necessary to handle this case here
>> because free_transhuge_page() will handle it once folio get zero ref.
>> But I must miss something here. Thanks.
>
> free_transhuge_page() got when refcount is already zero. Both
> deferred_split_scan() and free_transhuge_page() can see the page with zero
> refcount. The check makes deferred_split_scan() to leave the page to the
> free_transhuge_page().
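Thanks for the explanation.

(Just to double-check that I read your earlier "Recheck under lock?" comment
correctly: I understand it as something like the untested sketch below, i.e.
check list_empty() again after taking split_queue_lock, before touching the
list. Please correct me if that is not what you meant.)

	if (data_race(!list_empty(&folio->_deferred_list))) {
		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		/*
		 * Recheck under the lock: deferred_split_scan() may have
		 * taken the folio off the deferred list after the lockless
		 * check above.
		 */
		if (!list_empty(&folio->_deferred_list)) {
			ds_queue->split_queue_len--;
			list_del(&folio->_deferred_list);
		}
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}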
If deferred_split_scan() leaves the page to free_transhuge_page(), is it
necessary to do

        list_del_init(&folio->_deferred_list);
        ds_queue->split_queue_len--;

Can these two lines be left to free_transhuge_page() as well? (A rough
sketch of what I mean is in the P.S. below.) Thanks.


Regards
Yin, Fengwei
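P.S. To make the question concrete, what I have in mind is roughly the
untested change below to the loop body of deferred_split_scan() (for
illustration only; I may well be missing a reason why the cleanup has to
stay there):

	if (folio_try_get(folio))
		list_move(&folio->_deferred_list, &list);
	/*
	 * No "else" branch: if we lost the race with folio_put(), leave
	 * the folio on the deferred list and let free_transhuge_page()
	 * remove it and drop split_queue_len under split_queue_lock, as
	 * it already does for a folio that is still queued.
	 */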