On 4/26/23 09:13, Huang, Ying wrote:
> Yin Fengwei <fengwei.yin@xxxxxxxxx> writes:
>
>> free_transhuge_page() acquires the split queue lock and then checks
>> whether the THP was added to the deferred list or not.
>>
>> It's safe to check whether the THP is in the deferred list without
>> the lock. When code hits free_transhuge_page(), no one else is
>> trying to update the folio's _deferred_list.
>
> I think that it's clearer to enumerate all places pages are added to
> and removed from the deferred list. Then we can find out whether
> there's a code path that may race with this.
>
> Take a glance at the search result of `grep split_queue_lock -r mm`.
> It seems that deferred_split_scan() may race with
> free_transhuge_page(), so we need to recheck with the lock held as
> Kirill pointed out.
My understanding is that the recheck after taking the lock is not
necessary. See my reply to Kirill. Thanks.

Regards
Yin, Fengwei

>
> Best Regards,
> Huang, Ying
>
>> If the folio is not in the deferred_list, it's safe to check without
>> acquiring the lock.
>>
>> If the folio is in the deferred_list, other nodes being added to or
>> deleted from the deferred_list don't affect the return value of
>> list_empty(&folio->_deferred_list).
>>
>> Running page_fault1 of will-it-scale + order-2 folios for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]  [k] release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
>> Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, no one is trying to queue the folio to
>> +	 * the deferred_list, so folio->_deferred_list cannot be
>> +	 * updated concurrently.
>> +	 *
>> +	 * If the folio is already on the deferred_list, adding or
>> +	 * deleting other entries will not change the result of
>> +	 * list_empty(&folio->_deferred_list), so it is safe to
>> +	 * check it without acquiring the lock.
>> +	 *
>> +	 * If the folio is not on the deferred_list, it is likewise
>> +	 * safe to check without acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
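The structural argument above (that adding or deleting *other* entries cannot change `list_empty()` of an entry already on the list) can be demonstrated with a minimal user-space model of a circular doubly-linked list, in the style of the kernel's `struct list_head`. This is a hypothetical sketch for illustration, not the kernel implementation, and it deliberately ignores the memory-ordering concerns that `data_race()` annotates:

```c
#include <assert.h>	/* for the usage example below */
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly-linked list; an unqueued entry points at itself. */
struct list_head {
	struct list_head *prev, *next;
};

static void list_init(struct list_head *h)
{
	h->prev = h->next = h;
}

static bool list_empty(const struct list_head *h)
{
	/* Only the entry's own next pointer is read. */
	return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}
```

The key property: `list_empty(&a)` only reads `a->next`. Removing a neighbour rewrites `a->prev` or redirects `a->next` to another node, but never makes `a->next` point back at `a` itself; only `list_del_init(&a)` on the entry itself does that. So for a folio that is (or is not) already queued, the unlocked check cannot be flipped by concurrent queue traffic on other folios.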