On 25/04/2023 09:46, Yin Fengwei wrote: > free_transhuge_page() acquires split queue lock then check > whether the THP was added to deferred list or not. > > It's safe to check whether the THP is in deferred list or not. > When code hit free_transhuge_page(), there is no one tries > to update the folio's _deferred_list. > > If folio is not in deferred_list, it's safe to check without > acquiring lock. > > If folio is in deferred_list, the other node in deferred_list > adding/deleteing doesn't impact the return value of > list_epmty(@folio->_deferred_list). > > Running page_fault1 of will-it-scale + order 2 folio for anonymous > mapping with 96 processes on an Ice Lake 48C/96T test box, we could > see the 61% split_queue_lock contention: > - 71.28% 0.35% page_fault1_pro [kernel.kallsyms] [k] > release_pages > - 70.93% release_pages > - 61.42% free_transhuge_page > + 60.77% _raw_spin_lock_irqsave > > With this patch applied, the split_queue_lock contention is less > than 1%. > > Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx> > Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx> > --- > mm/huge_memory.c | 19 ++++++++++++++++--- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 032fb0ef9cd1..c620f1f12247 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page) > struct deferred_split *ds_queue = get_deferred_split_queue(folio); > unsigned long flags; > > - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > - if (!list_empty(&folio->_deferred_list)) { > + /* > + * At this point, there is no one trying to queue the folio > + * to deferred_list. folio->_deferred_list is not possible > + * being updated. > + * > + * If folio is already added to deferred_list, add/delete to/from > + * deferred_list will not impact list_empty(&folio->_deferred_list). > + * It's safe to check list_empty(&folio->_deferred_list) without > + * acquiring the lock. > + * > + * If folio is not in deferred_list, it's safe to check without > + * acquiring the lock. > + */ > + if (data_race(!list_empty(&folio->_deferred_list))) { > + spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > ds_queue->split_queue_len--; > list_del(&folio->_deferred_list); I wonder if there is a race here? Could the folio have been in the deferred list when checking, but then something removed it from the list before the lock is taken? In this case, I guess split_queue_len would be out of sync with the number of folios in the queue? Perhaps recheck list_empty() after taking the lock? Thanks, Ryan > + spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > } > - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); > free_compound_page(page); > } >