On 26/04/2023 03:08, Yin Fengwei wrote: > > > On 4/25/23 20:38, Kirill A. Shutemov wrote: >> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote: >>> free_transhuge_page() acquires split queue lock then check >>> whether the THP was added to deferred list or not. >>> >>> It's safe to check whether the THP is in deferred list or not. >>> When code hit free_transhuge_page(), there is no one tries >>> to update the folio's _deferred_list. >>> >>> If folio is not in deferred_list, it's safe to check without >>> acquiring lock. >>> >>> If folio is in deferred_list, the other node in deferred_list >>> adding/deleteing doesn't impact the return value of >>> list_epmty(@folio->_deferred_list). >> >> Typo. >> >>> >>> Running page_fault1 of will-it-scale + order 2 folio for anonymous >>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could >>> see the 61% split_queue_lock contention: >>> - 71.28% 0.35% page_fault1_pro [kernel.kallsyms] [k] >>> release_pages >>> - 70.93% release_pages >>> - 61.42% free_transhuge_page >>> + 60.77% _raw_spin_lock_irqsave >>> >>> With this patch applied, the split_queue_lock contention is less >>> than 1%. >>> >>> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx> >>> Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx> >>> --- >>> mm/huge_memory.c | 19 ++++++++++++++++--- >>> 1 file changed, 16 insertions(+), 3 deletions(-) >>> >>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >>> index 032fb0ef9cd1..c620f1f12247 100644 >>> --- a/mm/huge_memory.c >>> +++ b/mm/huge_memory.c >>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page) >>> struct deferred_split *ds_queue = get_deferred_split_queue(folio); >>> unsigned long flags; >>> >>> - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); >>> - if (!list_empty(&folio->_deferred_list)) { >>> + /* >>> + * At this point, there is no one trying to queue the folio >>> + * to deferred_list. folio->_deferred_list is not possible >>> + * being updated. >>> + * >>> + * If folio is already added to deferred_list, add/delete to/from >>> + * deferred_list will not impact list_empty(&folio->_deferred_list). >>> + * It's safe to check list_empty(&folio->_deferred_list) without >>> + * acquiring the lock. >>> + * >>> + * If folio is not in deferred_list, it's safe to check without >>> + * acquiring the lock. >>> + */ >>> + if (data_race(!list_empty(&folio->_deferred_list))) { >>> + spin_lock_irqsave(&ds_queue->split_queue_lock, flags); >> >> Recheck under lock? > Huang Ying pointed out the race with deferred_split_scan(). And Yes. Need > recheck under lock. Will update in next version. Oops sorry - I see this was already pointed out. Disregard my previous mail. Thanks, Ryan > > > Regards > Yin, Fengwei > >> >>> ds_queue->split_queue_len--; >>> list_del(&folio->_deferred_list); >>> + spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); >>> } >>> - spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); >>> free_compound_page(page); >>> } >>> >>> -- >>> 2.30.2 >>> >>> >>