On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote: > Hi Kirill, > > On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote: > > On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote: > >> free_transhuge_page() acquires split queue lock then check > >> whether the THP was added to deferred list or not. > >> > >> It's safe to check whether the THP is in deferred list or not. > >> When code hit free_transhuge_page(), there is no one tries > >> to update the folio's _deferred_list. > >> > >> If folio is not in deferred_list, it's safe to check without > >> acquiring lock. > >> > >> If folio is in deferred_list, the other node in deferred_list > >> adding/deleteing doesn't impact the return value of > >> list_epmty(@folio->_deferred_list). > > > > Typo. > > > >> > >> Running page_fault1 of will-it-scale + order 2 folio for anonymous > >> mapping with 96 processes on an Ice Lake 48C/96T test box, we could > >> see the 61% split_queue_lock contention: > >> - 71.28% 0.35% page_fault1_pro [kernel.kallsyms] [k] > >> release_pages > >> - 70.93% release_pages > >> - 61.42% free_transhuge_page > >> + 60.77% _raw_spin_lock_irqsave > >> > >> With this patch applied, the split_queue_lock contention is less > >> than 1%. > >> > >> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx> > >> Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx> > >> --- > >> mm/huge_memory.c | 19 ++++++++++++++++--- > >> 1 file changed, 16 insertions(+), 3 deletions(-) > >> > >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c > >> index 032fb0ef9cd1..c620f1f12247 100644 > >> --- a/mm/huge_memory.c > >> +++ b/mm/huge_memory.c > >> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page) > >> struct deferred_split *ds_queue = get_deferred_split_queue(folio); > >> unsigned long flags; > >> > >> - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > >> - if (!list_empty(&folio->_deferred_list)) { > >> + /* > >> + * At this point, there is no one trying to queue the folio > >> + * to deferred_list. folio->_deferred_list is not possible > >> + * being updated. > >> + * > >> + * If folio is already added to deferred_list, add/delete to/from > >> + * deferred_list will not impact list_empty(&folio->_deferred_list). > >> + * It's safe to check list_empty(&folio->_deferred_list) without > >> + * acquiring the lock. > >> + * > >> + * If folio is not in deferred_list, it's safe to check without > >> + * acquiring the lock. > >> + */ > >> + if (data_race(!list_empty(&folio->_deferred_list))) { > >> + spin_lock_irqsave(&ds_queue->split_queue_lock, flags); > > > > Recheck under lock? > In function deferred_split_scan(), there is following code block: > if (folio_try_get(folio)) { > list_move(&folio->_deferred_list, &list); > } else { > /* We lost race with folio_put() */ > list_del_init(&folio->_deferred_list); > ds_queue->split_queue_len--; > } > > I am wondering what kind of "lost race with folio_put()" can be. > > My understanding is that it's not necessary to handle this case here > because free_transhuge_page() will handle it once folio get zero ref. > But I must miss something here. Thanks. free_transhuge_page() got when refcount is already zero. Both deferred_split_scan() and free_transhuge_page() can see the page with zero refcount. The check makes deferred_split_scan() to leave the page to the free_transhuge_page(). -- Kiryl Shutsemau / Kirill A. Shutemov