On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
> Hi Kirill,
>
> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> > On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
> >> Hi Kirill,
> >>
> >> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> >>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> >>>> free_transhuge_page() acquires the split queue lock and then checks
> >>>> whether the THP was added to the deferred list or not.
> >>>>
> >>>> It's safe to check whether the THP is in the deferred list or not.
> >>>> When code hits free_transhuge_page(), no one is trying to update
> >>>> the folio's _deferred_list.
> >>>>
> >>>> If the folio is not in deferred_list, it's safe to check without
> >>>> acquiring the lock.
> >>>>
> >>>> If folio is in deferred_list, the other node in deferred_list
> >>>> adding/deleteing doesn't impact the return value of
> >>>> list_epmty(@folio->_deferred_list).
> >>>
> >>> Typo.
> >>>
> >>>>
> >>>> Running page_fault1 of will-it-scale with order-2 folios for anonymous
> >>>> mappings and 96 processes on an Ice Lake 48C/96T test box, we could
> >>>> see 61% split_queue_lock contention:
> >>>>   - 71.28%  0.35%  page_fault1_pro  [kernel.kallsyms]  [k] release_pages
> >>>>      - 70.93% release_pages
> >>>>         - 61.42% free_transhuge_page
> >>>>            + 60.77% _raw_spin_lock_irqsave
> >>>>
> >>>> With this patch applied, the split_queue_lock contention is less
> >>>> than 1%.
> >>>>
> >>>> Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> >>>> Tested-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> >>>> ---
> >>>>  mm/huge_memory.c | 19 ++++++++++++++++---
> >>>>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 032fb0ef9cd1..c620f1f12247 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
> >>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>>>  	unsigned long flags;
> >>>>
> >>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>> -	if (!list_empty(&folio->_deferred_list)) {
> >>>> +	/*
> >>>> +	 * At this point, there is no one trying to queue the folio
> >>>> +	 * to deferred_list. folio->_deferred_list is not possible
> >>>> +	 * being updated.
> >>>> +	 *
> >>>> +	 * If folio is already added to deferred_list, add/delete to/from
> >>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> >>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> >>>> +	 * acquiring the lock.
> >>>> +	 *
> >>>> +	 * If folio is not in deferred_list, it's safe to check without
> >>>> +	 * acquiring the lock.
> >>>> +	 */
> >>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> >>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>
> >>> Recheck under lock?
> >> In deferred_split_scan(), there is the following code block:
> >> 		if (folio_try_get(folio)) {
> >> 			list_move(&folio->_deferred_list, &list);
> >> 		} else {
> >> 			/* We lost race with folio_put() */
> >> 			list_del_init(&folio->_deferred_list);
> >> 			ds_queue->split_queue_len--;
> >> 		}
> >>
> >> I am wondering what kind of race "lost race with folio_put()" refers to.
> >>
> >> My understanding is that it's not necessary to handle this case here,
> >> because free_transhuge_page() will handle it once the folio's refcount
> >> drops to zero. But I must be missing something here. Thanks.
> >
> > free_transhuge_page() gets called when the refcount is already zero. Both
> > deferred_split_scan() and free_transhuge_page() can see the page with zero
> > refcount. The check makes deferred_split_scan() leave the page to
> > free_transhuge_page().
> >
> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
> necessary to do
> 	list_del_init(&folio->_deferred_list);
> 	ds_queue->split_queue_len--;
>
> Can these two lines be left to free_transhuge_page() as well? Thanks.

I *think* (my cache is cold on deferred split) we can. But since we
already hold the lock, why not take care of it? It makes your change more
efficient.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
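A minimal user-space model may make the locking protocol in this thread easier to follow. It is a sketch under stated assumptions, not the kernel implementation: `struct list_node`, `struct split_queue`, `struct fake_folio`, `queue_init()`, `enqueue()`, `scan_queue()` and `free_folio()` are hypothetical stand-ins for `list_head`, `struct deferred_split`, the folio, deferred-split queueing, `deferred_split_scan()` and `free_transhuge_page()`, and a C11 `atomic_flag` spinlock stands in for `split_queue_lock`. Once the free path runs, nothing can queue the folio again, so an unlocked "empty" result is final; a "non-empty" result may be stale, because the scanner may already have unlinked a zero-refcount folio in its "lost race with folio_put()" branch. Hence the recheck under the lock: in this model, skipping it would decrement the queue length twice. The model is only exercised single-threaded; a genuinely concurrent user-space version would also need atomic accesses to the list pointers and the refcount to be free of C11 data races.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Kernel-style circular list node; an unlinked node points to itself. */
struct list_node { struct list_node *next, *prev; };
static void node_init(struct list_node *n) { n->next = n->prev = n; }
static bool node_empty(const struct list_node *n) { return n->next == n; }

static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Hypothetical stand-ins for struct deferred_split and a folio. */
struct split_queue {
	atomic_flag lock;          /* plays split_queue_lock */
	struct list_node head;     /* plays the split_queue list head */
	long len;                  /* plays split_queue_len */
};
struct fake_folio {
	int refcount;              /* plain int: single-threaded model */
	struct list_node deferred; /* plays _deferred_list */
};
static void lock_queue(struct split_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin */
}
static void unlock_queue(struct split_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}
static void queue_init(struct split_queue *q)
{
	atomic_flag_clear(&q->lock);
	node_init(&q->head);
	q->len = 0;
}

/* Queue a folio for deferred split unless it is already queued. */
static void enqueue(struct split_queue *q, struct fake_folio *f)
{
	lock_queue(q);
	if (node_empty(&f->deferred)) {
		node_add(&f->deferred, &q->head);
		q->len++;
	}
	unlock_queue(q);
}

/* Scanner: live folios move to @out, dead ones are unlinked here. */
static void scan_queue(struct split_queue *q, struct list_node *out)
{
	struct list_node *pos, *next;
	lock_queue(q);
	for (pos = q->head.next; pos != &q->head; pos = next) {
		struct fake_folio *f = container_of(pos, struct fake_folio, deferred);
		next = pos->next;
		node_del_init(pos);
		if (f->refcount > 0) {       /* like folio_try_get() succeeding */
			f->refcount++;
			node_add(pos, out);  /* del + add == list_move() */
		} else {
			q->len--;            /* lost the race with the final put */
		}
	}
	unlock_queue(q);
}

/* Free path: runs only after the last reference is gone. */
static void free_folio(struct split_queue *q, struct fake_folio *f)
{
	assert(f->refcount == 0);
	/* Unlocked peek (data_race() in the patch); "non-empty" may be stale. */
	if (!node_empty(&f->deferred)) {
		lock_queue(q);
		if (!node_empty(&f->deferred)) { /* recheck under the lock */
			node_del_init(&f->deferred);
			q->len--;
		}
		unlock_queue(q);
	}
}
```

Unlinking the dead folio in `scan_queue()` costs little because the scanner already holds the lock, and it lets `free_folio()` see an empty node and skip the lock entirely, which appears to be the efficiency argument in the last reply.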