On Fri, Mar 08, 2024 at 12:09:38PM +0000, Ryan Roberts wrote:
> On 08/03/2024 11:44, Ryan Roberts wrote:
> > Dumping all the CPU back traces with gdb, all the cores (except one) are
> > contending on the deferred split lock.
> > 
> > A couple of thoughts:
> > 
> > - Since we are now taking a maximum of 15 folios into a batch,
> > deferred_split_scan() is called much more often (in a tight loop from
> > do_shrink_slab()). Could it be that we are just trying to take the lock so much
> > more often now? I don't think it's quite that simple, because we take the lock
> > for every single folio when adding it to the queue, so the dequeuing cost should
> > still be a factor of 15 locks less.
> > 
> > - do_shrink_slab() might be calling deferred_split_scan() in a tight loop with
> > deferred_split_scan() returning 0 most of the time. If there are still folios on
> > the deferred split list but deferred_split_scan() was unable to lock any folios,
> > then it will return 0, not SHRINK_STOP, so do_shrink_slab() will keep calling
> > it, essentially live locking. Has your patch changed the duration of the folio
> > being locked? I don't think so...
> > 
> > - Ahh, perhaps it's as simple as your fix having removed the code that removed the
> > folio from the deferred split queue if it fails to get a reference? That could
> > mean we end up returning 0 instead of SHRINK_STOP too. I'll have a play.
> 
> I tested the last idea by adding this back in:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index d46897d7ea7f..50b07362923a 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3327,8 +3327,12 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  	/* Take ref on all folios to avoid freeing them under us */
>  	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
>  				 _deferred_list) {
> -		if (!folio_try_get(folio))
> +		if (!folio_try_get(folio)) {
> +			/* We lost race with folio_put() */
> +			list_del_init(&folio->_deferred_list);
> +			ds_queue->split_queue_len--;
>  			continue;
> +		}
>  		if (folio_batch_add(&batch, folio) == 0) {
>  			--sc->nr_to_scan;
>  			break;
> 
> The test now gets further than where it was previously getting live-locked, but

If the deferred_split_lock contention comes back, we can fix
split_huge_page_to_list() to only take the lock if the page is on
the list.  Right now, it takes it unconditionally, which could be
avoided.  I'm not going to send a patch just yet to avoid confusion.
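
For illustration only (an untested sketch, not the patch being withheld
above), the shape of that change inside split_huge_page_to_list() could
be something like the below.  It assumes ds_queue has already been
looked up as in the existing code, and that the unlocked list_empty()
check is safe at this point; auditing that against concurrent queueing
is exactly the sort of thing that needs doing before a real patch:

	/*
	 * Sketch: only touch the deferred split queue lock if the folio
	 * is actually queued.  Re-check under the lock in case
	 * deferred_split_scan() unlinked the folio in the meantime.
	 */
	if (!list_empty(&folio->_deferred_list)) {
		spin_lock(&ds_queue->split_queue_lock);
		if (!list_empty(&folio->_deferred_list)) {
			ds_queue->split_queue_len--;
			list_del_init(&folio->_deferred_list);
		}
		spin_unlock(&ds_queue->split_queue_lock);
	}

The outer check lets folios that were never deferred skip the lock
entirely; the inner check re-validates under the lock so a racing
dequeue doesn't lead to a double unlink or a bogus split_queue_len.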