> The thought occurs that we don't need to take the folios off the list.
> I don't know that will fix anything, but this will fix your "running out
> of memory" problem -- I forgot to drop the reference if folio_trylock()
> failed.  Of course, I can't call folio_put() inside the lock, so may
> as well move the trylock back to the second loop.
>
> Again, compile-tested only.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index fd745bcc97ff..4a2ab17f802d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3312,7 +3312,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  	struct pglist_data *pgdata = NODE_DATA(sc->nid);
>  	struct deferred_split *ds_queue = &pgdata->deferred_split_queue;
>  	unsigned long flags;
> -	LIST_HEAD(list);
> +	struct folio_batch batch;
>  	struct folio *folio, *next;
>  	int split = 0;
>
> @@ -3321,36 +3321,31 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>  		ds_queue = &sc->memcg->deferred_split_queue;
>  #endif
>
> +	folio_batch_init(&batch);
>  	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	/* Take pin on all head pages to avoid freeing them under us */
> +	/* Take ref on all folios to avoid freeing them under us */
>  	list_for_each_entry_safe(folio, next, &ds_queue->split_queue,
>  							_deferred_list) {
> -		if (folio_try_get(folio)) {
> -			list_move(&folio->_deferred_list, &list);
> -		} else {
> -			/* We lost race with folio_put() */
> -			list_del_init(&folio->_deferred_list);
> -			ds_queue->split_queue_len--;
> +		if (!folio_try_get(folio))
> +			continue;
> +		if (folio_batch_add(&batch, folio) == 0) {
> +			--sc->nr_to_scan;
> +			break;
>  		}
>  		if (!--sc->nr_to_scan)
>  			break;
>  	}
>  	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>
> -	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
> +	while ((folio = folio_batch_next(&batch)) != NULL) {
>  		if (!folio_trylock(folio))
> -			goto next;
> -		/* split_huge_page() removes page from list on success */
> +			continue;
>  		if (!split_folio(folio))
>  			split++;
>  		folio_unlock(folio);
> -next:
> -		folio_put(folio);
>  	}
>
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	list_splice_tail(&list, &ds_queue->split_queue);
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
> +	folios_put(&batch);
>
>  	/*
>  	 * Stop shrinker if we didn't split any page, but the queue is empty.

OK, I've tested this; the good news is that I haven't seen any oopses or
memory leaks. The bad news is that it still takes an absolute age (hours)
to complete the same test that took a couple of minutes without "mm: Allow
non-hugetlb large folios to be batch processed". During that time the
system is completely unresponsive: the serial terminal doesn't work and I
can't even break in with sysrq. Sometimes I also see RCU stall warnings.

Dumping all the CPU backtraces with gdb shows that all the cores (except
one) are contending on the deferred split lock.

A couple of thoughts:

- Since we are now taking a maximum of 15 folios into a batch,
  deferred_split_scan() is called much more often (in a tight loop from
  do_shrink_slab()). Could it be that we are just trying to take the lock
  so much more often now? I don't think it's quite that simple, because we
  take the lock for every single folio when adding it to the queue, so
  dequeuing should still need a factor of 15 fewer lock acquisitions.

- do_shrink_slab() might be calling deferred_split_scan() in a tight loop
  with deferred_split_scan() returning 0 most of the time. If there are
  still folios on the deferred split list but deferred_split_scan() was
  unable to lock any of them, it will return 0, not SHRINK_STOP, so
  do_shrink_slab() will keep calling it, essentially livelocking. Has your
  patch changed how long a folio stays locked? I don't think so...

- Ahh, perhaps it's as simple as this: your fix has removed the code that
  took a folio off the deferred split queue when it fails to get a
  reference? Those folios would then stay on the queue, so the queue never
  looks empty and we could end up returning 0 instead of SHRINK_STOP here
  too.
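To make the last theory concrete, here is a userspace toy model; it is not kernel code and only a sketch under stated assumptions. `model_queue`, `model_scan()` and `model_shrink()` are hypothetical names, the queue holds only folios whose refcount has already dropped to zero, and the caller is reduced to "call scan in chunks of 15 until the budget runs out or scan returns SHRINK_STOP". The real do_shrink_slab() accounting is more involved.

```c
#include <assert.h>

#define SHRINK_STOP (~0UL)	/* value the kernel uses for SHRINK_STOP */
#define BATCH 15		/* per-call budget; the batch size mentioned above */

/* Hypothetical model: the queue is just a count of unreferenceable folios. */
struct model_queue {
	unsigned long dead_folios;	/* folios for which try-get always fails */
	int drop_dead;			/* 1: unlink on failed try-get (old code) */
};

/* One scan call. Nothing can be split, because every folio is dead. */
static unsigned long model_scan(struct model_queue *q, unsigned long nr_to_scan)
{
	unsigned long visited =
		q->dead_folios < nr_to_scan ? q->dead_folios : nr_to_scan;

	if (q->drop_dead)
		q->dead_folios -= visited;

	/* Mirrors "stop if we didn't split anything and the queue is empty". */
	if (q->dead_folios == 0)
		return SHRINK_STOP;
	return 0;
}

/* Simplified caller: keep scanning until the budget is spent or we stop. */
static unsigned long model_shrink(struct model_queue *q, unsigned long total_scan)
{
	unsigned long calls = 0;

	assert(q);
	while (total_scan > 0) {
		unsigned long nr = total_scan < BATCH ? total_scan : BATCH;

		calls++;
		if (model_scan(q, nr) == SHRINK_STOP)
			break;
		total_scan -= nr;
	}
	return calls;
}
```

With 30 dead folios and a budget of 1500, the model with drop_dead set stops after 2 calls, while the model without it makes all 100 calls and leaves the queue untouched. Every one of those calls would take the split queue lock in the real code, which would at least be consistent with the contention I'm seeing.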

I'll have a play.