I've had a few conversations recently about how many objects should be in a batch in some disparate contextx, so I thought I'd write down my opinion so I can refer to it in future. TLDR: Start your batch size around 10, adjust the batch size when measurements tell you to change it. In this model, let's look at the cost of allocating N objects from an allocator. Assume there's a fixed cost, say 4 (units are not relevant here) for going into the allocator and then there's a 1 unit cost per object (eg we're taking a spinlock, pulling N objects out of the data structure and releasing the spinlock again). Allocating 100 * 1 objects would cost 500 units. Our best case is that we could save 396 units by allocating a batch of 100. But we probably don't know how many objects we're going to need to allocate, so we pull objects from the allocator in smaller batches. Here's a table showing the costs for different batch sizes: Batch size Cost of allocating 100 thousand million 1 500 (5 * 100) 5000 5M 2 300 (6 * 50) 3000 3M 4 200 (8 * 25) 2000 2M 8 156 (12 * 13) 1500 1.5M 16 140 (20 * 7) 1260 1.25M 32 144 (36 * 4) 1152 1.13M 64 136 (68 * 2) 1088 1.06M 128 132 (132 * 1) 1056 1.03M You can see the knee of this curve is around 8. It fluctuates a bit after that depending on how many "left over" objects we have after allocating the 100 it turned out that we needed. Even if we think we're going to be dealing with a _lot_ of objects (the thousand and million column), we've got most of the advantage by the time we get to 8 (eg a reduction of 3.5M from a total possible reduction of 4M), and while I wouldn't sneeze at getting a few more percentage points of overhead reduction, we're scrabbling at the margins now, not getting big wins. This is a simple model for only one situation. If we have a locking contention breakdown, the overhead cost might be much higher than 4 units, and that would lead us to a larger batch size. Another consideration is how much of each object we have to touch. put_pages_list() is frequently called with batches of 500 pages. In order to free a folio, we have to manipulate its contents, so touching at least one cacheline per object. And we make multiple passes over the batch, first decrementing the refcount, removing it from the lru list; second uncharging the folios from the memcg (writes to folio->memcg_data); third calling free_pages_prepare which, eg, sets ->mapping to NULL; fourth putting the folio on the pcp list (writing to the list_head). With 500 folios on the list, that uses 500 * 64 bytes of cache which just barely fits into the L1 cache of a modern laptop CPU (never mind whatever else we might want to have in the L1). Capping the batch size at 15 (as my recent patches do) uses only 1kB of L1, which is a much more reasonable amount of cache to take up. We can be pretty sure the first one is still in it when the last one has finished being processed.