On 01/09/2023 04:58, Matthew Wilcox wrote: > On Thu, Aug 31, 2023 at 03:21:53PM +0100, Ryan Roberts wrote: >> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote: >>> By making release_pages() call folios_put(), we can get rid of the calls >>> to compound_head() for the callers that already know they have folios. >>> We can also get rid of the lock_batch tracking as we know the size of >>> the batch is limited by folio_batch. >>> - /* >>> - * Make sure the IRQ-safe lock-holding time does not get >>> - * excessive with a continuous string of pages from the >>> - * same lruvec. The lock is held only if lruvec != NULL. >>> - */ >>> - if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) { >> >> SWAP_CLUSTER_MAX is 32. By using the folio_batch, I think you are limitted to 15 >> in your batch, so I guess you could be taking/releasing the lock x2 as often? Is >> there any perf implication? > > Yes, if the batch size is larger than 15, we'll take/release the lru lock > more often. We could increase the size of the folio_batch if that becomes > a problem. I'm not sure how often it's a problem; we already limit the > number of folios to process to 15 in, eg, folio_batch_add_and_move(). > I'm not really sure why this code gets to be special and hold the lock > for twice as long as the callers of folio_batch_add_and_move. mmugather stores page pointers in batches of PAGE_SIZE (minus a small header). So for 4K pages on 64 bit system, that's ~512 page pointers per batch. So I would imagine you will notice 15 vs 32 for the munmap and mmap_exit cases. As promised, I'm planning to do some benchmarking today, so I could play with the size of the folio_batch. I'm not sure if there is any special reason PAGEVEC_SIZE is 15? I can see you changed it from 14 a while back, by better packing the struct pagevec header so that the structure is still the same size it used to be. But 14 was set in pre-history. Are things likely to break if we expand it to 31 (doubling its size)? > >>> @@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr) >>> >>> mem_cgroup_uncharge_list(&pages_to_free); >>> free_unref_page_list(&pages_to_free); >>> + folios->nr = 0; >> >> folio_batch_reinit(folios) ? > > I don't really like the abstraction here. Back to folio_batch_move_lru() > as an example: > > for (i = 0; i < folio_batch_count(fbatch); i++) { > struct folio *folio = fbatch->folios[i]; > ... > } > ... > folio_batch_reinit(fbatch); > > vs what I'd rather write: > > for (i = 0; i < fbatch->nr; i++) { > struct folio *folio = fbatch->folios[i]; > ... > } > ... > fbatch->nr = 0; > > OK, we've successfully abstracted away that there is a member of > folio_batch called 'nr', but we still have to go poking around inside > folio_batch to extract the folio itself. So it's not like we've > managed to make folio_batch a completely opaque type. And I don't > think that folio_batch_count() is really all that much more descriptive > than fbatch->nr. Indeed, I think the second one is easier to read; > it's obviously a plain loop. > > I suppose that folio_batch_count() / _reinit() are easier to grep for > than '>nr\>' but I don't think that's a particularly useful thing to do. > We could add abstractions to get the folio_batch_folio(fbatch, i), but > when we start to get into something like folio_batch_remove_exceptionals() > (and there's something similar happening in this patchset where we strip > out the hugetlb folios), you're messing with the internal structure of > the folio_batch so much that you may as well not bother with any kind > of abstraction. > > I'm really temped to rip out folio_batch_count() and folio_batch_reinit(). > They don't seem useful enough. Looks like folio_batch_count() is used 71 times, and folio_batch_reinit() 0 times (after your changes). So the former will create churn to rip out. My opinion is simply that if there is an API or something, you should use it rather than touch the internals directly. It's just a nit from my perspective, so interpret as you like ;-) >