On Mon, Nov 21, 2022 at 04:59:38PM +0000, Shakeel Butt wrote:
> On Fri, Nov 18, 2022 at 01:08:13AM -0800, Hugh Dickins wrote:
> > Linus was underwhelmed by the earlier compound mapcounts series:
> > this series builds on top of it (as in next-20221117) to follow
> > up on his suggestions - except rmap.c still using lock_page_memcg(),
> > since I hesitate to steal the pleasure of deletion from Johannes.
>
> Is there a plan to remove lock_page_memcg() altogether which I missed? I
> am planning to make lock_page_memcg() a nop for cgroup-v2 (as it shows
> up in the perf profile on exit path) but if we are removing it then I
> should just wait.

We can remove it for rmap at least, but we might be able to do more.

Besides rmap, we're left with the dirty and writeback page transitions
that wrt cgroups need to be atomic with NR_FILE_DIRTY and NR_WRITEBACK.

Looking through the various callsites, I think we can delete it from
setting and clearing dirty state, as we always hold the page lock (or
the pte lock in some instances of folio_mark_dirty). Both of these are
taken from the cgroup side, so we're good there.

I think we can also remove it when setting writeback, because those
sites have the page locked as well.

That leaves clearing writeback. This can't hold the page lock due to
the atomic context, so currently we need to take lock_page_memcg() as
the lock of last resort.

I wonder if we can have cgroup take the xalock instead: writeback
ending on file pages always acquires the xarray lock. Swap writeback
currently doesn't, but we could make it so (swap_address_space).

The only thing that gives me pause is the !mapping check in
__folio_end_writeback. File and swapcache pages usually have mappings,
and truncation waits for writeback to finish before axing
page->mapping. So AFAICS this can only happen if we call end_writeback
on something that isn't under writeback - in which case the test_clear
will fail and we don't update the stats anyway. But I want to be sure.
Does anybody know off the top of their heads if a page under writeback
could be without a mapping in some weird corner case?

If we could ensure that the NR_WRITEBACK decs are always protected by
the xalock, we could grab it from mem_cgroup_move_account(), and then
kill lock_page_memcg() altogether.
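
To make the idea concrete, here is a rough user-space model of the
locking rule, not kernel code and not a patch. Everything in it is a
hypothetical stand-in: toy_lock plays the role of the xarray lock,
struct memcg holds a single NR_WRITEBACK-like counter, and
start_writeback(), end_writeback() and move_account() only loosely
mirror the real paths. For simplicity the model takes the same lock
when setting writeback, whereas the real sites rely on the page lock.
It also assumes every page has a mapping, which is exactly the open
question above.

```c
/* User-space model only, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the xarray lock; asserts catch unbalanced use. */
struct toy_lock { bool held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct memcg { long nr_writeback; };	/* NR_WRITEBACK-like counter */
struct mapping { struct toy_lock lock; };

struct page {
	struct mapping *mapping;	/* assumed non-NULL in this model */
	struct memcg *memcg;
	bool writeback;
};

/* Set the writeback flag and charge the counter as one step. */
static void start_writeback(struct page *page)
{
	toy_lock_acquire(&page->mapping->lock);
	if (!page->writeback) {
		page->writeback = true;
		page->memcg->nr_writeback++;
	}
	toy_lock_release(&page->mapping->lock);
}

/*
 * Test-and-clear the flag; only touch the counter if it was set.
 * Calling this on a page not under writeback is a no-op for stats.
 */
static bool end_writeback(struct page *page)
{
	bool was_set;

	toy_lock_acquire(&page->mapping->lock);
	was_set = page->writeback;
	page->writeback = false;
	if (was_set)
		page->memcg->nr_writeback--;
	toy_lock_release(&page->mapping->lock);
	return was_set;
}

/*
 * Cgroup side: take the same lock, so the flag cannot change while
 * the counter is transferred along with page->memcg.
 */
static void move_account(struct page *page, struct memcg *to)
{
	toy_lock_acquire(&page->mapping->lock);
	if (page->writeback) {
		page->memcg->nr_writeback--;
		to->nr_writeback++;
	}
	page->memcg = to;
	toy_lock_release(&page->mapping->lock);
}
```

In this model, a page that starts writeback in one group and is moved
before writeback ends leaves both counters balanced, because the flag
and page->memcg are only ever read or changed under the one lock.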