On Tue, Jul 10, 2018 at 03:49:58PM +0800, Huang, Ying wrote: > Daniel Jordan <daniel.m.jordan@xxxxxxxxxx> writes: > > > On Fri, Jun 22, 2018 at 11:51:44AM +0800, Huang, Ying wrote: > >> Because there is no way to prevent a huge swap cluster from being > >> split except when it has SWAP_HAS_CACHE flag set. > > > > What about making get_mctgt_type_thp take the cluster lock? That function > > would be the first lock_cluster user outside of swapfile.c, but it would > > serialize with split_swap_cluster. > > > >> It is possible for > >> the huge swap cluster to be split and the charge for the swap slots > >> inside to be changed, after we check the PMD swap mapping and the huge > >> swap cluster before we commit the charge moving. But the race window > >> is so small, that we will just ignore the race. > > > > Moving the charges is a slow path, so can't we just be correct here and not > > leak? > > Check the code and thought about this again, found the race may not > exist. Because the PMD is locked when get_mctgt_type_thp() is called > until charge is completed for the PMD. So the charge of the huge swap > cluster cannot be changed at the same time even if the huge swap cluster > is split by other processes. That's true, the PMD lock does prevent the swap charge from going stale between the time mem_cgroup_move_charge_pte_range identifies a huge swap entry in get_mctgt_type_thp and the time it moves the charge in mem_cgroup_move_account, at least for some events like swapping in pages. > Right? I'm not sure the PMD lock covers everything, but after looking a while longer at this, the charge moving seems to be a best-effort feature in other respects, so the accounting doesn't need to be completely accurate.