On Wed, 3 Mar 2021, Shakeel Butt wrote:

> Currently the kernel adds the page, allocated for swapin, to the
> swapcache before charging the page. This is fine but now we want a
> per-memcg swapcache stat which is essential for folks who want to
> transparently migrate from cgroup v1's memsw to cgroup v2's memory
> and swap counters. In addition, charging a page before exposing it to
> other parts of the kernel is a step in the right direction.
>
> To correctly maintain the per-memcg swapcache stat, this patch
> charges the page before adding it to the swapcache. One challenge
> with this approach is the failure case of add_to_swap_cache(), on
> which we need to undo the mem_cgroup_charge(). Specifically, undoing
> mem_cgroup_uncharge_swap() is not simple.
>
> To resolve the issue, this patch introduces a transaction-like
> interface to charge a page for swapin. The function
> mem_cgroup_charge_swapin_page() initiates the charging of the page
> and mem_cgroup_finish_swapin_page() completes the charging process.
> So, the kernel starts the charging process of the page for swapin
> with mem_cgroup_charge_swapin_page(), adds the page to the swapcache
> and on success completes the charging process with
> mem_cgroup_finish_swapin_page().
>
> Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>

Quite apart from helping with the stat you want, what you've ended up
with here is a nice cleanup in several different ways (and I'm glad
Johannes talked you out of __GFP_NOFAIL: much better like this). I'll say

Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>

but I am quite unhappy with the name mem_cgroup_finish_swapin_page():
it doesn't finish the swapin, it doesn't finish the page, and I'm not
persuaded by your paragraph above that there's any "transaction" here
(if there were, I'd suggest "commit" instead of "finish"; and I'd get
worried by the css_put before it's called - but no, that's fine, it's
independent).

How about complementing mem_cgroup_charge_swapin_page() with
mem_cgroup_uncharge_swapin_swap()? I think that describes well what it
does, at least in the do_memsw_account() case, and I hope we can
overlook that it does nothing at all in the other case.

And it really doesn't need a page argument: both places it's called
have just allocated an order-0 page, so there's no chance of a THP
here; but you might have some idea of future expansion, or of matching
put_swap_page() - I won't object if you prefer to pass in the page.
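To make that concrete, here's roughly the shape I'm imagining - an
untested sketch only, using my proposed name, keeping the checks as in
your patch, and assuming the order-0 case so nr_pages is simply 1:

        void mem_cgroup_uncharge_swapin_swap(swp_entry_t entry)
        {
                /*
                 * Cgroup1's unified memory+swap counter has been
                 * charged with the new swapcache page; drop the swap
                 * entry duplicate.  A no-op unless do_memsw_account().
                 */
                if (!mem_cgroup_disabled() && do_memsw_account())
                        mem_cgroup_uncharge_swap(entry, 1);
        }

The caller would then do mem_cgroup_uncharge_swapin_swap(entry) once
add_to_swap_cache() has succeeded, with no page in sight.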
But more interesting, though off-topic, comments on it below...

> +/*
> + * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
> + * @page: page charged for swapin
> + * @entry: swap entry for which the page is charged
> + *
> + * This function completes the transaction of charging the page allocated for
> + * swapin.
> + */
> +void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry)
> +{
>         /*
>          * Cgroup1's unified memory+swap counter has been charged with the
>          * new swapcache page, finish the transfer by uncharging the swap
> @@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
>          * correspond 1:1 to page and swap slot lifetimes: we charge the
>          * page to memory here, and uncharge swap when the slot is freed.
>          */
> -       if (do_memsw_account() && PageSwapCache(page)) {
> -               swp_entry_t entry = { .val = page_private(page) };
> +       if (!mem_cgroup_disabled() && do_memsw_account()) {

I understand why you put that !mem_cgroup_disabled() check in there,
but I have a series of observations on that.

First I was going to say that it would be better left to
mem_cgroup_uncharge_swap() itself.

Then I was going to say that I think it's already covered here by the
cgroup_memory_noswap check inside do_memsw_account().

Then, going back to mem_cgroup_uncharge_swap(), I realized that 5.8's
2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
memory control") removed the do_swap_account or cgroup_memory_noswap
checks from mem_cgroup_uncharge_swap(), swap_cgroup_swapon() and
swap_cgroup_swapoff() - so since then we have been allocating totally
unnecessary swap_cgroup arrays when mem_cgroup_disabled() (and
mem_cgroup_uncharge_swap() has worked by reading the zalloced array).
I think, or am I confused?

If I'm right on that, one of us ought to send another patch putting
back either the cgroup_memory_noswap checks or mem_cgroup_disabled()
checks in those three places (see the sketch at the end of this mail) -
I suspect the static key mem_cgroup_disabled() is preferable, but I'm
getting dozy.

Whatever we do with that - and it's really not any business for this
patch - I think you can drop the mem_cgroup_disabled() check from
mem_cgroup_uncharge_swapin_swap().

>         /*
>          * The swap entry might not get freed for a long time,
>          * let's not wait for it. The page already received a
>          * memory+swap charge, drop the swap entry duplicate.
>          */
> -       mem_cgroup_uncharge_swap(entry, nr_pages);
> +       mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));
>         }
> -
> -out_put:
> -       css_put(&memcg->css);
> -out:
> -       return ret;
> }
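And to make that off-topic suggestion concrete too: what I mean by
putting the checks back is just an early return at the top of each of
the three functions, something like this - entirely untested, skeleton
only, with the existing bodies elided:

        /* mm/memcontrol.c */
        void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
        {
                if (mem_cgroup_disabled())
                        return;
                /* existing body unchanged */
        }

        /* mm/swap_cgroup.c */
        int swap_cgroup_swapon(int type, unsigned long max_pages)
        {
                if (mem_cgroup_disabled())
                        return 0;
                /* existing allocation of the swap_cgroup array */
        }

        void swap_cgroup_swapoff(int type)
        {
                if (mem_cgroup_disabled())
                        return;
                /* existing freeing of the swap_cgroup array */
        }

That would stop the pointless swap_cgroup array allocations at swapon
when cgroups are disabled; whether cgroup_memory_noswap would be the
better test there, I leave to whoever writes the real patch.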