On Mon, Apr 20, 2020 at 3:11 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> The move_lock is a per-memcg lock, but the VM accounting code that
> needs to acquire it comes from the page and follows page->mem_cgroup
> under RCU protection. That means that the page becomes unlocked not
> when we drop the move_lock, but when we update page->mem_cgroup. And
> that assignment doesn't imply any memory ordering. If that pointer
> write gets reordered against the reads of the page state -
> page_mapped, PageDirty etc. the state may change while we rely on it
> being stable and we can end up corrupting the counters.
>
> Place an SMP memory barrier to make sure we're done with all page
> state by the time the new page->mem_cgroup becomes visible.
>
> Also replace the open-coded move_lock with a lock_page_memcg() to make
> it more obvious what we're serializing against.
>
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> ---
>  mm/memcontrol.c | 26 ++++++++++++++------------
>  1 file changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5beea03dd58a..41f5ed79272e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5372,7 +5372,6 @@ static int mem_cgroup_move_account(struct page *page,
>  {
>         struct lruvec *from_vec, *to_vec;
>         struct pglist_data *pgdat;
> -       unsigned long flags;
>         unsigned int nr_pages = compound ? hpage_nr_pages(page) : 1;
>         int ret;
>         bool anon;
> @@ -5399,18 +5398,13 @@ static int mem_cgroup_move_account(struct page *page,
>         from_vec = mem_cgroup_lruvec(from, pgdat);
>         to_vec = mem_cgroup_lruvec(to, pgdat);
>
> -       spin_lock_irqsave(&from->move_lock, flags);
> +       lock_page_memcg(page);
>
>         if (!anon && page_mapped(page)) {
>                 __mod_lruvec_state(from_vec, NR_FILE_MAPPED, -nr_pages);
>                 __mod_lruvec_state(to_vec, NR_FILE_MAPPED, nr_pages);
>         }
>
> -       /*
> -        * move_lock grabbed above and caller set from->moving_account, so
> -        * mod_memcg_page_state will serialize updates to PageDirty.
> -        * So mapping should be stable for dirty pages.
> -        */
>         if (!anon && PageDirty(page)) {
>                 struct address_space *mapping = page_mapping(page);
>
> @@ -5426,15 +5420,23 @@ static int mem_cgroup_move_account(struct page *page,
>         }
>
>         /*
> +        * All state has been migrated, let's switch to the new memcg.
> +        *
>          * It is safe to change page->mem_cgroup here because the page
> -        * is referenced, charged, and isolated - we can't race with
> -        * uncharging, charging, migration, or LRU putback.
> +        * is referenced, charged, isolated, and locked: we can't race
> +        * with (un)charging, migration, LRU putback, or anything else
> +        * that would rely on a stable page->mem_cgroup.
> +        *
> +        * Note that lock_page_memcg is a memcg lock, not a page lock,
> +        * to save space. As soon as we switch page->mem_cgroup to a
> +        * new memcg that isn't locked, the above state can change
> +        * concurrently again. Make sure we're truly done with it.
>          */
> +       smp_mb();

You said theoretical race in the subject but the above comment convinced
me that smp_mb() is required. So, why is the race still theoretical?

>
> -       /* caller should have done css_get */
> -       page->mem_cgroup = to;
> +       page->mem_cgroup = to;  /* caller should have done css_get */
>
> -       spin_unlock_irqrestore(&from->move_lock, flags);
> +       __unlock_page_memcg(from);
>
>         ret = 0;
>
> --
> 2.26.0
>
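
To make the ordering concrete for myself, here is a minimal user-space
sketch of the pattern the changelog describes. It is not the kernel code:
C11 atomics and pthreads stand in for smp_mb() and the per-memcg
move_lock, and every name below (fake_memcg, fake_page, move_account,
stat_update) is invented for illustration only.

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdbool.h>
        #include <stdio.h>

        struct fake_memcg {
                pthread_mutex_t move_lock;      /* per-memcg, like memcg->move_lock */
                long nr_file_mapped;            /* per-memcg counter */
        };

        struct fake_page {
                bool mapped;                            /* page state, e.g. page_mapped() */
                _Atomic(struct fake_memcg *) memcg;     /* like page->mem_cgroup */
        };

        static struct fake_memcg from_cg = { PTHREAD_MUTEX_INITIALIZER, 1 };
        static struct fake_memcg to_cg   = { PTHREAD_MUTEX_INITIALIZER, 0 };
        static struct fake_page page     = { true, &from_cg };

        /* Writer: move the page's charge, then publish the new owner. */
        static void move_account(void)
        {
                pthread_mutex_lock(&from_cg.move_lock);

                if (page.mapped) {              /* read of page state */
                        from_cg.nr_file_mapped--;
                        to_cg.nr_file_mapped++;
                }

                /*
                 * Analogue of the smp_mb() in the patch: keep the reads of
                 * page state above from being reordered past the pointer
                 * store below.  Once the new owner is visible, an updater
                 * locking to_cg may change page.mapped concurrently.
                 */
                atomic_thread_fence(memory_order_seq_cst);
                atomic_store_explicit(&page.memcg, &to_cg, memory_order_relaxed);

                pthread_mutex_unlock(&from_cg.move_lock);
        }

        /* Updater: lock the page's *current* memcg before touching its stats. */
        static void stat_update(bool mapped)
        {
                struct fake_memcg *cg;

                for (;;) {
                        cg = atomic_load_explicit(&page.memcg, memory_order_acquire);
                        pthread_mutex_lock(&cg->move_lock);
                        if (cg == atomic_load_explicit(&page.memcg,
                                                       memory_order_relaxed))
                                break;          /* still the owner, lock is valid */
                        pthread_mutex_unlock(&cg->move_lock);   /* moved, retry */
                }

                if (page.mapped != mapped) {
                        page.mapped = mapped;
                        cg->nr_file_mapped += mapped ? 1 : -1;
                }
                pthread_mutex_unlock(&cg->move_lock);
        }

        int main(void)
        {
                move_account();
                stat_update(false);
                printf("from=%ld to=%ld\n",
                       from_cg.nr_file_mapped, to_cg.nr_file_mapped);
                return 0;
        }

The fence before the pointer store pairs with the updater's load-lock-recheck
loop: the updater is only guaranteed a stable page once it holds the lock of
the memcg the page currently points to, which is exactly why the page is
"unlocked" the instant page->mem_cgroup is republished rather than when the
move_lock is dropped.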