Re: [PATCH] mm: zswap: fix double invalidate with exclusive loads

On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo
<cerasuolodomenico@xxxxxxxxx> wrote:
>
> On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > If exclusive loads are enabled for zswap, we invalidate the entry before
> > returning from zswap_frontswap_load(), after dropping the local
> > reference. However, the tree lock is dropped during decompression after
> > the local reference is acquired, so the entry could be invalidated
> > before we drop the local ref. If this happens, the entry is freed once
> > we drop the local ref, and zswap_invalidate_entry() tries to invalidate
> > an already freed entry.
> >
> > Fix this by:
> > (a) Making sure zswap_invalidate_entry() is always called with a local
> >     ref held, to avoid being called on a freed entry.
> > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> >     was actually on the rbtree. Otherwise, another invalidation could
> >     have already happened, and the initial ref is already dropped.
> >
> > With these changes, there is no need to make sure the entry still
> > exists in the tree in zswap_reclaim_entry() before invalidating it, as
> > zswap_invalidate_entry() will make this check internally.
> >
> > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > Reported-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > ---
> >  mm/zswap.c | 21 ++++++++++++---------
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 87b204233115..62195f72bf56 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> >         return 0;
> >  }
> >
> > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> >  {
> >         if (!RB_EMPTY_NODE(&entry->rbnode)) {
> >                 rb_erase(&entry->rbnode, root);
> >                 RB_CLEAR_NODE(&entry->rbnode);
> > +               return true;
> >         }
> > +       return false;
> >  }
> >
> >  /*
> > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> >         return NULL;
> >  }
> >
> > +/*
> > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > + * from the tree. This function must be called with an additional ref held,
> > + * otherwise it may race with another invalidation freeing the entry.
> > + */
>
> On re-reading this comment there's one thing I'm not sure I get, do we
> really need to hold an additional local ref to call this? As far as I
> understood, once we check that the entry was in the tree before putting
> the initial ref, there's no need for an additional local one.

I believe we do, but please correct me if I am wrong. Consider the
following scenario:

// Initially refcount is at 1

CPU#1:                                      CPU#2:
spin_lock(tree_lock)
zswap_entry_get() // 2 refs
spin_unlock(tree_lock)
                                            spin_lock(tree_lock)
                                            zswap_invalidate_entry() // 1 ref
                                            spin_unlock(tree_lock)
zswap_entry_put() // 0 refs
zswap_invalidate_entry() // problem

That last zswap_invalidate_entry() call on CPU#1 is problematic: the
entry has already been freed by then. Even if we only check whether the
entry is on the tree with RB_EMPTY_NODE(&entry->rbnode), we are
reading already freed and potentially reused memory.

Without the local ref, we would need to search the tree to make sure
the same entry still exists in it (which is what zswap_reclaim_entry()
currently does). Having to do the lookup twice in the fault path is not
ideal.

Also, in zswap_reclaim_entry(), if we call zswap_invalidate_entry()
after dropping the local ref, could the swap entry have been reused for
a different page by then? I didn't look closely, but if so, the slab
allocator may have repurposed the zswap_entry, and we may find the
entry in the tree for the same offset even though it now refers to a
different page. This sounds unlikely in practice, but it is perhaps
theoretically possible.

I think it is more reliable to call zswap_invalidate_entry() on an
entry that we know is valid, before dropping the local ref, especially
since that is easy to do today by just moving a few lines around.
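
To make the ordering argument concrete, here is a minimal userspace
sketch that replays the interleaving from the diagram on a single
thread. It is not kernel code: struct toy_entry, toy_get(), toy_put(),
toy_invalidate() and toy_load_race() are hypothetical names made up for
illustration, the tree is reduced to an on_tree flag, and a freed flag
stands in for actually freeing the entry so the example never touches
freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct zswap_entry; every name below is hypothetical. */
struct toy_entry {
	int refcount;   /* starts at 1: the initial reference from creation */
	bool on_tree;   /* models !RB_EMPTY_NODE(&entry->rbnode) */
	bool freed;     /* recorded instead of really freeing the memory */
};

static void toy_get(struct toy_entry *e)
{
	assert(!e->freed);
	e->refcount++;
}

static void toy_put(struct toy_entry *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;   /* the kernel would free the entry here */
}

/*
 * Mirrors the patched zswap_invalidate_entry(): drop the initial ref only
 * if the entry was still on the tree. The freed check exists only in this
 * model; it returns false when called on an already freed entry, which is
 * the use-after-free case.
 */
static bool toy_invalidate(struct toy_entry *e)
{
	if (e->freed)
		return false;
	if (e->on_tree) {
		e->on_tree = false;
		toy_put(e);
	}
	return true;
}

/*
 * Replays the interleaving on one thread. put_first selects the old
 * ordering (put, then invalidate); otherwise the patched ordering
 * (invalidate, then put) is used.
 * Returns true if every invalidate call saw a live entry.
 */
static bool toy_load_race(bool put_first)
{
	struct toy_entry e = { .refcount = 1, .on_tree = true, .freed = false };
	bool ok;

	toy_get(&e);               /* CPU#1: local ref, 2 refs */
	ok = toy_invalidate(&e);   /* CPU#2: concurrent invalidation, 1 ref */

	if (put_first) {
		toy_put(&e);                    /* CPU#1: 0 refs, entry freed */
		ok = toy_invalidate(&e) && ok;  /* CPU#1: runs on a freed entry */
	} else {
		ok = toy_invalidate(&e) && ok;  /* CPU#1: no-op, local ref held */
		toy_put(&e);                    /* CPU#1: 0 refs, freed last */
	}
	return ok;
}
```

In this model, toy_load_race(true) reports the stale invalidate and
toy_load_race(false) does not, because the local ref keeps the entry
alive until after the final invalidate has run.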




>
> >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> >                                    struct zswap_entry *entry)
> >  {
> > -       /* remove from rbtree */
> > -       zswap_rb_erase(&tree->rbroot, entry);
> > -
> > -       /* drop the initial reference from entry creation */
> > -       zswap_entry_put(tree, entry);
> > +       if (zswap_rb_erase(&tree->rbroot, entry))
> > +               zswap_entry_put(tree, entry);
> >  }
> >
> >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> >          * swapcache. Drop the entry from zswap - unless invalidate already
> >          * took it out while we had the tree->lock released for IO.
> >          */
> > -       if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > -               zswap_invalidate_entry(tree, entry);
> > +       zswap_invalidate_entry(tree, entry);
> >
> >  put_unlock:
> >         /* Drop local reference */
> > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >                 count_objcg_event(entry->objcg, ZSWPIN);
> >  freeentry:
> >         spin_lock(&tree->lock);
> > -       zswap_entry_put(tree, entry);
> >         if (!ret && zswap_exclusive_loads_enabled) {
> >                 zswap_invalidate_entry(tree, entry);
> >                 *exclusive = true;
> > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >                 list_move(&entry->lru, &entry->pool->lru);
> >                 spin_unlock(&entry->pool->lru_lock);
> >         }
> > +       zswap_entry_put(tree, entry);
> >         spin_unlock(&tree->lock);
> >
> >         return ret;
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >




