On Wed, Jun 21, 2023 at 7:26 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo
> <cerasuolodomenico@xxxxxxxxx> wrote:
> >
> > On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > If exclusive loads are enabled for zswap, we invalidate the entry before
> > > returning from zswap_frontswap_load(), after dropping the local
> > > reference. However, the tree lock is dropped during decompression after
> > > the local reference is acquired, so the entry could be invalidated
> > > before we drop the local ref. If this happens, the entry is freed once
> > > we drop the local ref, and zswap_invalidate_entry() tries to invalidate
> > > an already freed entry.
> > >
> > > Fix this by:
> > > (a) Making sure zswap_invalidate_entry() is always called with a local
> > >     ref held, to avoid being called on a freed entry.
> > > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> > >     was actually on the rbtree. Otherwise, another invalidation could
> > >     have already happened, and the initial ref is already dropped.
> > >
> > > With these changes, there is no need for zswap_reclaim_entry() to check
> > > that the entry still exists in the tree before invalidating it, as
> > > zswap_invalidate_entry() now makes this check internally.
> > >
> > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > > Reported-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> > > Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> > > ---
> > >  mm/zswap.c | 21 ++++++++++++---------
> > >  1 file changed, 12 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 87b204233115..62195f72bf56 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> > >          return 0;
> > >  }
> > >
> > > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > >  {
> > >          if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > >                  rb_erase(&entry->rbnode, root);
> > >                  RB_CLEAR_NODE(&entry->rbnode);
> > > +                return true;
> > >          }
> > > +        return false;
> > >  }
> > >
> > >  /*
> > > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> > >          return NULL;
> > >  }
> > >
> > > +/*
> > > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > > + * from the tree. This function must be called with an additional ref held,
> > > + * otherwise it may race with another invalidation freeing the entry.
> > > + */
> >
> > On re-reading this comment there's one thing I'm not sure I get: do we
> > really need to hold an additional local ref to call this? As far as I
> > understood, once we check that the entry was in the tree before putting
> > the initial ref, there's no need for an additional local one.
>
> I believe we do, but please correct me if I am wrong. Consider the
> following scenario:
>
> // Initially refcount is at 1
>
> CPU#1:                                  CPU#2:
> spin_lock(tree_lock)
> zswap_entry_get() // 2 refs
> spin_unlock(tree_lock)
>                                         spin_lock(tree_lock)
>                                         zswap_invalidate_entry() // 1 ref
>                                         spin_unlock(tree_lock)
> zswap_entry_put() // 0 refs
> zswap_invalidate_entry() // problem
>
> That last zswap_invalidate_entry() call in CPU#1 is problematic. The
> entry would have already been freed. If we check that the entry is on
> the tree by checking RB_EMPTY_NODE(&entry->rbnode), then we are
> reading already freed and potentially re-used memory.
>
> We would need to search the tree to make sure the same entry still
> exists in the tree (aka what zswap_reclaim_entry() currently does).
> This is not ideal in the fault path to have to do the lookups twice.
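To see the fixed protocol in isolation, below is a small userspace
model. It is a sketch under assumed simplifications, not the actual
mm/zswap.c code: the rbtree shrinks to an on_tree flag, the tree lock
is elided, and the CPU#1/CPU#2 interleaving above is replayed
sequentially. Only the names loosely mirror the kernel.

/*
 * race_model.c: standalone model of the refcount scheme after this
 * patch. Build with: cc -o race_model race_model.c
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct zswap_entry {
        int refcount;   /* starts at 1: the initial ref from entry creation */
        bool on_tree;   /* stands in for !RB_EMPTY_NODE(&entry->rbnode) */
};

/* Drop a reference; free the entry when the last one is gone. */
static void entry_put(struct zswap_entry *entry)
{
        if (--entry->refcount == 0) {
                assert(!entry->on_tree);  /* never free a live tree entry */
                free(entry);
        }
}

/* The fix: drop the initial ref only if we took the entry off the tree. */
static void invalidate_entry(struct zswap_entry *entry)
{
        if (entry->on_tree) {           /* models zswap_rb_erase() == true */
                entry->on_tree = false;
                entry_put(entry);
        }
}

int main(void)
{
        struct zswap_entry *entry = malloc(sizeof(*entry));

        entry->refcount = 1;            /* initial ref */
        entry->on_tree = true;

        entry->refcount++;              /* CPU#1: local ref, 2 refs */

        invalidate_entry(entry);        /* CPU#2: wins the race, 1 ref */

        /*
         * CPU#1, new ordering: invalidate while the local ref is still
         * held. The entry is already off the tree, so this is a no-op
         * instead of a use-after-free.
         */
        invalidate_entry(entry);        /* no-op, still 1 ref */
        entry_put(entry);               /* 0 refs: freed exactly once */

        printf("freed exactly once\n");
        return 0;
}

With the pre-fix ordering (put before invalidate), the final
invalidate_entry() would touch freed memory; with the bool-returning
erase and the reordered put, the racing invalidation degrades to a
harmless no-op.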
Thanks for the clarification, it is indeed needed in that case. I was
just wondering whether the wording of the comment is exact: before
calling zswap_invalidate_entry() one has to ensure that the entry has
not been freed, not necessarily by holding an additional reference, if
a lookup can serve the same purpose.

> Also, in zswap_reclaim_entry(), if we call zswap_invalidate_entry()
> after we drop the local ref, would it be possible that the swap entry
> has been reused for a different page? I didn't look closely, but if
> yes, then the slab allocator may have repurposed the zswap_entry and
> we may find the entry in the tree for the same offset, even though it
> is referring to a different page now. This sounds practically
> unlikely but perhaps theoretically possible.

I'm not sure I understood the scenario: in zswap_reclaim_entry() we
keep a local reference until the end precisely in order to avoid a
free.

> I think it's more reliable to call zswap_invalidate_entry() on an
> entry that we know is valid before dropping the local ref, especially
> as it's easy to do today by just moving a few lines around.
>
> >
> > >
> > >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> > >                                     struct zswap_entry *entry)
> > >  {
> > > -        /* remove from rbtree */
> > > -        zswap_rb_erase(&tree->rbroot, entry);
> > > -
> > > -        /* drop the initial reference from entry creation */
> > > -        zswap_entry_put(tree, entry);
> > > +        if (zswap_rb_erase(&tree->rbroot, entry))
> > > +                zswap_entry_put(tree, entry);
> > >  }
> > >
> > >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> > >           * swapcache. Drop the entry from zswap - unless invalidate already
> > >           * took it out while we had the tree->lock released for IO.
> > >           */
> > > -        if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > > -                zswap_invalidate_entry(tree, entry);
> > > +        zswap_invalidate_entry(tree, entry);
> > >
> > >  put_unlock:
> > >          /* Drop local reference */
> > > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >                  count_objcg_event(entry->objcg, ZSWPIN);
> > >  freeentry:
> > >          spin_lock(&tree->lock);
> > > -        zswap_entry_put(tree, entry);
> > >          if (!ret && zswap_exclusive_loads_enabled) {
> > >                  zswap_invalidate_entry(tree, entry);
> > >                  *exclusive = true;
> > > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >                  list_move(&entry->lru, &entry->pool->lru);
> > >                  spin_unlock(&entry->pool->lru_lock);
> > >          }
> > > +        zswap_entry_put(tree, entry);
> > >          spin_unlock(&tree->lock);
> > >
> > >          return ret;
> > > --
> > > 2.41.0.162.gfafddb0af9-goog
> > >
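For contrast with the refcount-based scheme, the lookup-based
validation discussed above (what zswap_reclaim_entry() did before this
patch) can be modeled the same way. This is a sketch under the same
assumptions, with a hypothetical one-slot "tree" keyed by offset
standing in for the rbtree; it is not kernel code.

/*
 * lookup_model.c: models validating an entry by re-searching the tree
 * before invalidating it. Build with: cc -o lookup_model lookup_model.c
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct entry {
        int refcount;
        long offset;
};

static struct entry *tree_slot;         /* stands in for the whole rbtree */

/* Models zswap_rb_search(): return the live entry for an offset. */
static struct entry *tree_search(long offset)
{
        if (tree_slot && tree_slot->offset == offset)
                return tree_slot;
        return NULL;
}

static void entry_put(struct entry *e)
{
        if (--e->refcount == 0)
                free(e);
}

static void invalidate(struct entry *e)
{
        tree_slot = NULL;               /* erase from the "tree" */
        entry_put(e);                   /* drop the initial reference */
}

int main(void)
{
        struct entry *e = malloc(sizeof(*e));

        e->refcount = 1;
        e->offset = 42;
        tree_slot = e;

        e->refcount++;                  /* local ref held across the "IO" window */

        invalidate(e);                  /* racing invalidation: 1 ref left */

        /*
         * Pre-patch pattern: only invalidate if the same entry is still
         * in the tree. Safe here because the local ref keeps e alive for
         * the pointer compare; without that ref, the compare would be
         * ABA-prone if the allocator reused the memory for a new entry
         * at the same offset, which is the theoretical concern raised
         * in the thread.
         */
        if (e == tree_search(42))
                invalidate(e);          /* skipped: entry already gone */

        entry_put(e);                   /* drop local ref: freed exactly once */

        printf("lookup-validated: freed exactly once\n");
        return 0;
}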