Hi Yosry, On Wed, Jan 17, 2024 at 10:35 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > > @@ -493,45 +471,47 @@ static struct zswap_entry *zswap_search(struct zswap_tree *tree, pgoff_t offset) > > static int zswap_insert(struct zswap_tree *tree, struct zswap_entry *entry, > > struct zswap_entry **dupentry) > > { > > - struct rb_root *root = &tree->rbroot; > > - struct rb_node **link = &root->rb_node, *parent = NULL; > > - struct zswap_entry *myentry, *old; > > - pgoff_t myentry_offset, entry_offset = swp_offset(entry->swpentry); > > - > > - > > - while (*link) { > > - parent = *link; > > - myentry = rb_entry(parent, struct zswap_entry, rbnode); > > - myentry_offset = swp_offset(myentry->swpentry); > > - if (myentry_offset > entry_offset) > > - link = &(*link)->rb_left; > > - else if (myentry_offset < entry_offset) > > - link = &(*link)->rb_right; > > - else { > > - old = xa_load(&tree->xarray, entry_offset); > > - BUG_ON(old != myentry); > > - *dupentry = myentry; > > + struct zswap_entry *e; > > + pgoff_t offset = swp_offset(entry->swpentry); > > + XA_STATE(xas, &tree->xarray, offset); > > + > > + do { > > + xas_lock_irq(&xas); > > + do { > > + e = xas_load(&xas); > > + if (xa_is_zero(e)) > > + e = NULL; > > + } while (xas_retry(&xas, e)); > > + if (xas_valid(&xas) && e) { > > + xas_unlock_irq(&xas); > > + *dupentry = e; > > return -EEXIST; > > } > > - } > > - rb_link_node(&entry->rbnode, parent, link); > > - rb_insert_color(&entry->rbnode, root); > > - old = xa_store(&tree->xarray, entry_offset, entry, GFP_KERNEL); > > - return 0; > > + xas_store(&xas, entry); > > + xas_unlock_irq(&xas); > > + } while (xas_nomem(&xas, GFP_KERNEL)); > > + return xas_error(&xas); > > I think using the xas_* APIs can be avoided here. The only reason we > need it is that we want to check if there's an existing entry first, > and return -EEXIST. However, in that case, the caller will replace it > anyway (and do some operations on the dupentry): We might be able to for the insert case if we don't mind changing the code behavior a bit. My original intent is to keep close to the original zswap code and not stir the pot too much for the xarray replacement. We can always make more adjustment once the RB tree is gone. > while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) { > WARN_ON(1); > zswap_duplicate_entry++; > zswap_invalidate_entry(tree, dupentry); > } > > So I think we can do something like this in zswap_insert() instead: > > dupentry = xa_store(..); > if (WARN_ON(dupentry)) { > zswap_duplicate_entry++; > zswap_invalidate_entry(tree, dupentry); > } > > WDYT? > > > } > > > > static bool zswap_erase(struct zswap_tree *tree, struct zswap_entry *entry) > > { > > + struct zswap_entry *e; > > pgoff_t offset = swp_offset(entry->swpentry); > > - if (!RB_EMPTY_NODE(&entry->rbnode)) { > > - struct zswap_entry *old; > > - old = xa_erase(&tree->xarray, offset); > > - BUG_ON(old != entry); > > - rb_erase(&entry->rbnode, &tree->rbroot); > > - RB_CLEAR_NODE(&entry->rbnode); > > - return true; > > - } > > - return false; > > + XA_STATE(xas, &tree->xarray, offset); > > + > > + do { > > + xas_lock_irq(&xas); > > + do { > > + e = xas_load(&xas); > > + } while (xas_retry(&xas, e)); > > + if (xas_valid(&xas) && e != entry) { > > + xas_unlock_irq(&xas); > > + return false; > > + } > > + xas_store(&xas, NULL); > > + xas_unlock_irq(&xas); > > + } while (xas_nomem(&xas, GFP_KERNEL)); > > + return !xas_error(&xas); > > } > > Same here, I think we just want: > > return !!xa_erase(..); For the erase case it is tricky. The current zswap code does not erase an entry if the tree entry at the same offset has been changed. It should be fine if the new entry is NULL. Basically some race to remove the entry already. However, if the entry is not NULL, then force resetting it to NULL will change behavior compared to the current. >From reading the zswap code, I am not sure this is in theory impossible to happen: Some one races to remove the zswap entry then reclaim converts that page back to the zswap, with a new entry. By the time the zswap_erase() tries to erase the current entry, the entry has changed to a new entry. It seems the right thing to do in this unlikely case is that we should skip the force erase and drop the current entry, assuming someone else has already invalidated it. Don't touch the entry in the tree, we assume it is a newer generation. If we can reason the above is impossible to happen, then the force erase and reset the entry to NULL should be fine(Famous last word). Noted that this is a behavior change, I would like to seperate it out from the drop in replacement patch(keep the original behavior) Chris > > > > > static struct zpool *zswap_find_zpool(struct zswap_entry *entry) > > @@ -583,7 +563,6 @@ static void zswap_entry_put(struct zswap_tree *tree, > > > > WARN_ON_ONCE(refcount < 0); > > if (refcount == 0) { > > - WARN_ON_ONCE(!RB_EMPTY_NODE(&entry->rbnode)); > > zswap_free_entry(entry); > > } > > nit: the braces are no longer needed here >