On Tue, Mar 19, 2024 at 11:13 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Tue, Mar 19, 2024 at 10:52:26PM -0700, Chris Li wrote:
> > A very deep RB tree requires rebalancing at times, which
> > contributes to the zswap fault latencies. The xarray does not
> > need to perform tree rebalancing, so replacing the RB tree with
> > an xarray can give a small performance gain.
> >
> > One small difference is that an xarray insert might fail with
> > ENOMEM, while an RB tree insert does not allocate additional
> > memory.
> >
> > The zswap_entry size shrinks a bit due to removing the RB node,
> > which has two pointers and a color field. The xarray stores the
> > pointer in the xarray tree rather than in the zswap_entry, so
> > every entry has one pointer from the xarray tree. Overall,
> > switching to the xarray should save some memory if the swap
> > entries are densely packed.
> >
> > Notice that zswap_rb_search and zswap_rb_insert are always
> > followed by zswap_rb_erase. Use xa_erase and xa_store directly,
> > which saves one tree lookup as well.
> >
> > Remove zswap_invalidate_entry since there is no need to call
> > zswap_rb_erase any more. Use zswap_free_entry instead.
> >
> > The "struct zswap_tree" has been replaced by "struct xarray".
> > The tree spin lock has been replaced by the xarray lock.
> >
> > Ran the kernel build test 10 times for each version, averages:
> > (memory.max=2GB, zswap shrinker and writeback enabled,
> > one 50GB swapfile, 24 HT cores, 32 jobs)
> >
> >            mm-unstable-a824831a082f     xarray v7
> > user       3547.264                     3541.509
> > sys        531.176                      526.111
> > real       200.752                      201.334
> >
> > ---
>
> I believe there shouldn't be a separator before Rb and Sb below.

Ack.

>
> > Reviewed-by: Nhat Pham <nphamcs@xxxxxxxxx>
> >
> > Signed-off-by: Chris Li <chrisl@xxxxxxxxxx>
>
> I have some comments below, with them addressed:
>
> Acked-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
>
> [..]
> > @@ -1556,28 +1474,43 @@ bool zswap_store(struct folio *folio)
> >  insert_entry:
> >  	entry->swpentry = swp;
> >  	entry->objcg = objcg;
> > +
> > +	old = xa_store(tree, offset, entry, GFP_KERNEL);
> > +	if (xa_is_err(old)) {
> > +		int err = xa_err(old);
>
> There should be a blank line after the declaration.
>
> > +		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> > +		zswap_reject_alloc_fail++;
> > +		goto store_failed;
> > +	}
> > +
> > +        /*
> > +         * We may have had an existing entry that became stale when
> > +         * the folio was redirtied and now the new version is being
> > +         * swapped out. Get rid of the old.
> > +         */
>
> This comment is mis-indented.

Ah, there are some spaces instead of a tab because the comment was
copied from an email. Will fix it.

> checkpatch would have caught these btw.
>
> > +	if (old)
> > +		zswap_entry_free(old);
> > +
> >  	if (objcg) {
> >  		obj_cgroup_charge_zswap(objcg, entry->length);
> > -		/* Account before objcg ref is moved to tree */
> >  		count_objcg_event(objcg, ZSWPOUT);
> >  	}
> >
> > -	/* map */
> > -	spin_lock(&tree->lock);
> >  	/*
> > -	 * The folio may have been dirtied again, invalidate the
> > -	 * possibly stale entry before inserting the new entry.
> > +	 * We finish initializing the entry while it's already in xarray.
> > +	 * This is safe because:
> > +	 *
> > +	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > +	 *
> > +	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > +	 *    The publishing order matters to prevent writeback from seeing
> > +	 *    an incoherent entry.
>
> As I mentioned before, writeback is also protected by the folio lock.
> Concurrent writeback will find the folio in the swapcache and abort. The
> fact that the entry is not on the LRU yet is just additional protection,
> so I don't think the publishing order actually matters here. Right?

Right. This comment is explaining why this publishing order does not
matter. I think we are talking about the same thing here?

Chris
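For reference, below is a minimal, self-contained sketch of the
xa_store() insert-or-replace pattern discussed above. It is not the
zswap code itself: demo_entry, demo_store, and the payload field are
hypothetical names used only for illustration, and only the xarray
calls (xa_store, xa_is_err, xa_err) mirror what the patch does.

#include <linux/xarray.h>
#include <linux/slab.h>
#include <linux/bug.h>

struct demo_entry {
	unsigned long payload;	/* stand-in for the real entry's fields */
};

/*
 * Insert @new at @index, replacing and freeing any stale entry that
 * was already stored there.  Returns 0 or a negative errno.
 */
static int demo_store(struct xarray *xa, unsigned long index,
		      struct demo_entry *new)
{
	struct demo_entry *old;

	/*
	 * xa_store() returns the previous entry at the index, or an
	 * error-encoded pointer if the insertion itself failed, so one
	 * call covers both the lookup and the insert.
	 */
	old = xa_store(xa, index, new, GFP_KERNEL);
	if (xa_is_err(old)) {
		int err = xa_err(old);

		/* Allocating xarray nodes can fail; other errors are unexpected. */
		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
		return err;
	}

	/* A non-NULL return value is a stale entry the caller now owns. */
	kfree(old);
	return 0;
}

This single xa_store() call is what stands in for the separate
rb_search/rb_insert/rb_erase sequence mentioned in the commit message.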