On Wed, Jan 17, 2024 at 10:57 PM Chengming Zhou <zhouchengming@xxxxxxxxxxxxx> wrote:
>
> Hi Yosry and Chris,
>
> On 2024/1/18 14:39, Yosry Ahmed wrote:
> > On Wed, Jan 17, 2024 at 10:01 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >>
> >> That's a long CC list for sure :)
> >>
> >> On Wed, Jan 17, 2024 at 7:06 PM Chris Li <chrisl@xxxxxxxxxx> wrote:
> >>>
> >>> The RB tree shows some contribution to the swap fault
> >>> long tail latency due to two factors:
> >>> 1) The RB tree requires re-balancing from time to time.
> >>> 2) The zswap RB tree has a tree-level spin lock protecting
> >>> tree access.
> >>>
> >>> The swap cache uses an xarray. A breakdown of swap cache
> >>> access does not show a similar long tail to the zswap
> >>> RB tree.
> >>
> >> I think the comparison to the swap cache may not be valid as the swap
> >> cache has many trees per swapfile, while zswap has a single tree.
> >>
> >>>
> >>> Moving the zswap entries to an xarray enables the read side
> >>> to take only the RCU read lock.
> >>
> >> Nice.
> >>
> >>>
> >>> The first patch adds the xarray alongside the RB tree.
> >>> There are some debug checks asserting that the xarray agrees
> >>> with the RB tree results.
> >>>
> >>> The second patch removes the zswap RB tree.
> >>
> >> The breakdown looks like something that would be a development step,
> >> but for patch submission I think it makes more sense to have a single
> >> patch replacing the rbtree with an xarray.
> >>
> >>>
> >>> I expect to merge the zswap RB tree spin lock with the xarray
> >>> lock in follow-up changes.
> >>
> >> Shouldn't this simply be changing uses of tree->lock to use
> >> xa_{lock/unlock}? We also need to make sure we don't try to lock the
> >> tree when operating on the xarray if the caller is already holding the
> >> lock, but this seems to be straightforward enough to be done as part
> >> of this patch or this series at least.
> >>
> >> Am I missing something?
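To illustrate what I mean by merging the locks, roughly (a sketch only, not the actual patch; the zswap_tree/xarray field and the erase path are hypothetical names for illustration):

```c
/*
 * The xarray embeds its own spinlock, so uses of tree->lock could
 * become xa_lock()/xa_unlock() on the xarray itself.  Hypothetical
 * erase path, names invented for illustration:
 */
static void zswap_erase_entry(struct zswap_tree *tree, pgoff_t offset)
{
	xa_lock(&tree->xarray);
	/* __xa_erase() expects the caller to hold xa_lock */
	__xa_erase(&tree->xarray, offset);
	xa_unlock(&tree->xarray);
}
```

That should also address the re-entrancy concern: a caller that already holds xa_lock can use the __xa_*() variants directly instead of taking the lock again.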
> >
> > Also, I assume we will only see performance improvements after the
> > tree lock in its current form is removed so that we get loads
> > protected only by RCU. Can we get some performance numbers to see how
> > the latency improves with the xarray under contention (unless
> > Chengming is already planning on testing this for his multi-tree
> > patches)?
>
> I just gave it a try with the same test: a kernel build in tmpfs with
> the zswap shrinker enabled, all based on the latest mm/mm-stable branch.
>
>        mm-stable    zswap-split-tree  zswap-xarray
> real   1m10.442s    1m4.157s          1m9.962s
> user   17m48.232s   17m41.477s        17m45.887s
> sys    8m13.517s    5m2.226s          7m59.305s
>
> Looks like the contention from concurrency is still there. I haven't
> looked into the code yet; I will review it later.

I think that's expected with the current version because the tree
spin_lock is still there and we are still doing lookups with a spinlock
held.
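For reference, this is the read-side difference I'd expect once the tree spinlock is gone (a sketch under the assumption of an xarray field in zswap_tree; names are illustrative, not from the patches):

```c
/* Current series: every lookup still serializes on the tree spinlock. */
spin_lock(&tree->lock);
entry = xa_load(&tree->xarray, offset);
spin_unlock(&tree->lock);

/* Goal: xa_load() is safe under RCU, so readers stop contending. */
rcu_read_lock();
entry = xa_load(&tree->xarray, offset);
rcu_read_unlock();
```

Until the second form lands, concurrent lookups will still queue on the same spinlock, which matches the sys-time numbers above.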