On 2024/1/19 02:37, Yosry Ahmed wrote:
> On Thu, Jan 18, 2024 at 10:07 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>>
>> On Thu, Jan 18, 2024 at 09:30:12AM -0800, Yosry Ahmed wrote:
>>> On Thu, Jan 18, 2024 at 7:34 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>>>>
>>>> On Wed, Jan 17, 2024 at 10:37:22AM -0800, Yosry Ahmed wrote:
>>>>> On Wed, Jan 17, 2024 at 1:23 AM Chengming Zhou
>>>>> <zhouchengming@xxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> When testing the zswap performance by using kernel build -j32 in a tmpfs
>>>>>> directory, I found the scalability of zswap rb-tree is not good, which
>>>>>> is protected by the only spinlock. That would cause heavy lock contention
>>>>>> if multiple tasks zswap_store/load concurrently.
>>>>>>
>>>>>> So a simple solution is to split the only one zswap rb-tree into multiple
>>>>>> rb-trees, each corresponds to SWAP_ADDRESS_SPACE_PAGES (64M). This idea is
>>>>>> from the commit 4b3ef9daa4fc ("mm/swap: split swap cache into 64MB trunks").
>>>>>>
>>>>>> Although this method can't solve the spinlock contention completely, it
>>>>>> can mitigate much of that contention. Below is the results of kernel build
>>>>>> in tmpfs with zswap shrinker enabled:
>>>>>>
>>>>>>         linux-next  zswap-lock-optimize
>>>>>> real    1m9.181s    1m3.820s
>>>>>> user    17m44.036s  17m40.100s
>>>>>> sys     7m37.297s   4m54.622s
>>>>>>
>>>>>> So there are clearly improvements. And it's complementary with the ongoing
>>>>>> zswap xarray conversion by Chris. Anyway, I think we can also merge this
>>>>>> first, it's complementary IMHO. So I just refresh and resend this for
>>>>>> further discussion.
>>>>>
>>>>> The reason why I think we should wait for the xarray patch(es) is
>>>>> there is a chance we may see less improvements from splitting the tree
>>>>> if it was an xarray. If we merge this series first, there is no way to
>>>>> know.
>>>>
>>>> I mentioned this before, but I disagree quite strongly with this
>>>> general sentiment.
>>>>
>>>> Chengming's patches are simple, mature, and have convincing
>>>> numbers. IMO it's poor form to hold something like that for "let's see
>>>> how our other experiment works out". The only exception would be if we
>>>> all agree that the earlier change flies in the face of the overall
>>>> direction we want to pursue, which I don't think is the case here.
>>>
>>> My intention was not to delay merging these patches until the xarray
>>> patches are merged in. It was only to wait until the xarray patches
>>> are *posted*, so that we can redo the testing on top of them and
>>> verify that the gains are still there. That should have been around
>>> now, but the xarray patches were posted in a form that does not allow
>>> this testing (because we still have a lock on the read path), so I am
>>> less inclined.
>>>
>>> My rationale was that if the gains from splitting the tree become
>>> minimal after we switch to an xarray, we won't know. It's more
>>> difficult to remove optimizations than to add them, because we may
>>> cause a regression. I am kind of paranoid about having code sitting
>>> around that we don't have full information about how much it's needed.
>>
>> Yeah I understand that fear.
>>
>> I expect the splitting to help more than the move to xarray because
>> it's the writes that are hot. Luckily in this case it should be fairly
>> easy to differential-test after it's been merged by changing that tree
>> lookup macro/function locally to always return &trees[type][0], right?
>
> Yeah that's exactly what I had in mind.
> Once we have a version of the
> xarray patch without the locking on the read side we can test with
> that. Chengming, does this sound reasonable to you?

It's ok, sounds reasonable to me.

I agree with Johannes: we will need both, since the xarray still has a
spinlock on the write side, so it's clearly better to split the tree.

As for testing, we can always return &trees[type][0]; a rough sketch of
what I mean is below.

Thanks!
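Just to illustrate the idea (this is only a sketch for discussion, not the
actual patch; the names zswap_trees[], swap_zswap_tree(), ZSWAP_DIFF_TEST
and MAX_SWAPFILES_SKETCH below are placeholders): per swap type we keep an
array of trees, one per SWAP_ADDRESS_SPACE_PAGES (1 << 14 pages, i.e. 64M
with 4K pages) range of swap offsets, and the lookup can be collapsed to
tree 0 to emulate the single-tree behaviour for the differential test.

	/*
	 * Illustrative sketch only, not the actual patch. The array and
	 * helper names here are placeholders for this discussion.
	 *
	 * Idea: per swap type, keep an array of trees (each with its own
	 * lock) instead of a single tree, one tree per
	 * SWAP_ADDRESS_SPACE_PAGES range of swap offsets, so concurrent
	 * zswap_store()/zswap_load() on different ranges contend on
	 * different locks.
	 */

	#define SWAP_ADDRESS_SPACE_SHIFT	14
	#define SWAP_ADDRESS_SPACE_PAGES	(1UL << SWAP_ADDRESS_SPACE_SHIFT)
	#define MAX_SWAPFILES_SKETCH		32	/* placeholder for MAX_SWAPFILES */

	struct zswap_tree {
		/* rb-tree root (or xarray) plus its spinlock in the real code */
		int placeholder;
	};

	/*
	 * Allocated at swapon time in the real series, roughly
	 * nr_trees = DIV_ROUND_UP(swapfile pages, SWAP_ADDRESS_SPACE_PAGES).
	 */
	static struct zswap_tree *zswap_trees[MAX_SWAPFILES_SKETCH];

	static inline struct zswap_tree *swap_zswap_tree(int type, unsigned long offset)
	{
	#ifdef ZSWAP_DIFF_TEST
		/*
		 * Differential test Johannes suggested: collapse every
		 * lookup onto the first tree, which behaves like the
		 * single-tree code.
		 */
		return &zswap_trees[type][0];
	#else
		return &zswap_trees[type][offset >> SWAP_ADDRESS_SPACE_SHIFT];
	#endif
	}

With something like that, re-running the kernel build test with
ZSWAP_DIFF_TEST defined should show how much of the win comes from the
split itself versus the underlying data structure.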