Re: [External] Re: [RFC PATCH] zswap: add writeback_time_threshold interface to shrink zswap pool

贺中坤 <hezhongkun.hzk@xxxxxxxxxxxxx> · Fri, 13 Oct 2023 21:38:07 +0800

>
> As Johannes pointed out, with a zswap shrinker, we can just push on
> the memory.reclaim knob, and it'll automatically get pushed down the
> pipeline:
>
> memory -> swap -> zswap
>
> That seems to be a bit more natural and user-friendly to me than
> making the users manually decide to push zswap out to swap.
>
> My ideal vision of how all of this should go is that users provide an
> abstract declaration of requirement, and the specific decision of what
> to be done is left to the kernel to perform, as transparently to the user
> as possible. This philosophy extends to multi-tier memory management
> in general, not just the above 3-tier model.
>

That sounds great, I will backport it and give it a try.

>
> I guess my main concern here is - how do you determine the value
> 600 seconds in the first place?
>

I will test with different applications and their memory access patterns.
We usually run similar programs on the same machine. First, we can use
memory.reclaim to swap pages out to zswap. Then, with this patch, I can
find the distribution of how long pages reside in zswap and choose an
appropriate threshold.
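
A minimal sketch of that selection step, assuming the residency times
(in seconds) have already been collected somehow for pages that were
later faulted back in from zswap. The function name, the 0.9 quantile
and the sample values are all hypothetical and only illustrate the idea
of choosing a cutoff from the observed distribution:

```python
import math


def pick_threshold(residency_seconds, quantile=0.9):
    """Return the residency time that `quantile` of the samples do not exceed.

    Uses the nearest-rank method on a sorted copy of the samples.
    """
    if not residency_seconds:
        raise ValueError("need at least one sample")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(residency_seconds)
    # Nearest rank: smallest sample such that at least `quantile`
    # of all samples are less than or equal to it.
    rank = math.ceil(quantile * len(ordered))
    return ordered[rank - 1]


# Hypothetical samples, made up for illustration (not measured data).
samples = [5, 12, 30, 45, 90, 120, 300, 480, 600, 1800]
threshold = pick_threshold(samples, quantile=0.9)
```

With these made-up samples the chosen threshold is 600 seconds: 90% of
the pages that came back were accessed again within that time, so
entries older than that are less likely to be needed soon.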

> And yes, the frequency should be greater than the oldness cutoff,
> but how much greater?
>
This depends on the user's memory needs. If you want to reclaim
memory faster, you can set it to 1.5 times the threshold. Otherwise,
you can set it to one hour, two hours, etc.
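
To make the trade-off concrete, here is a small sketch, assuming "it"
is the interval between two writeback triggers and that each trigger
writes back every entry older than the threshold. Both helper names are
hypothetical. An entry that is just under the threshold when one pass
runs has to wait for the next pass, so its age at writeback is bounded
by threshold + interval:

```python
def trigger_interval(threshold_s, factor=1.5):
    """Interval between writeback triggers, as a multiple of the threshold."""
    if factor <= 1.0:
        raise ValueError("the interval should be greater than the threshold")
    return threshold_s * factor


def worst_case_age(threshold_s, interval_s):
    """Upper bound on the age of an untouched entry when it is written back.

    An entry slightly younger than the threshold at one pass is only
    picked up by the next pass, one full interval later.
    """
    return threshold_s + interval_s


interval = trigger_interval(600)            # 900.0 seconds
oldest = worst_case_age(600, interval)      # 1500.0 seconds
```

A smaller factor reclaims memory sooner at the cost of running the
writeback pass more often.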

> We can run experiments to decide what cutoff will hurt performance
> the least (or improve the performance the most), but that value will
> be specific to our workload and memory access patterns. Other
> users might need a different value entirely, and they might not have
> the resources to find out.
>
> If it's just a binary decision (on or off), then at least it could be
> one A/B experiment (per workload/service). But the range here
> could vary wildly.
>
> Is there at least a default value that works decently well across
> workload/service, in your experience?
>

Yes, I agree that it's difficult to pick a perfect value, but it is still
beneficial to have a reasonable default, such as 600 seconds. This means
that zswap keeps entries until they have not been accessed for 600 seconds
and then writes them back to swap.
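
The rule can be modelled in user space as follows. This is only an
illustrative sketch, not the kernel code from the patch: the Entry
class, its fields and select_for_writeback() are hypothetical names,
and the per-entry timestamp is assumed to be refreshed on every access.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    last_accessed: float  # seconds on some monotonic clock (assumed)


def select_for_writeback(entries, now, threshold_s=600.0):
    """Return the entries that have not been accessed for more than threshold_s."""
    return [e for e in entries if now - e.last_accessed > threshold_s]


# Hypothetical pool contents at time now = 1000 seconds.
pool = [Entry("a", 0.0), Entry("b", 500.0), Entry("c", 900.0)]
cold = select_for_writeback(pool, now=1000.0)
```

Here only entry "a" (idle for 1000 seconds) is selected; "b" and "c"
were accessed less than 600 seconds ago and stay in zswap.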

> I believe Johannes has explained the case where this could happen.
> But yeah, this should be fixable by updating the stored time
> field on access (maybe rename it to something a bit more fitting as
> well - last_accessed_time?)

Thanks, I agree.

>
> Regardless, it is incredibly validating to see that other parties share the
> same problems as us :) It's not a super invasive change as well.
> I just don't think it solves the issue that well for every zswap user.

I noticed this problem before and thought about some solutions, but only
saw your patch recently. I can also try it so we can discuss it together. At
the same time, I will think about how to improve this patch.