> On Thu, Feb 01, 2024 at 02:57:22PM +0100, Michal Koutný wrote:
> > Hello.
> >
> > On Wed, Jan 31, 2024 at 04:24:41PM +0000, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> > >  		reclaimed = try_to_free_mem_cgroup_pages(memcg,
> > > -					min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
> > > +					max((nr_to_reclaim - nr_reclaimed) / 4,
> > > +					    (nr_to_reclaim - nr_reclaimed) % 4),
> >
> > The 1/4 factor looks like magic.
>
> It's just cutting the work into quarters to balance throughput with
> goal accuracy. It's no more or less magic than DEF_PRIORITY being 12,
> or SWAP_CLUSTER_MAX being 32.
>
> > Commit 0388536ac291 says:
> > | In theory, the amount of reclaimed would be in [request, 2 * request).
>
> Looking at the code, I'm not quite sure if this can be read this
> literally. Efly might be able to elaborate, but we do a full loop of
> all nodes and cgroups in the tree before checking nr_to_reclaimed, and
> rely on priority level for granularity. So request size and complexity
> of the cgroup tree play a role. I don't know where the exact factor
> two would come from.

I'm sorry that this conclusion may be arbitrary; it may only apply to
my case. In my case, I traced it looping twice every time before
checking nr_reclaimed, and it reclaimed less than my request size (1G)
each time, so I concluded the upper bound is 2 * request. But now it
seems this depends on the cgroup tree I constructed, my system status,
and my request size (a relatively large chunk). With so many
influencing factors, a specific upper bound is not accurate.

> IMO it's more accurate to phrase it like this:
>
> Reclaim tries to balance nr_to_reclaim fidelity with fairness across
> nodes and cgroups over which the pages are spread. As such, the bigger
> the request, the bigger the absolute overreclaim error. Historic
> in-kernel users of reclaim have used fixed, small request batches to
> approach an appropriate reclaim rate over time. When we reclaim a user
> request of arbitrary size, use decaying batches to manage error while
> maintaining reasonable throughput.
>
> > Doesn't this suggest 1/2 as a better option? (I didn't pursue the
> > theory.)
>
> That was TJ's first suggestion as well, but as per above I suggested
> quartering as a safer option.
>
> > Also IMO importantly, when nr_to_reclaim - nr_reclaimed is less than 8,
> > the formula gives arbitrary (unrelated to delta's magnitude) values.
>
> try_to_free_mem_cgroup_pages() rounds up to SWAP_CLUSTER_MAX. So the
> error margin is much higher at the smaller end of requests anyway.
> But practically speaking, users care much less if you reclaim 32 pages
> when 16 were requested than if you reclaim 2G when 1G was requested.

Yes, I agree completely that the bigger the request, the bigger the
absolute overreclaim error. The focus now is the tradeoff between
accurate reclaim and efficient reclaim. I think TJ's test is
suggestive.
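
To make the decay concrete, here is a small userspace sketch (my own
illustration, not kernel code) of the proposed batch sizing. It assumes
SWAP_CLUSTER_MAX is 32 pages, expresses the 1G target in 4k pages, and,
purely for illustration, pretends each try_to_free_mem_cgroup_pages()
call reclaims exactly what was requested, which real reclaim does not
guarantee:

#include <stdio.h>

#define SWAP_CLUSTER_MAX 32UL

/* the proposed formula: max(remaining / 4, remaining % 4) */
static unsigned long batch_size(unsigned long remaining)
{
	unsigned long quarter = remaining / 4;
	unsigned long rest = remaining % 4;

	return quarter > rest ? quarter : rest;
}

int main(void)
{
	unsigned long nr_to_reclaim = 262144;	/* 1G in 4k pages */
	unsigned long nr_reclaimed = 0;
	int pass = 0;

	while (nr_reclaimed < nr_to_reclaim) {
		unsigned long batch = batch_size(nr_to_reclaim - nr_reclaimed);

		/* try_to_free_mem_cgroup_pages() rounds up to SWAP_CLUSTER_MAX */
		if (batch < SWAP_CLUSTER_MAX)
			batch = SWAP_CLUSTER_MAX;

		/* assumption: reclaim returns exactly what was requested */
		nr_reclaimed += batch;
		printf("pass %2d: requested %6lu, total %7lu / %lu\n",
		       ++pass, batch, nr_reclaimed, nr_to_reclaim);
	}
	return 0;
}

In this idealized model the old fixed batch of
min(remaining, SWAP_CLUSTER_MAX) would take 8192 passes of 32 pages for
the same 1G target, while the decaying batch converges in a few dozen
passes and the request size shrinks along with the remaining target.
Real reclaim can of course return more than each request asks for; that
per-request error is what the quartering is meant to keep bounded while
preserving throughput.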