On 2024/7/24 00:44, Takero Funaki wrote:
2024年7月23日(火) 6:51 Nhat Pham <nphamcs@xxxxxxxxx>:
On Fri, Jul 19, 2024 at 9:41 PM Takero Funaki <flintglass@xxxxxxxxx> wrote:
This patch fixes zswap global shrinker that did not shrink zpool as
expected.
The issue it addresses is that `shrink_worker()` did not distinguish
between unexpected errors and expected error codes that should be
skipped, such as when there is no stored page in a memcg. This led to
the shrinking process being aborted on the expected error codes.
The code itself seems reasonable to me, but may I ask you to document
(as a comment) all the expected v.s unexpected cases? i.e when do we
increment (or not increment) the failure counter?
In addition to changes in the commit log suggested by Yosry,
adding some comments specifying what memcg is (not) candidates for
writeback, and what should be a failure.
- /* global reclaim will select cgroup in a round-robin fashion.
+ /*
+ * Global reclaim will select cgroup in a round-robin fashion from all
+ * online memcgs, but memcgs that have no pages in zswap and
+ * writeback-disabled memcgs (memory.zswap.writeback=0) are not
+ * candidates for shrinking.
+ *
+ * Shrinking will be aborted if we encounter the following
+ * MAX_RECLAIM_RETRIES times:
+ * - No writeback-candidate memcgs found in a memcg tree walk.
+ * - Shrinking a writeback-candidate memcg failed.
*
* We save iteration cursor memcg into zswap_next_shrink,
* which can be modified by the offline memcg cleaner
and, the reasons to (not) increment the progress:
@@ -1387,10 +1407,20 @@ static void shrink_worker(struct work_struct *w)
/* drop the extra reference */
mem_cgroup_put(memcg);
- if (ret == -EINVAL)
- break;
+ /*
+ * There are no writeback-candidate pages in the memcg.
+ * This is not an issue as long as we can find another memcg
+ * with pages in zswap. Skip this without incrementing progress
+ * and failures.
+ */
+ if (ret == -ENOENT)
+ continue;
+
if (ret && ++failures == MAX_RECLAIM_RETRIES)
break;
+
+ /* completed writeback or incremented failures */
+ ++progress;
Maybe the name "progress" is a little confusing here? "progress" sounds
to me that we have some writeback completed.
But actually it just means we have encountered some candidates, right?
Thanks.
resched:
My understanding is, we only increment the failure counter if we fail
to reclaim from a selected memcg that is non-empty and
writeback-enabled, or if we go a full tree walk without making any
progress. Is this correct?
Yes, that's the expected behavior.
Please let me know if there is more appropriate wording.
Thanks.