+Ying

On Mon, Nov 13, 2023 at 5:06 AM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
>
> I recently found two scenarios where zswap entry could not be
> released, which will cause shrink_worker and active recycling
> to fail.
> 1)The swap entry has been freed, but cached in swap_slots_cache,
> no swap cache and swapcount=0.
> 2)When the option zswap_exclusive_loads_enabled disabled and
> zswap_load completed(page in swap_cache and swapcount = 0).

For case (1), I think a cleaner solution would be to move the
zswap_invalidate() call from swap_range_free() (which is called after
the cached slots are freed) to __swap_entry_free_locked() if the usage
goes to 0. I actually think this conceptually makes sense not just for
zswap_invalidate(), but also for the arch call, memcg uncharging, etc.
Slot caching is a swapfile optimization that should be internal to
swapfile code. Once a swap entry is freed (i.e. swap count is 0 AND it
is not in the swap cache), all the hooks should be called (memcg,
zswap, arch, ...) because the swap entry is effectively freed. The fact
that swapfile code internally batches and caches slots should be
transparent to other parts of MM.

I am not sure if the calls can just be moved or if there are
underlying assumptions in the implementation that would be broken, but
it feels like the right thing to do.

For case (2), I don't think zswap can just decide to free the entry.
In that case, the page is in the swap cache pointing to a swp_entry
that has a corresponding zswap entry, and the page is clean because
it is already in swap/zswap, so we don't need to write it out again
unless it is redirtied. If zswap just drops the entry, and reclaim
then tries to reclaim the page in the swap cache, it will drop the
page assuming that there is a copy in swap/zswap (because it is
clean). Now we have lost all copies of the page.

Am I missing something?

>
> The above two cases need to be determined by swapcount=0,
> fix it.
>
> Signed-off-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
> ---
>  mm/zswap.c | 35 +++++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 74411dfdad92..db95491bcdd5 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1063,11 +1063,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>  	struct mempolicy *mpol;
>  	struct scatterlist input, output;
>  	struct crypto_acomp_ctx *acomp_ctx;
> +	struct swap_info_struct *si;
>  	struct zpool *pool = zswap_find_zpool(entry);
>  	bool page_was_allocated;
>  	u8 *src, *tmp = NULL;
>  	unsigned int dlen;
> -	int ret;
> +	int ret = 0;
>  	struct writeback_control wbc = {
>  		.sync_mode = WB_SYNC_NONE,
>  	};
> @@ -1082,16 +1083,30 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>  	mpol = get_task_policy(current);
>  	page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
>  				NO_INTERLEAVE_INDEX, &page_was_allocated);
> -	if (!page) {
> +	if (!page)
>  		ret = -ENOMEM;
> -		goto fail;
> -	}
> -
> -	/* Found an existing page, we raced with load/swapin */
> -	if (!page_was_allocated) {
> +	else if (!page_was_allocated) {
> +		/* Found an existing page, we raced with load/swapin */
>  		put_page(page);
>  		ret = -EEXIST;
> -		goto fail;
> +	}
> +
> +	if (ret) {
> +		si = get_swap_device(swpentry);
> +		if (!si)
> +			goto out;
> +
> +		/* Two cases to directly release zswap_entry.
> +		 * 1) -ENOMEM,if the swpentry has been freed, but cached in
> +		 * swap_slots_cache(no page and swapcount = 0).
> +		 * 2) -EEXIST, option zswap_exclusive_loads_enabled disabled and
> +		 * zswap_load completed(page in swap_cache and swapcount = 0).
> +		 */
> +		if (!swap_swapcount(si, swpentry))
> +			ret = 0;
> +
> +		put_swap_device(si);
> +		goto out;
>  	}
>
>  	/*
> @@ -1106,7 +1121,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>  		spin_unlock(&tree->lock);
>  		delete_from_swap_cache(page_folio(page));
>  		ret = -ENOMEM;
> -		goto fail;
> +		goto out;
>  	}
>  	spin_unlock(&tree->lock);
>
> @@ -1151,7 +1166,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>
>  	return ret;
>
> -fail:
> +out:
>  	if (!zpool_can_sleep_mapped(pool))
>  		kfree(tmp);
>
> --
> 2.25.1
>
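To make the case (1) suggestion concrete, here is a toy userspace model of the hook placement. It is not kernel code: every name in it (struct toy_slot, toy_put(), toy_drop_swap_cache(), toy_drain_slot_cache()) is hypothetical, and it ignores locking, per-CPU caches and the real swap_map encoding. It only contrasts running a zswap_invalidate()-style hook when the entry becomes effectively free (count 0 and not in the swap cache, roughly where __swap_entry_free_locked() sees usage drop to 0) with running it only once the slots cache is drained (roughly where swap_range_free() runs today).

```c
#include <stdbool.h>

/* Hypothetical toy model of one swap slot; NOT kernel code. */
struct toy_slot {
	int count;          /* models the swap map usage count */
	bool in_swap_cache; /* models the "has swap cache" state */
	bool in_slot_cache; /* effectively freed, parked in a slots cache */
	bool has_zswap;     /* a zswap entry still exists for this slot */
};

/* Models a zswap_invalidate()-style hook: drop the compressed copy. */
static void toy_invalidate_hook(struct toy_slot *s)
{
	s->has_zswap = false;
}

/*
 * Called whenever count or swap-cache state changes. The entry is
 * effectively free once count is 0 AND it is not in the swap cache.
 * With hook_on_zero, the hook runs right here (the proposed placement);
 * otherwise the slot is only parked and the hook is deferred.
 */
static void toy_entry_maybe_free(struct toy_slot *s, bool hook_on_zero)
{
	if (s->count > 0 || s->in_swap_cache)
		return;
	s->in_slot_cache = true;
	if (hook_on_zero)
		toy_invalidate_hook(s);
}

/* Drop one swap map reference. */
static void toy_put(struct toy_slot *s, bool hook_on_zero)
{
	s->count--;
	toy_entry_maybe_free(s, hook_on_zero);
}

/* Remove the page from the swap cache. */
static void toy_drop_swap_cache(struct toy_slot *s, bool hook_on_zero)
{
	s->in_swap_cache = false;
	toy_entry_maybe_free(s, hook_on_zero);
}

/*
 * Models the slots cache being drained back to the device, i.e. the
 * point where the deferred hook finally runs in the current placement.
 */
static void toy_drain_slot_cache(struct toy_slot *s, bool hook_on_zero)
{
	if (!s->in_slot_cache)
		return;
	s->in_slot_cache = false;
	if (!hook_on_zero)
		toy_invalidate_hook(s);
}
```

In the deferred placement, a slot sitting in the slots cache still pins its zswap entry, which is exactly the entry that shrink_worker cannot get rid of in case (1); in the proposed placement the zswap entry is gone as soon as the swap entry is effectively freed, and the slots cache stays an internal detail of swapfile code.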
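For case (2), the data-loss sequence can be illustrated with another toy userspace model. Again this is not kernel code: struct toy_swapped_page, toy_reclaim(), toy_zswap_drop() and toy_data_recoverable() are hypothetical names, reclaim is reduced to "drop the page if it is clean", and writeback of dirty pages is not simulated. It assumes the entry lives only in zswap, so the backing swap device holds no copy.

```c
#include <stdbool.h>

/* Hypothetical toy model of a swapped-in page; NOT kernel code. */
struct toy_swapped_page {
	bool page_in_swap_cache; /* uncompressed copy in the swap cache */
	bool page_dirty;         /* redirtied since it was swapped out */
	bool zswap_copy;         /* compressed copy still held by zswap */
};

/*
 * Models reclaim of a swap-cache page: a clean page is dropped without
 * writeback because reclaim assumes the swap/zswap copy is still valid.
 * A dirty page is left alone here (writeback is not modeled).
 * Returns true if the page was dropped.
 */
static bool toy_reclaim(struct toy_swapped_page *p)
{
	if (!p->page_in_swap_cache || p->page_dirty)
		return false;
	p->page_in_swap_cache = false;
	return true;
}

/* Models zswap unilaterally freeing its entry, as the patch does for case (2). */
static void toy_zswap_drop(struct toy_swapped_page *p)
{
	p->zswap_copy = false;
}

/* Is any copy of the data left? (The swap device holds none in this model.) */
static bool toy_data_recoverable(const struct toy_swapped_page *p)
{
	return p->page_in_swap_cache || p->zswap_copy;
}
```

With the zswap copy intact, reclaiming the clean page is harmless; once zswap has dropped its entry, the same reclaim step leaves no copy anywhere.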