On (23/05/05 11:50), Nhat Pham wrote:
[..]
> zswap_frontswap_store()           shrink_worker()
>   zs_malloc()                       zs_zpool_shrink()
>     spin_lock(&pool->lock)            zs_reclaim_page()
>     zspage = find_get_zspage()
>     spin_unlock(&pool->lock)
>                                         spin_lock(&pool->lock)
>                                         zspage = list_first_entry(&pool->lru)
>                                         list_del(&zspage->lru)
>                                           zspage->lru.next = LIST_POISON1
>                                           zspage->lru.prev = LIST_POISON2
>                                         spin_unlock(&pool->lock)
>   zs_map_object()
>     spin_lock(&pool->lock)
>     if (!list_empty(&zspage->lru))
>       list_del(&zspage->lru)
>         CHECK_DATA_CORRUPTION(next == LIST_POISON1) /* BOOM */
>
> With the current upstream code, this issue rarely happens. zswap only
> triggers writeback when the pool is already full, at which point all
> further store attempts are short-circuited. This creates an implicit
> pseudo-serialization between reclaim and store. I am working on a new
> zswap shrinking mechanism, which makes interleaving reclaim and store
> more likely, exposing this bug.
>
> zbud and z3fold do not have this problem, because they perform the LRU
> list update in the alloc function, while still holding the pool's lock.
> This patch fixes the aforementioned bug by moving the LRU update back to
> zs_malloc(), analogous to zbud and z3fold.
>
> Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>

Reviewed-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
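
[Editor's note: the locking pattern the patch adopts -- link the new object onto the LRU inside the same critical section that allocates it, so reclaim can never observe a half-updated or already-poisoned list node -- can be illustrated with the minimal userspace sketch below. This is not the actual zsmalloc code; struct pool, pool_alloc(), pool_reclaim(), list_add_head() and list_del_entry() are made-up names for illustration, and a pthread mutex stands in for the kernel's pool spinlock.]

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node *prev, *next;
	int data;
};

struct pool {
	pthread_mutex_t lock;
	struct node lru;		/* list head: lru.next is most recently used */
};

static void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->lru.next = p->lru.prev = &p->lru;
}

static void list_add_head(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_entry(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;	/* stand-in for LIST_POISON */
}

/*
 * Allocation: the object is added to the LRU while the pool lock is
 * still held, so a concurrent reclaimer only ever sees it fully linked.
 * There is no window in which another path would later have to re-touch
 * the (possibly already reclaimed and poisoned) list node.
 */
static struct node *pool_alloc(struct pool *p, int data)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->data = data;

	pthread_mutex_lock(&p->lock);
	list_add_head(&p->lru, n);
	pthread_mutex_unlock(&p->lock);
	return n;
}

/* Reclaim: detach the least recently used object under the same lock. */
static struct node *pool_reclaim(struct pool *p)
{
	struct node *victim = NULL;

	pthread_mutex_lock(&p->lock);
	if (p->lru.prev != &p->lru) {
		victim = p->lru.prev;
		list_del_entry(victim);
	}
	pthread_mutex_unlock(&p->lock);
	return victim;
}

int main(void)
{
	struct pool p;
	struct node *v;

	pool_init(&p);
	pool_alloc(&p, 1);
	pool_alloc(&p, 2);

	v = pool_reclaim(&p);
	printf("reclaimed %d\n", v ? v->data : -1);
	free(v);
	return 0;
}

[The sketch mirrors only the locking structure described in the commit message: because allocation and LRU insertion happen under one lock hold, the interleaving shown in the race diagram above has no equivalent here.]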