Re: [PATCH 7/8] zsmalloc: replace per zpage lock with pool->migrate_lock

On 2021-11-10 10:54:32 [-0800], Minchan Kim wrote:
> zsmalloc has used a bit spin_lock in the zpage handle to keep the
> zpage object alive during several operations. However, it causes
> problems for PREEMPT_RT as well as making the code too complicated.
> 
> This patch replaces the bit spin_lock with the pool->migrate_lock
> rwlock. It makes the code simpler and lets zsmalloc work under
> PREEMPT_RT.
> 
> The drawback is that pool->migrate_lock has coarser granularity than
> the per-zpage lock, so contention would be higher than before when
> both IO-related operations (i.e., zs_malloc, zs_free, zs_[map|unmap])
> and compaction (page/zpage migration) run in parallel (*note that
> migrate_lock is an rwlock and the IO-related functions all take the
> read-side lock, so there is no contention among them). However, the
> write side is fast enough (the dominant overhead is just the page
> copy), so it shouldn't matter much. If the lock granularity becomes
> more of a problem later, we could introduce table locks keyed by the
> handle as a hash value.
> 
> Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
…
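
As an aside, here is a tiny user-space model (pthreads; model_pool_lock
and the function names below are made up, not zsmalloc symbols) of the
scheme the changelog describes: the I/O paths take the pool lock
read-side and can run concurrently, while migration takes it write-side
and excludes them.

/*
 * Build with: cc -pthread sides.c
 * model_pool_lock stands in for pool->migrate_lock; the functions only
 * model which side of the lock each path takes, nothing else.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t model_pool_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *io_path(void *arg)		/* models zs_malloc/zs_free/zs_map_object */
{
	pthread_rwlock_rdlock(&model_pool_lock);	/* readers do not block each other */
	puts("I/O path: zspage cannot migrate while we hold the read lock");
	pthread_rwlock_unlock(&model_pool_lock);
	return NULL;
}

static void *migration_path(void *arg)	/* models zs_page_migrate/__zs_compact */
{
	pthread_rwlock_wrlock(&model_pool_lock);	/* excludes every reader */
	puts("migration: safe to move the zspage and fix up handles");
	pthread_rwlock_unlock(&model_pool_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, io_path, NULL);
	pthread_create(&b, NULL, migration_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
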
> index b8b098be92fa..5d4c4d254679 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1789,6 +1767,11 @@ static void migrate_write_lock(struct zspage *zspage)
>  	write_lock(&zspage->lock);
>  }
>  
> +static void migrate_write_lock_nested(struct zspage *zspage)
> +{
> +	write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING);

I don't have write_lock_nested() in my tree.

> +}
> +
>  static void migrate_write_unlock(struct zspage *zspage)
>  {
>  	write_unlock(&zspage->lock);
…
> @@ -2077,8 +2043,13 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  	struct zspage *dst_zspage = NULL;
>  	unsigned long pages_freed = 0;
>  
> +	/* protect the race between zpage migration and zs_free */
> +	write_lock(&pool->migrate_lock);
> +	/* protect zpage allocation/free */
>  	spin_lock(&class->lock);
>  	while ((src_zspage = isolate_zspage(class, true))) {
> +		/* protect someone accessing the zspage(i.e., zs_map_object) */
> +		migrate_write_lock(src_zspage);
>  
>  		if (!zs_can_compact(class))
>  			break;
> @@ -2087,6 +2058,8 @@ static unsigned long __zs_compact(struct zs_pool *pool,
>  		cc.s_page = get_first_page(src_zspage);
>  
>  		while ((dst_zspage = isolate_zspage(class, false))) {
> +			migrate_write_lock_nested(dst_zspage);
> +
>  			cc.d_page = get_first_page(dst_zspage);
>  			/*
>  			 * If there is no more space in dst_page, resched

Looking at these two chunks: the page here comes from a list, and you
remove that page from the list, which ensures that you can't lock the
very same pages in reverse order, as in:

   migrate_write_lock(dst_zspage);
   …
   	migrate_write_lock(src_zspage);

right?
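
For reference, a small user-space model (pthreads; model_zspage,
model_isolate and friends are made-up names, not zsmalloc code) of that
argument: because each page is detached from the shared list under the
class lock before its own lock is taken, no second compactor can ever
hold one of the pair while waiting for the other, so the reverse-order
scenario cannot occur.

/*
 * Build with: cc -pthread model.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct model_zspage {
	pthread_rwlock_t lock;			/* models zspage->lock */
	struct model_zspage *next;
};

static pthread_mutex_t class_lock = PTHREAD_MUTEX_INITIALIZER;	/* models class->lock */
static struct model_zspage *class_list;				/* models the fullness lists */

/* models isolate_zspage(): detach one page from the list under class_lock */
static struct model_zspage *model_isolate(void)
{
	struct model_zspage *zspage;

	pthread_mutex_lock(&class_lock);
	zspage = class_list;
	if (zspage)
		class_list = zspage->next;
	pthread_mutex_unlock(&class_lock);
	return zspage;
}

static void *model_compact(void *arg)
{
	struct model_zspage *src, *dst;

	while ((src = model_isolate())) {
		pthread_rwlock_wrlock(&src->lock);	/* migrate_write_lock(src) */

		if ((dst = model_isolate())) {
			/*
			 * Second lock of the same class, but both pages are
			 * already off the list, so nobody else can be waiting
			 * on them in the other order.  In the kernel patch
			 * this is where migrate_write_lock_nested() is used,
			 * so lockdep accepts the second acquisition.
			 */
			pthread_rwlock_wrlock(&dst->lock);
			pthread_rwlock_unlock(&dst->lock);
		}
		pthread_rwlock_unlock(&src->lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[2];

	for (int i = 0; i < 64; i++) {
		struct model_zspage *z = malloc(sizeof(*z));

		pthread_rwlock_init(&z->lock, NULL);
		z->next = class_list;
		class_list = z;
	}
	for (int i = 0; i < 2; i++)
		pthread_create(&t[i], NULL, model_compact, NULL);
	for (int i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	puts("done: no reverse-order locking is possible");
	return 0;
}
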

Sebastian