RE: [PATCH v6] mm/zswap: move to use crypto_acomp API for hardware acceleration

"Song Bao Hua (Barry Song)" <song.bao.hua@xxxxxxxxxxxxx> · Tue, 29 Sep 2020 05:14:31 +0000

> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, September 29, 2020 10:02 AM
> To: 'Sebastian Andrzej Siewior' <bigeasy@xxxxxxxxxxxxx>
> Cc: akpm@xxxxxxxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx;
> davem@xxxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Luis Claudio R . Goncalves
> <lgoncalv@xxxxxxxxxx>; Mahipal Challa <mahipalreddy2006@xxxxxxxxx>;
> Seth Jennings <sjenning@xxxxxxxxxx>; Dan Streetman <ddstreet@xxxxxxxx>;
> Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>; Wangzhou (B)
> <wangzhou1@xxxxxxxxxxxxx>; fanghao (A) <fanghao11@xxxxxxxxxx>; Colin
> Ian King <colin.king@xxxxxxxxxxxxx>
> Subject: RE: [PATCH v6] mm/zswap: move to use crypto_acomp API for
> hardware acceleration
> 
> 
> 
> > -----Original Message-----
> > From: Sebastian Andrzej Siewior [mailto:bigeasy@xxxxxxxxxxxxx]
> > Sent: Tuesday, September 29, 2020 4:25 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>
> > Cc: akpm@xxxxxxxxxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxx;
> > davem@xxxxxxxxxxxxx; linux-crypto@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx;
> > linux-kernel@xxxxxxxxxxxxxxx; Luis Claudio R . Goncalves
> > <lgoncalv@xxxxxxxxxx>; Mahipal Challa <mahipalreddy2006@xxxxxxxxx>;
> > Seth Jennings <sjenning@xxxxxxxxxx>; Dan Streetman <ddstreet@xxxxxxxx>;
> > Vitaly Wool <vitaly.wool@xxxxxxxxxxxx>; Wangzhou (B)
> > <wangzhou1@xxxxxxxxxxxxx>; fanghao (A) <fanghao11@xxxxxxxxxx>; Colin
> > Ian King <colin.king@xxxxxxxxxxxxx>
> > Subject: Re: [PATCH v6] mm/zswap: move to use crypto_acomp API for
> > hardware acceleration
> >
> > On 2020-08-19 00:31:00 [+1200], Barry Song wrote:
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index fbb782924ccc..00b5f14a7332 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -127,9 +129,17 @@
> > module_param_named(same_filled_pages_enabled,
> > zswap_same_filled_pages_enabled,
> > >  * data structures
> > >  **********************************/
> > >
> > > +struct crypto_acomp_ctx {
> > > +	struct crypto_acomp *acomp;
> > > +	struct acomp_req *req;
> > > +	struct crypto_wait wait;
> > > +	u8 *dstmem;
> > > +	struct mutex *mutex;
> > > +};
> > > +
> > >  struct zswap_pool {
> > >  	struct zpool *zpool;
> > > -	struct crypto_comp * __percpu *tfm;
> > > +	struct crypto_acomp_ctx __percpu *acomp_ctx;
> > >  	struct kref kref;
> > >  	struct list_head list;
> > >  	struct work_struct release_work;
> > > @@ -388,23 +398,43 @@ static struct zswap_entry
> > *zswap_entry_find_get(struct rb_root *root,
> > >  * per-cpu code
> > >  **********************************/
> > >  static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> > > +/*
> > > + * If users dynamically change the zpool type and compressor at runtime,
> > i.e.
> > > + * zswap is running, zswap can have more than one zpool on one cpu, but
> > they
> > > + * are sharing dtsmem. So we need this mutex to be per-cpu.
> > > + */
> > > +static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> >
> > There is no need to make it sturct mutex*. You could make it struct
> > mutex. The following make it more obvious how the relation here is (even
> > without a comment):
> >
> > |struct zswap_mem_pool {
> > |	void		*dst_mem;
> > |	struct mutex	lock;
> > |};
> > |
> > |static DEFINE_PER_CPU(struct zswap_mem_pool , zswap_mem_pool);
> >
> > Please access only this, don't add use a pointer in a another struct to
> > dest_mem or lock which may confuse people.
> 
> Well. It seems sensible.

After second thought and trying to make this change, I would like to change my mind
and disagree with this idea. Two reasons:
1. while using this_cpu_ptr() without preemption lock, people usually put all things bound
with one cpu to one structure, so that once we get the pointer of the whole structure, we get
all its parts belonging to the same cpu. If we move the dstmem and mutex out of the structure
containing them, we will have to do:
	a. get_cpu_ptr() for the acomp_ctx   //lock preemption
	b. this_cpu_ptr() for the dstmem and mutex
	c. put_cpu_ptr() for the acomp_ctx  //unlock preemption
	d. mutex_lock()
	  sg_init_one()
	  compress/decompress etc.
	  ...
	  mutex_unlock

as the get() and put() have a preemption lock/unlock, this will make certain this_cpu_ptr()
in the step "b" will return the right dstmem and mutex which belong to the same cpu with
step "a".

The steps from "a" to "c" are quite silly and confusing. I believe the existing code aligns
with the most similar code in kernel better:
	a. this_cpu_ptr()   //get everything for one cpu
	b. mutex_lock()
	  sg_init_one()
	  compress/decompress etc.
	  ...
	  mutex_unlock

2. while allocating mutex, we can put the mutex into local memory by using kmalloc_node().
If we move to "struct mutex lock" directly, most CPUs in a NUMA server will have to access
remote memory to read/write the mutex, therefore, this will increase the latency dramatically.

Thanks
Barry