Re: [RFC PATCH 7/7] mm: zswap: Use acomp virtual address interface

Yosry Ahmed <yosry.ahmed@xxxxxxxxx> · Tue, 4 Mar 2025 20:47:48 +0000

On Tue, Mar 04, 2025 at 10:19:51PM +0900, Sergey Senozhatsky wrote:
> On (25/03/04 14:10), Herbert Xu wrote:
> > +static void zs_map_object_sg(struct zs_pool *pool, unsigned long handle,
> > +			     enum zs_mapmode mm, struct scatterlist sg[2])
> > +{
> [..]
> > +	sg_init_table(sg, 2);
> > +	sg_set_page(sg, zpdesc_page(zpdescs[0]),
> > +		    PAGE_SIZE - off - handle_size, off + handle_size);
> > +	sg_set_page(&sg[1], zpdesc_page(zpdescs[1]),
> > +		    class->size - (PAGE_SIZE - off - handle_size), 0);
> > +}
> > +
> > +static void zs_unmap_object_sg(struct zs_pool *pool, unsigned long handle)
> > +{
> > +	struct zspage *zspage;
> > +	struct zpdesc *zpdesc;
> > +	unsigned int obj_idx;
> > +	unsigned long obj;
> > +
> > +	obj = handle_to_obj(handle);
> > +	obj_to_location(obj, &zpdesc, &obj_idx);
> > +	zspage = get_zspage(zpdesc);
> > +	migrate_read_unlock(zspage);
> > +}
> 
> One thing to notice is that these functions don't actually map/unmap.
> 
> And the handling is spread out over different parts of the stack,
> sg list is set in zsmalloc, but the actual zsmalloc map local page is
> done in crypto, and then zswap does memcpy() to write to object and so
> on.  The "new" zsmalloc map API, which we plan on landing soon, handles
> most of the things within zsmalloc.  Would it be possible to do something
> similar with the sg API?

Yeah I have the same feeling that the handling is all over the place.
Also, we don't want to introduce new map APIs, so anything we do for
zswap should ideally work for zram.

We need to agree on the APIs between zsmalloc <-> zswap/zcomp <->
crypto.

In the compression path, zswap currently passes in the page to the
crypto API to get it compressed, and then allocates an object in
zsmalloc and memcpy() the compressed page to it. Then, zsmalloc may
internally memcpy() again.

These two copies should become one once zswap starts using
zs_obj_write(), and I think this is the bare minimum because we cannot
allocate the object before we are done with the compression, so we need
at least one copy.

In the decompression path, zswap gets the compressed object from
zsmalloc, which will memcpy() to a buffer if it spans two pages. Zswap
will memcpy() again if needed (highmem / not sleepable). Then we pass
it to the crypto API. I am not sure if we do extra copies internally,
but probably not.

The not sleepable case will disappear with the new zsmalloc API as well,
so the only copy in the zswap code will be if we use highmem. This goes
away if the crypto API can deal with highmem addresses, or if we use
kmap_to_page() in the zswap code.

IIUC, what Herbert is suggesting is that we rework all of this to use SG
lists to reduce copies, but I am not sure which copies can go away? We
have one copy in the compression path that probably cannot go away.
After the zsmalloc changes (and ignoring highmem), we have one copy in
the decompression path for when objects span two pages. I think this
will still happen with SG lists, except internally in the crypto API.

So I am not sure what is the advantage of using SG lists here? The only
improvement that we can make is to eliminate the copy in the highmem
case, but I think we don't really need SG lists for this.

Am I missing something?