Re: [RFC PATCH 5/6] zsmalloc: introduce handle mapping API

On Tue, Jan 28, 2025 at 09:59:55AM +0900, Sergey Senozhatsky wrote:
> On (25/01/27 21:58), Yosry Ahmed wrote:
> > On Mon, Jan 27, 2025 at 04:59:30PM +0900, Sergey Senozhatsky wrote:
> > > Introduce new API to map/unmap zsmalloc handle/object.  The key
> > > difference is that this API does not impose atomicity restrictions
> > > on its users, unlike zs_map_object() which returns with page-faults
> > > and preemption disabled - handle mapping API does not need a per-CPU
> > > vm-area because the users are required to provide an aux buffer for
> > > objects that span several physical pages.
> > 
> > I like the idea of supplying the buffer directly to zsmalloc, and zswap
> > already has per-CPU buffers allocated. This will help remove the special
> > case to handle not being able to sleep in zswap_decompress().
> 
> The interface, basically, is what we currently have, but the state
> is moved out of zsmalloc internal per-CPU vm-area.
> 
> > That being said, I am not a big fan of the new API for several reasons:
> > - The interface seems complicated, why do we need struct
> > zs_handle_mapping? Can't the user just pass an extra parameter to
> > zs_map_object/zs_unmap_object() to supply the buffer, and the return
> > value is the pointer to the data within the buffer?
> 
> At least now we need to save some state - e.g. direction of the map()
> so that during unmap zsmalloc determines if it needs to perform copy-out
> or not.  It also needs that state in order to know if the buffer needs
> to be unmapped.
> 
> zsmalloc MAP has two cases:
> a) the object spans several physical non-contig pages: copy-in object into
>   aux buffer and return (linear) pointer to that buffer
> b) the object is contained within a physical page: kmap that page and
>   return (linear) pointer to that mapping, unmap in zs_unmap_object().

Ack. See below.

> > - This seems to require an additional buffer on the compress side. Right
> > now, zswap compresses the page into its own buffer, maps the handle,
> > and copies to it. Now the map operation will require an extra buffer.
> 
> Yes, for (a) mentioned above.
> 
> > I guess in the WO case the buffer is not needed and we can just pass
> > NULL?
> 
> Yes.

Perhaps we want to document this and enforce it (make sure that the
NULL-ness of the buffer matches the access type).

> > Taking a step back, it actually seems to me that the mapping interface
> > may not be the best, at least from a zswap perspective. In both cases,
> > we map, copy from/to the handle, then unmap. The special casing here is
> > essentially handling the copy direction. Zram looks fairly similar but I
> > didn't look too closely.
> > 
> > I wonder if the API should store/load instead. You either pass a buffer
> > to be stored (equivalent to today's alloc + map + copy), or pass a
> > buffer to load into (equivalent to today's map + copy). What we really
> > need on the zswap side is zs_store() and zs_load(), not zs_map() with
> > different mapping types and an optional buffer if we are going to
> > eventually store. I guess that's part of a larger overhaul and we'd need
> > to update other zpool allocators (or remove them, z3fold should be
> > coming soon).
> 
> So I thought about it: load and store.
> 
> zs_obj_load()
> {
> 	zspage->page kmap, etc.
> 	memcpy buf page   # if direction is not WO
> 	unmap
> }
> 
> zs_obj_store()
> {
> 	zspage->page kmap, etc.
> 	memcpy page buf   # if direction is not RO
> 	unmap
> }
> 
> load+store would not require zsmalloc to be preemptible internally, we
> could just keep existing atomic locks and it would make things a little
> simpler on the zram side (slot-free-notification is called from atomic
> section).
> 
> But, and it's a big but.  And it's (b) from the above.  I wasn't brave
> enough to just drop (b) optimization and replace it with memcpy(),
> especially when we work with relatively large objects (say size-class
> 3600 bytes and above).  This certainly would not make battery powered
> devices happier.  Maybe in zswap the page is only read once (is that
> correct?), but in zram page can be read multiple times (e.g. when zram
> is used as a raw block-dev, or has a mounted fs on it) which means
> multiple extra memcpy()-s.

In zswap, because we use the crypto_acomp API, when we cannot sleep with
the object mapped (which is true for zsmalloc), we just copy the
compressed object into a preallocated buffer anyway. So having a
zs_obj_load() interface would move that copy inside zsmalloc.

With your series, zswap can drop the memcpy and save some cycles on the
compress side. I didn't realize that zram does not perform any copies on the
read/decompress side.

Maybe the load interface can still provide a buffer to avoid the copy
where possible? I suspect with that we don't need the state and can
just pass a pointer. We'd need another call to potentially unmap, so
maybe load_start/load_end, or read_start/read_end.

Something like:

zs_obj_read_start(.., buf)
{
	if (contained in one page)
		return kmapped obj
	else
		memcpy to buf
		return buf
}

zs_obj_read_end(.., buf)
{
	if (contained in one page)
		kunmap
}

The interface is more straightforward and we can drop the map flags
entirely, unless I missed something here. Unfortunately you'd still need
the locking changes in zsmalloc to make zram reads fully preemptible.

I am not suggesting that we have to go this way, just throwing out
ideas.

BTW, are we positive that the locking changes made in this series are
not introducing regressions? I'd hate for us to avoid an extra copy but
end up paying for it in lock contention.



