Re: [RFC PATCH 7/7] mm: zswap: Use acomp virtual address interface

On Wed, Mar 05, 2025 at 11:40:58AM +0800, Herbert Xu wrote:
> On Tue, Mar 04, 2025 at 10:19:51PM +0900, Sergey Senozhatsky wrote:
> >
> > One thing to notice is that these functions don't actually map/unmap.
> 
> Fair enough.  I'll rename them to pin and unpin.
> 
> > And the handling is spread out over different parts of the stack: the
> > sg list is set up in zsmalloc, but the actual local mapping of the
> > zsmalloc page is done in crypto, and then zswap does a memcpy() to
> > write to the object, and so on.  The "new" zsmalloc map API, which we
> > plan on landing soon, handles most of this within zsmalloc.  Would it
> > be possible to do something similar with the sg API?
> 
> If by mapping you're referring to kmap then it's only being done
> in the Crypto API.  zswap is not doing any mappings with my patch;
> even the copy-to-SG-list operation after compression calls Crypto
> API code (the newly introduced memcpy_to_sglist from crypto/scatterwalk.c).
> 
> The data is only ever read/written by Crypto API code so it
> would seem to be more natural to map it when and where the
> data is needed.
> 
> This also eliminates unnecessary mappings when the data is passed
> to hardware offload, since there is no point in mapping the data
> into CPU address space at all if it's only going to be accessed
> with DMA.
> 
> > > @@ -972,9 +973,9 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> > >  	if (alloc_ret)
> > >  		goto unlock;
> > >  
> > > -	buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
> > > -	memcpy(buf, dst, dlen);
> > > -	zpool_unmap_handle(zpool, handle);
> > > +	zpool_map_sg(zpool, handle, ZPOOL_MM_WO, sg);
> > > +	memcpy_to_sglist(sg, 0, dst, dlen);
> > > +	zpool_unmap_sg(zpool, handle);
> > 
> > You can give zsmalloc a handle and a compressed buffer (u8) and
> > zsmalloc should be able to figure it out.  A WO-direction map()
> > seems a bit like an extra step.
> 
> Sure, this part can be dropped since your patch-set already provides
> an interface for writing to the buffer.  Here is the same patch rebased
> on top of your read_begin series.

Actually, I just sent out a series that had been sitting in my local tree
for a bit, completing Sergey's work and completely removing the map/unmap
APIs:
https://lore.kernel.org/lkml/20250305061134.4105762-1-yosry.ahmed@xxxxxxxxx/.

The intention was always to have a single API for read/write or map/unmap
in zsmalloc.  Sergey's series was already too long, so I said I would send
a follow-up one to convert zswap and complete the switch.

I am not objecting to switching the API to SG lists if we intend to
convert multiple compression algorithms to use them and to switch
completely to SG-based APIs in both zswap and zram.  But please, I don't
want us to end up with two separate interfaces.

Also, please take a look at patch 2 in this series for another reason: I
want to make sure your virtual address series can be used to remove the
!virt_addr_valid() memcpy() case completely.
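
For reference, the case I mean is this fallback in the current
zswap_decompress() (quoting the lines removed by the hunk below):

	if ((acomp_ctx->is_sleepable && !zpool_can_sleep_mapped(zpool)) ||
	    !virt_addr_valid(src)) {
		memcpy(acomp_ctx->buffer, src, entry->length);
		src = acomp_ctx->buffer;
		zpool_unmap_handle(zpool, entry->handle);
	}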

Thanks.

> 
> commit d5891a27df516192e381047b4c79de4e9f7df4cd
> Author: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Date:   Thu Feb 27 18:10:32 2025 +0800
> 
>     mm: zswap: Give non-linear objects to Crypto API
>     
>     Instead of copying non-linear objects into a buffer, use the
>     scatterlist to give them directly to the Crypto API.
>     
>     Signed-off-by: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> 
> diff --git a/include/linux/zpool.h b/include/linux/zpool.h
> index a67d62b79698..795f8e3ad964 100644
> --- a/include/linux/zpool.h
> +++ b/include/linux/zpool.h
> @@ -12,27 +12,9 @@
>  #ifndef _ZPOOL_H_
>  #define _ZPOOL_H_
>  
> +struct scatterlist;
>  struct zpool;
>  
> -/*
> - * Control how a handle is mapped.  It will be ignored if the
> - * implementation does not support it.  Its use is optional.
> - * Note that this does not refer to memory protection, it
> - * refers to how the memory will be copied in/out if copying
> - * is necessary during mapping; read-write is the safest as
> - * it copies the existing memory in on map, and copies the
> - * changed memory back out on unmap.  Write-only does not copy
> - * in the memory and should only be used for initialization.
> - * If in doubt, use ZPOOL_MM_DEFAULT which is read-write.
> - */
> -enum zpool_mapmode {
> -	ZPOOL_MM_RW, /* normal read-write mapping */
> -	ZPOOL_MM_RO, /* read-only (no copy-out at unmap time) */
> -	ZPOOL_MM_WO, /* write-only (no copy-in at map time) */
> -
> -	ZPOOL_MM_DEFAULT = ZPOOL_MM_RW
> -};
> -
>  bool zpool_has_pool(char *type);
>  
>  struct zpool *zpool_create_pool(const char *type, const char *name, gfp_t gfp);
> @@ -48,10 +30,13 @@ int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp,
>  
>  void zpool_free(struct zpool *pool, unsigned long handle);
>  
> -void *zpool_map_handle(struct zpool *pool, unsigned long handle,
> -			enum zpool_mapmode mm);
> +void zpool_pin_handle(struct zpool *pool, unsigned long handle,
> +		      struct scatterlist *sg);
>  
> -void zpool_unmap_handle(struct zpool *pool, unsigned long handle);
> +void zpool_unpin_handle(struct zpool *pool, unsigned long handle);
> +
> +void zpool_write_handle(struct zpool *pool, unsigned long handle,
> +			void *handle_mem, size_t mem_len);
>  
>  u64 zpool_get_total_pages(struct zpool *pool);
>  
> @@ -64,9 +49,9 @@ u64 zpool_get_total_pages(struct zpool *pool);
>   * @destroy:	destroy a pool.
>   * @malloc:	allocate mem from a pool.
>   * @free:	free mem from a pool.
> - * @sleep_mapped: whether zpool driver can sleep during map.
> - * @map:	map a handle.
> - * @unmap:	unmap a handle.
> + * @pin:	pin a handle and describe its memory in a two-entry SG list.
> + * @unpin:	unpin a handle.
> + * @write:	write buffer to a handle.
>   * @total_size:	get total size of a pool.
>   *
>   * This is created by a zpool implementation and registered
> @@ -86,10 +71,10 @@ struct zpool_driver {
>  				unsigned long *handle);
>  	void (*free)(void *pool, unsigned long handle);
>  
> -	bool sleep_mapped;
> -	void *(*map)(void *pool, unsigned long handle,
> -				enum zpool_mapmode mm);
> -	void (*unmap)(void *pool, unsigned long handle);
> +	void (*pin)(void *pool, unsigned long handle, struct scatterlist *sg);
> +	void (*unpin)(void *pool, unsigned long handle);
> +	void (*write)(void *pool, unsigned long handle,
> +				void *handle_mem, size_t mem_len);
>  
>  	u64 (*total_pages)(void *pool);
>  };
> @@ -98,6 +83,4 @@ void zpool_register_driver(struct zpool_driver *driver);
>  
>  int zpool_unregister_driver(struct zpool_driver *driver);
>  
> -bool zpool_can_sleep_mapped(struct zpool *pool);
> -
>  #endif
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index 7d70983cf398..c26baf9fb331 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -16,23 +16,6 @@
>  
>  #include <linux/types.h>
>  
> -/*
> - * zsmalloc mapping modes
> - *
> - * NOTE: These only make a difference when a mapped object spans pages.
> - */
> -enum zs_mapmode {
> -	ZS_MM_RW, /* normal read-write mapping */
> -	ZS_MM_RO, /* read-only (no copy-out at unmap time) */
> -	ZS_MM_WO /* write-only (no copy-in at map time) */
> -	/*
> -	 * NOTE: ZS_MM_WO should only be used for initializing new
> -	 * (uninitialized) allocations.  Partial writes to already
> -	 * initialized allocations should use ZS_MM_RW to preserve the
> -	 * existing data.
> -	 */
> -};
> -
>  struct zs_pool_stats {
>  	/* How many pages were migrated (freed) */
>  	atomic_long_t pages_compacted;
> @@ -48,10 +31,6 @@ void zs_free(struct zs_pool *pool, unsigned long obj);
>  
>  size_t zs_huge_class_size(struct zs_pool *pool);
>  
> -void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> -			enum zs_mapmode mm);
> -void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
> -
>  unsigned long zs_get_total_pages(struct zs_pool *pool);
>  unsigned long zs_compact(struct zs_pool *pool);
>  
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 379d24b4fef9..f0dc45cf9138 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -36,6 +36,7 @@
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
>  #include <linux/workqueue.h>
> +#include <linux/scatterlist.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/zpool.h>
> @@ -1392,16 +1393,28 @@ static void z3fold_zpool_free(void *pool, unsigned long handle)
>  	z3fold_free(pool, handle);
>  }
>  
> -static void *z3fold_zpool_map(void *pool, unsigned long handle,
> -			enum zpool_mapmode mm)
> +static void z3fold_zpool_pin(void *pool, unsigned long handle,
> +			     struct scatterlist sg[2])
>  {
> -	return z3fold_map(pool, handle);
> +	void *buf = z3fold_map(pool, handle);
> +
> +	sg_init_one(sg, buf, PAGE_SIZE - offset_in_page(buf));
>  }
> -static void z3fold_zpool_unmap(void *pool, unsigned long handle)
> +
> +static void z3fold_zpool_unpin(void *pool, unsigned long handle)
>  {
>  	z3fold_unmap(pool, handle);
>  }
>  
> +static void z3fold_zpool_write(void *pool, unsigned long handle,
> +			       void *handle_mem, size_t mem_len)
> +{
> +	void *buf = z3fold_map(pool, handle);
> +
> +	memcpy(buf, handle_mem, mem_len);
> +	z3fold_unmap(pool, handle);
> +}
> +
>  static u64 z3fold_zpool_total_pages(void *pool)
>  {
>  	return z3fold_get_pool_pages(pool);
> @@ -1409,14 +1422,14 @@ static u64 z3fold_zpool_total_pages(void *pool)
>  
>  static struct zpool_driver z3fold_zpool_driver = {
>  	.type =		"z3fold",
> -	.sleep_mapped = true,
>  	.owner =	THIS_MODULE,
>  	.create =	z3fold_zpool_create,
>  	.destroy =	z3fold_zpool_destroy,
>  	.malloc =	z3fold_zpool_malloc,
>  	.free =		z3fold_zpool_free,
> -	.map =		z3fold_zpool_map,
> -	.unmap =	z3fold_zpool_unmap,
> +	.pin =		z3fold_zpool_pin,
> +	.unpin =	z3fold_zpool_unpin,
> +	.write =	z3fold_zpool_write,
>  	.total_pages =	z3fold_zpool_total_pages,
>  };
>  
> diff --git a/mm/zbud.c b/mm/zbud.c
> index e9836fff9438..21c0a9c26abe 100644
> --- a/mm/zbud.c
> +++ b/mm/zbud.c
> @@ -36,10 +36,9 @@
>   *
>   * The zbud API differs from that of conventional allocators in that the
>   * allocation function, zbud_alloc(), returns an opaque handle to the user,
> - * not a dereferenceable pointer.  The user must map the handle using
> - * zbud_map() in order to get a usable pointer by which to access the
> - * allocation data and unmap the handle with zbud_unmap() when operations
> - * on the allocation data are complete.
> + * not a dereferenceable pointer.  The user must pin the handle using
> + * zbud_pin() in order to access the allocation data and unpin the handle
> + * with zbud_unpin() when operations on the allocation data are complete.
>   */
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -49,6 +48,7 @@
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/preempt.h>
> +#include <linux/scatterlist.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/zpool.h>
> @@ -339,28 +339,30 @@ static void zbud_free(struct zbud_pool *pool, unsigned long handle)
>  }
>  
>  /**
> - * zbud_map() - maps the allocation associated with the given handle
> + * zbud_pin() - pins the allocation associated with the given handle
>   * @pool:	pool in which the allocation resides
> - * @handle:	handle associated with the allocation to be mapped
> + * @handle:	handle associated with the allocation to be pinned
> + * @sg:		2-entry scatter list to store the memory pointers
>   *
> - * While trivial for zbud, the mapping functions for others allocators
> + * While trivial for zbud, the pinning functions for other allocators
>   * implementing this allocation API could have more complex information encoded
>   * in the handle and could create temporary mappings to make the data
>   * accessible to the user.
> - *
> - * Returns: a pointer to the mapped allocation
>   */
> -static void *zbud_map(struct zbud_pool *pool, unsigned long handle)
> +static void zbud_pin(struct zbud_pool *pool, unsigned long handle,
> +		      struct scatterlist sg[2])
>  {
> -	return (void *)(handle);
> +	void *buf = (void *)handle;
> +
> +	sg_init_one(sg, buf, PAGE_SIZE - offset_in_page(buf));
>  }
>  
>  /**
> - * zbud_unmap() - maps the allocation associated with the given handle
> + * zbud_unpin() - unpins the allocation associated with the given handle
>   * @pool:	pool in which the allocation resides
> - * @handle:	handle associated with the allocation to be unmapped
> + * @handle:	handle associated with the allocation to be unpinned
>   */
> -static void zbud_unmap(struct zbud_pool *pool, unsigned long handle)
> +static void zbud_unpin(struct zbud_pool *pool, unsigned long handle)
>  {
>  }
>  
> @@ -400,14 +402,20 @@ static void zbud_zpool_free(void *pool, unsigned long handle)
>  	zbud_free(pool, handle);
>  }
>  
> -static void *zbud_zpool_map(void *pool, unsigned long handle,
> -			enum zpool_mapmode mm)
> +static void zbud_zpool_pin(void *pool, unsigned long handle,
> +			   struct scatterlist sg[2])
>  {
> -	return zbud_map(pool, handle);
> +	zbud_pin(pool, handle, sg);
>  }
> -static void zbud_zpool_unmap(void *pool, unsigned long handle)
> +static void zbud_zpool_unpin(void *pool, unsigned long handle)
>  {
> -	zbud_unmap(pool, handle);
> +	zbud_unpin(pool, handle);
> +}
> +
> +static void zbud_zpool_write(void *pool, unsigned long handle,
> +			     void *handle_mem, size_t mem_len)
> +{
> +	memcpy((void *)handle, handle_mem, mem_len);
>  }
>  
>  static u64 zbud_zpool_total_pages(void *pool)
> @@ -417,14 +425,14 @@ static u64 zbud_zpool_total_pages(void *pool)
>  
>  static struct zpool_driver zbud_zpool_driver = {
>  	.type =		"zbud",
> -	.sleep_mapped = true,
>  	.owner =	THIS_MODULE,
>  	.create =	zbud_zpool_create,
>  	.destroy =	zbud_zpool_destroy,
>  	.malloc =	zbud_zpool_malloc,
>  	.free =		zbud_zpool_free,
> -	.map =		zbud_zpool_map,
> -	.unmap =	zbud_zpool_unmap,
> +	.pin =		zbud_zpool_pin,
> +	.unpin =	zbud_zpool_unpin,
> +	.write =	zbud_zpool_write,
>  	.total_pages =	zbud_zpool_total_pages,
>  };
>  
> diff --git a/mm/zpool.c b/mm/zpool.c
> index b9fda1fa857d..304639959b90 100644
> --- a/mm/zpool.c
> +++ b/mm/zpool.c
> @@ -13,6 +13,7 @@
>  #include <linux/list.h>
>  #include <linux/types.h>
>  #include <linux/mm.h>
> +#include <linux/scatterlist.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  #include <linux/module.h>
> @@ -278,46 +279,53 @@ void zpool_free(struct zpool *zpool, unsigned long handle)
>  }
>  
>  /**
> - * zpool_map_handle() - Map a previously allocated handle into memory
> + * zpool_pin_handle() - Pin a previously allocated handle into memory
>   * @zpool:	The zpool that the handle was allocated from
> - * @handle:	The handle to map
> - * @mapmode:	How the memory should be mapped
> + * @handle:	The handle to pin
> + * @sg:		2-entry scatterlist to store pointers to the memory
>   *
> - * This maps a previously allocated handle into memory.  The @mapmode
> - * param indicates to the implementation how the memory will be
> - * used, i.e. read-only, write-only, read-write.  If the
> - * implementation does not support it, the memory will be treated
> - * as read-write.
> + * This pins a previously allocated handle into memory.
>   *
>   * This may hold locks, disable interrupts, and/or preemption,
> - * and the zpool_unmap_handle() must be called to undo those
> - * actions.  The code that uses the mapped handle should complete
> - * its operations on the mapped handle memory quickly and unmap
> - * as soon as possible.  As the implementation may use per-cpu
> - * data, multiple handles should not be mapped concurrently on
> - * any cpu.
> - *
> - * Returns: A pointer to the handle's mapped memory area.
> + * and the zpool_unpin_handle() must be called to undo those
> + * actions.  The code that uses the pinned handle should complete
> + * its operations on the pinned handle memory quickly and unpin
> + * as soon as possible.
>   */
> -void *zpool_map_handle(struct zpool *zpool, unsigned long handle,
> -			enum zpool_mapmode mapmode)
> +void zpool_pin_handle(struct zpool *zpool, unsigned long handle,
> +		      struct scatterlist *sg)
>  {
> -	return zpool->driver->map(zpool->pool, handle, mapmode);
> +	zpool->driver->pin(zpool->pool, handle, sg);
>  }
>  
>  /**
> - * zpool_unmap_handle() - Unmap a previously mapped handle
> + * zpool_unpin_handle() - Unpin a previously pinned handle
>   * @zpool:	The zpool that the handle was allocated from
> - * @handle:	The handle to unmap
> + * @handle:	The handle to unpin
>   *
> - * This unmaps a previously mapped handle.  Any locks or other
> - * actions that the implementation took in zpool_map_handle()
> + * This unpins a previously pinned handle.  Any locks or other
> + * actions that the implementation took in zpool_pin_handle()
>   * will be undone here.  The memory area returned from
> - * zpool_map_handle() should no longer be used after this.
> + * zpool_pin_handle() should no longer be used after this.
>   */
> -void zpool_unmap_handle(struct zpool *zpool, unsigned long handle)
> +void zpool_unpin_handle(struct zpool *zpool, unsigned long handle)
>  {
> -	zpool->driver->unmap(zpool->pool, handle);
> +	zpool->driver->unpin(zpool->pool, handle);
> +}
> +
> +/**
> + * zpool_write_handle() - Write to a previously allocated handle
> + * @zpool:	The zpool that the handle was allocated from
> + * @handle:	The handle to write
> + * @handle_mem:	Buffer containing the data to write
> + * @mem_len:	Length of data to be written
> + *
> + * This writes data to a previously allocated handle.
> + */
> +void zpool_write_handle(struct zpool *zpool, unsigned long handle,
> +			void *handle_mem, size_t mem_len)
> +{
> +	zpool->driver->write(zpool->pool, handle, handle_mem, mem_len);
>  }
>  
>  /**
> @@ -333,23 +341,5 @@ u64 zpool_get_total_pages(struct zpool *zpool)
>  	return zpool->driver->total_pages(zpool->pool);
>  }
>  
> -/**
> - * zpool_can_sleep_mapped - Test if zpool can sleep when do mapped.
> - * @zpool:	The zpool to test
> - *
> - * Some allocators enter non-preemptible context in ->map() callback (e.g.
> - * disable pagefaults) and exit that context in ->unmap(), which limits what
> - * we can do with the mapped object. For instance, we cannot wait for
> - * asynchronous crypto API to decompress such an object or take mutexes
> - * since those will call into the scheduler. This function tells us whether
> - * we use such an allocator.
> - *
> - * Returns: true if zpool can sleep; false otherwise.
> - */
> -bool zpool_can_sleep_mapped(struct zpool *zpool)
> -{
> -	return zpool->driver->sleep_mapped;
> -}
> -
>  MODULE_AUTHOR("Dan Streetman <ddstreet@xxxxxxxx>");
>  MODULE_DESCRIPTION("Common API for compressed memory storage");
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 63c99db71dc1..934b3be467e6 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -23,6 +23,7 @@
>   *	zspage->lock
>   */
>  
> +#include <crypto/scatterwalk.h>
>  #include <linux/module.h>
>  #include <linux/kernel.h>
>  #include <linux/sched.h>
> @@ -49,6 +50,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/fs.h>
>  #include <linux/local_lock.h>
> +#include <linux/scatterlist.h>
>  #include "zpdesc.h"
>  
>  #define ZSPAGE_MAGIC	0x58
> @@ -281,13 +283,6 @@ struct zspage {
>  	struct zspage_lock zsl;
>  };
>  
> -struct mapping_area {
> -	local_lock_t lock;
> -	char *vm_buf; /* copy buffer for objects that span pages */
> -	char *vm_addr; /* address of kmap_local_page()'ed pages */
> -	enum zs_mapmode vm_mm; /* mapping mode */
> -};
> -
>  static void zspage_lock_init(struct zspage *zspage)
>  {
>  	static struct lock_class_key __key;
> @@ -453,6 +448,10 @@ static void record_obj(unsigned long handle, unsigned long obj)
>  
>  #ifdef CONFIG_ZPOOL
>  
> +static int zs_pin_object(struct zs_pool *pool, unsigned long handle,
> +			 struct scatterlist sg[2]);
> +static void zs_unpin_object(struct zs_pool *pool, unsigned long handle);
> +
>  static void *zs_zpool_create(const char *name, gfp_t gfp)
>  {
>  	/*
> @@ -482,29 +481,21 @@ static void zs_zpool_free(void *pool, unsigned long handle)
>  	zs_free(pool, handle);
>  }
>  
> -static void *zs_zpool_map(void *pool, unsigned long handle,
> -			enum zpool_mapmode mm)
> +static void zs_zpool_pin(void *pool, unsigned long handle,
> +			 struct scatterlist sg[2])
>  {
> -	enum zs_mapmode zs_mm;
> -
> -	switch (mm) {
> -	case ZPOOL_MM_RO:
> -		zs_mm = ZS_MM_RO;
> -		break;
> -	case ZPOOL_MM_WO:
> -		zs_mm = ZS_MM_WO;
> -		break;
> -	case ZPOOL_MM_RW:
> -	default:
> -		zs_mm = ZS_MM_RW;
> -		break;
> -	}
> -
> -	return zs_map_object(pool, handle, zs_mm);
> +	zs_pin_object(pool, handle, sg);
>  }
> -static void zs_zpool_unmap(void *pool, unsigned long handle)
> +
> +static void zs_zpool_unpin(void *pool, unsigned long handle)
>  {
> -	zs_unmap_object(pool, handle);
> +	zs_unpin_object(pool, handle);
> +}
> +
> +static void zs_zpool_write(void *pool, unsigned long handle,
> +			   void *handle_mem, size_t mem_len)
> +{
> +	zs_obj_write(pool, handle, handle_mem, mem_len);
>  }
>  
>  static u64 zs_zpool_total_pages(void *pool)
> @@ -520,19 +511,15 @@ static struct zpool_driver zs_zpool_driver = {
>  	.malloc_support_movable = true,
>  	.malloc =		  zs_zpool_malloc,
>  	.free =			  zs_zpool_free,
> -	.map =			  zs_zpool_map,
> -	.unmap =		  zs_zpool_unmap,
> +	.pin =			  zs_zpool_pin,
> +	.unpin =		  zs_zpool_unpin,
> +	.write =		  zs_zpool_write,
>  	.total_pages =		  zs_zpool_total_pages,
>  };
>  
>  MODULE_ALIAS("zpool-zsmalloc");
>  #endif /* CONFIG_ZPOOL */
>  
> -/* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
> -static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
> -	.lock	= INIT_LOCAL_LOCK(lock),
> -};
> -
>  static inline bool __maybe_unused is_first_zpdesc(struct zpdesc *zpdesc)
>  {
>  	return PagePrivate(zpdesc_page(zpdesc));
> @@ -1117,93 +1104,6 @@ static struct zspage *find_get_zspage(struct size_class *class)
>  	return zspage;
>  }
>  
> -static inline int __zs_cpu_up(struct mapping_area *area)
> -{
> -	/*
> -	 * Make sure we don't leak memory if a cpu UP notification
> -	 * and zs_init() race and both call zs_cpu_up() on the same cpu
> -	 */
> -	if (area->vm_buf)
> -		return 0;
> -	area->vm_buf = kmalloc(ZS_MAX_ALLOC_SIZE, GFP_KERNEL);
> -	if (!area->vm_buf)
> -		return -ENOMEM;
> -	return 0;
> -}
> -
> -static inline void __zs_cpu_down(struct mapping_area *area)
> -{
> -	kfree(area->vm_buf);
> -	area->vm_buf = NULL;
> -}
> -
> -static void *__zs_map_object(struct mapping_area *area,
> -			struct zpdesc *zpdescs[2], int off, int size)
> -{
> -	size_t sizes[2];
> -	char *buf = area->vm_buf;
> -
> -	/* disable page faults to match kmap_local_page() return conditions */
> -	pagefault_disable();
> -
> -	/* no read fastpath */
> -	if (area->vm_mm == ZS_MM_WO)
> -		goto out;
> -
> -	sizes[0] = PAGE_SIZE - off;
> -	sizes[1] = size - sizes[0];
> -
> -	/* copy object to per-cpu buffer */
> -	memcpy_from_page(buf, zpdesc_page(zpdescs[0]), off, sizes[0]);
> -	memcpy_from_page(buf + sizes[0], zpdesc_page(zpdescs[1]), 0, sizes[1]);
> -out:
> -	return area->vm_buf;
> -}
> -
> -static void __zs_unmap_object(struct mapping_area *area,
> -			struct zpdesc *zpdescs[2], int off, int size)
> -{
> -	size_t sizes[2];
> -	char *buf;
> -
> -	/* no write fastpath */
> -	if (area->vm_mm == ZS_MM_RO)
> -		goto out;
> -
> -	buf = area->vm_buf;
> -	buf = buf + ZS_HANDLE_SIZE;
> -	size -= ZS_HANDLE_SIZE;
> -	off += ZS_HANDLE_SIZE;
> -
> -	sizes[0] = PAGE_SIZE - off;
> -	sizes[1] = size - sizes[0];
> -
> -	/* copy per-cpu buffer to object */
> -	memcpy_to_page(zpdesc_page(zpdescs[0]), off, buf, sizes[0]);
> -	memcpy_to_page(zpdesc_page(zpdescs[1]), 0, buf + sizes[0], sizes[1]);
> -
> -out:
> -	/* enable page faults to match kunmap_local() return conditions */
> -	pagefault_enable();
> -}
> -
> -static int zs_cpu_prepare(unsigned int cpu)
> -{
> -	struct mapping_area *area;
> -
> -	area = &per_cpu(zs_map_area, cpu);
> -	return __zs_cpu_up(area);
> -}
> -
> -static int zs_cpu_dead(unsigned int cpu)
> -{
> -	struct mapping_area *area;
> -
> -	area = &per_cpu(zs_map_area, cpu);
> -	__zs_cpu_down(area);
> -	return 0;
> -}
> -
>  static bool can_merge(struct size_class *prev, int pages_per_zspage,
>  					int objs_per_zspage)
>  {
> @@ -1251,126 +1151,15 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
>  }
>  EXPORT_SYMBOL_GPL(zs_get_total_pages);
>  
> -/**
> - * zs_map_object - get address of allocated object from handle.
> - * @pool: pool from which the object was allocated
> - * @handle: handle returned from zs_malloc
> - * @mm: mapping mode to use
> - *
> - * Before using an object allocated from zs_malloc, it must be mapped using
> - * this function. When done with the object, it must be unmapped using
> - * zs_unmap_object.
> - *
> - * Only one object can be mapped per cpu at a time. There is no protection
> - * against nested mappings.
> - *
> - * This function returns with preemption and page faults disabled.
> - */
> -void *zs_map_object(struct zs_pool *pool, unsigned long handle,
> -			enum zs_mapmode mm)
> -{
> -	struct zspage *zspage;
> -	struct zpdesc *zpdesc;
> -	unsigned long obj, off;
> -	unsigned int obj_idx;
> -
> -	struct size_class *class;
> -	struct mapping_area *area;
> -	struct zpdesc *zpdescs[2];
> -	void *ret;
> -
> -	/*
> -	 * Because we use per-cpu mapping areas shared among the
> -	 * pools/users, we can't allow mapping in interrupt context
> -	 * because it can corrupt another users mappings.
> -	 */
> -	BUG_ON(in_interrupt());
> -
> -	/* It guarantees it can get zspage from handle safely */
> -	read_lock(&pool->lock);
> -	obj = handle_to_obj(handle);
> -	obj_to_location(obj, &zpdesc, &obj_idx);
> -	zspage = get_zspage(zpdesc);
> -
> -	/*
> -	 * migration cannot move any zpages in this zspage. Here, class->lock
> -	 * is too heavy since callers would take some time until they calls
> -	 * zs_unmap_object API so delegate the locking from class to zspage
> -	 * which is smaller granularity.
> -	 */
> -	zspage_read_lock(zspage);
> -	read_unlock(&pool->lock);
> -
> -	class = zspage_class(pool, zspage);
> -	off = offset_in_page(class->size * obj_idx);
> -
> -	local_lock(&zs_map_area.lock);
> -	area = this_cpu_ptr(&zs_map_area);
> -	area->vm_mm = mm;
> -	if (off + class->size <= PAGE_SIZE) {
> -		/* this object is contained entirely within a page */
> -		area->vm_addr = kmap_local_zpdesc(zpdesc);
> -		ret = area->vm_addr + off;
> -		goto out;
> -	}
> -
> -	/* this object spans two pages */
> -	zpdescs[0] = zpdesc;
> -	zpdescs[1] = get_next_zpdesc(zpdesc);
> -	BUG_ON(!zpdescs[1]);
> -
> -	ret = __zs_map_object(area, zpdescs, off, class->size);
> -out:
> -	if (likely(!ZsHugePage(zspage)))
> -		ret += ZS_HANDLE_SIZE;
> -
> -	return ret;
> -}
> -EXPORT_SYMBOL_GPL(zs_map_object);
> -
> -void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
> -{
> -	struct zspage *zspage;
> -	struct zpdesc *zpdesc;
> -	unsigned long obj, off;
> -	unsigned int obj_idx;
> -
> -	struct size_class *class;
> -	struct mapping_area *area;
> -
> -	obj = handle_to_obj(handle);
> -	obj_to_location(obj, &zpdesc, &obj_idx);
> -	zspage = get_zspage(zpdesc);
> -	class = zspage_class(pool, zspage);
> -	off = offset_in_page(class->size * obj_idx);
> -
> -	area = this_cpu_ptr(&zs_map_area);
> -	if (off + class->size <= PAGE_SIZE)
> -		kunmap_local(area->vm_addr);
> -	else {
> -		struct zpdesc *zpdescs[2];
> -
> -		zpdescs[0] = zpdesc;
> -		zpdescs[1] = get_next_zpdesc(zpdesc);
> -		BUG_ON(!zpdescs[1]);
> -
> -		__zs_unmap_object(area, zpdescs, off, class->size);
> -	}
> -	local_unlock(&zs_map_area.lock);
> -
> -	zspage_read_unlock(zspage);
> -}
> -EXPORT_SYMBOL_GPL(zs_unmap_object);
> -
> -void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> -			void *local_copy)
> +static int zs_pin_object(struct zs_pool *pool, unsigned long handle,
> +			 struct scatterlist sg[2])
>  {
> +	int handle_size = ZS_HANDLE_SIZE;
>  	struct zspage *zspage;
>  	struct zpdesc *zpdesc;
>  	unsigned long obj, off;
>  	unsigned int obj_idx;
>  	struct size_class *class;
> -	void *addr;
>  
>  	/* Guarantee we can get zspage from handle safely */
>  	read_lock(&pool->lock);
> @@ -1385,33 +1174,56 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
>  	class = zspage_class(pool, zspage);
>  	off = offset_in_page(class->size * obj_idx);
>  
> +	if (ZsHugePage(zspage))
> +		handle_size = 0;
> +
>  	if (off + class->size <= PAGE_SIZE) {
>  		/* this object is contained entirely within a page */
> -		addr = kmap_local_zpdesc(zpdesc);
> -		addr += off;
> +		sg_init_table(sg, 1);
> +		sg_set_page(sg, zpdesc_page(zpdesc),
> +			    class->size - handle_size, off + handle_size);
>  	} else {
>  		size_t sizes[2];
>  
>  		/* this object spans two pages */
>  		sizes[0] = PAGE_SIZE - off;
>  		sizes[1] = class->size - sizes[0];
> -		addr = local_copy;
>  
> -		memcpy_from_page(addr, zpdesc_page(zpdesc),
> -				 off, sizes[0]);
> +		sg_init_table(sg, 2);
> +		sg_set_page(sg, zpdesc_page(zpdesc), sizes[0] - handle_size,
> +			    off + handle_size);
>  		zpdesc = get_next_zpdesc(zpdesc);
> -		memcpy_from_page(addr + sizes[0],
> -				 zpdesc_page(zpdesc),
> -				 0, sizes[1]);
> +		sg_set_page(&sg[1], zpdesc_page(zpdesc), sizes[1], 0);
>  	}
>  
> -	if (!ZsHugePage(zspage))
> -		addr += ZS_HANDLE_SIZE;
> +	return class->size - handle_size;
> +}
> +
> +void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> +			void *local_copy)
> +{
> +	struct scatterlist sg[2];
> +	void *addr;
> +	int len;
> +
> +	len = zs_pin_object(pool, handle, sg);
> +	if (sg_is_last(sg)) {
> +		addr = kmap_local_page(sg_page(sg));
> +		addr += sg[0].offset;
> +	} else {
> +		addr = local_copy;
> +		memcpy_from_sglist(addr, sg, 0, len);
> +	}
>  
>  	return addr;
>  }
>  EXPORT_SYMBOL_GPL(zs_obj_read_begin);
>  
> +static void zs_unpin_object(struct zs_pool *pool, unsigned long handle)
> +{
> +	zs_obj_read_end(pool, handle, NULL);
> +}
> +
>  void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
>  		     void *handle_mem)
>  {
> @@ -1427,7 +1239,7 @@ void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
>  	class = zspage_class(pool, zspage);
>  	off = offset_in_page(class->size * obj_idx);
>  
> -	if (off + class->size <= PAGE_SIZE) {
> +	if (handle_mem && off + class->size <= PAGE_SIZE) {
>  		if (!ZsHugePage(zspage))
>  			off += ZS_HANDLE_SIZE;
>  		handle_mem -= off;
> @@ -1441,49 +1253,11 @@ EXPORT_SYMBOL_GPL(zs_obj_read_end);
>  void zs_obj_write(struct zs_pool *pool, unsigned long handle,
>  		  void *handle_mem, size_t mem_len)
>  {
> -	struct zspage *zspage;
> -	struct zpdesc *zpdesc;
> -	unsigned long obj, off;
> -	unsigned int obj_idx;
> -	struct size_class *class;
> +	struct scatterlist sg[2];
>  
> -	/* Guarantee we can get zspage from handle safely */
> -	read_lock(&pool->lock);
> -	obj = handle_to_obj(handle);
> -	obj_to_location(obj, &zpdesc, &obj_idx);
> -	zspage = get_zspage(zpdesc);
> -
> -	/* Make sure migration doesn't move any pages in this zspage */
> -	zspage_read_lock(zspage);
> -	read_unlock(&pool->lock);
> -
> -	class = zspage_class(pool, zspage);
> -	off = offset_in_page(class->size * obj_idx);
> -
> -	if (off + class->size <= PAGE_SIZE) {
> -		/* this object is contained entirely within a page */
> -		void *dst = kmap_local_zpdesc(zpdesc);
> -
> -		if (!ZsHugePage(zspage))
> -			off += ZS_HANDLE_SIZE;
> -		memcpy(dst + off, handle_mem, mem_len);
> -		kunmap_local(dst);
> -	} else {
> -		/* this object spans two pages */
> -		size_t sizes[2];
> -
> -		off += ZS_HANDLE_SIZE;
> -		sizes[0] = PAGE_SIZE - off;
> -		sizes[1] = mem_len - sizes[0];
> -
> -		memcpy_to_page(zpdesc_page(zpdesc), off,
> -			       handle_mem, sizes[0]);
> -		zpdesc = get_next_zpdesc(zpdesc);
> -		memcpy_to_page(zpdesc_page(zpdesc), 0,
> -			       handle_mem + sizes[0], sizes[1]);
> -	}
> -
> -	zspage_read_unlock(zspage);
> +	zs_pin_object(pool, handle, sg);
> +	memcpy_to_sglist(sg, 0, handle_mem, mem_len);
> +	zs_unpin_object(pool, handle);
>  }
>  EXPORT_SYMBOL_GPL(zs_obj_write);
>  
> @@ -2465,13 +2239,6 @@ EXPORT_SYMBOL_GPL(zs_destroy_pool);
>  
>  static int __init zs_init(void)
>  {
> -	int ret;
> -
> -	ret = cpuhp_setup_state(CPUHP_MM_ZS_PREPARE, "mm/zsmalloc:prepare",
> -				zs_cpu_prepare, zs_cpu_dead);
> -	if (ret)
> -		goto out;
> -
>  #ifdef CONFIG_ZPOOL
>  	zpool_register_driver(&zs_zpool_driver);
>  #endif
> @@ -2479,9 +2246,6 @@ static int __init zs_init(void)
>  	zs_stat_init();
>  
>  	return 0;
> -
> -out:
> -	return ret;
>  }
>  
>  static void __exit zs_exit(void)
> @@ -2489,7 +2253,6 @@ static void __exit zs_exit(void)
>  #ifdef CONFIG_ZPOOL
>  	zpool_unregister_driver(&zs_zpool_driver);
>  #endif
> -	cpuhp_remove_state(CPUHP_MM_ZS_PREPARE);
>  
>  	zs_stat_exit();
>  }
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 6504174fbc6a..74252187d763 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -147,7 +147,6 @@ struct crypto_acomp_ctx {
>  	struct crypto_wait wait;
>  	u8 *buffer;
>  	struct mutex mutex;
> -	bool is_sleepable;
>  };
>  
>  /*
> @@ -865,7 +864,6 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  
>  	acomp_ctx->buffer = buffer;
>  	acomp_ctx->acomp = acomp;
> -	acomp_ctx->is_sleepable = acomp_is_async(acomp);
>  	acomp_ctx->req = req;
>  	mutex_unlock(&acomp_ctx->mutex);
>  	return 0;
> @@ -930,7 +928,6 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	unsigned int dlen = PAGE_SIZE;
>  	unsigned long handle;
>  	struct zpool *zpool;
> -	char *buf;
>  	gfp_t gfp;
>  	u8 *dst;
>  
> @@ -972,9 +969,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  	if (alloc_ret)
>  		goto unlock;
>  
> -	buf = zpool_map_handle(zpool, handle, ZPOOL_MM_WO);
> -	memcpy(buf, dst, dlen);
> -	zpool_unmap_handle(zpool, handle);
> +	zpool_write_handle(zpool, handle, dst, dlen);
>  
>  	entry->handle = handle;
>  	entry->length = dlen;
> @@ -994,37 +989,19 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
>  static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  {
>  	struct zpool *zpool = entry->pool->zpool;
> -	struct scatterlist input, output;
>  	struct crypto_acomp_ctx *acomp_ctx;
> -	u8 *src;
> +	struct scatterlist input[2];
> +	struct scatterlist output;
>  
>  	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
> -	src = zpool_map_handle(zpool, entry->handle, ZPOOL_MM_RO);
> -	/*
> -	 * If zpool_map_handle is atomic, we cannot reliably utilize its mapped buffer
> -	 * to do crypto_acomp_decompress() which might sleep. In such cases, we must
> -	 * resort to copying the buffer to a temporary one.
> -	 * Meanwhile, zpool_map_handle() might return a non-linearly mapped buffer,
> -	 * such as a kmap address of high memory or even ever a vmap address.
> -	 * However, sg_init_one is only equipped to handle linearly mapped low memory.
> -	 * In such cases, we also must copy the buffer to a temporary and lowmem one.
> -	 */
> -	if ((acomp_ctx->is_sleepable && !zpool_can_sleep_mapped(zpool)) ||
> -	    !virt_addr_valid(src)) {
> -		memcpy(acomp_ctx->buffer, src, entry->length);
> -		src = acomp_ctx->buffer;
> -		zpool_unmap_handle(zpool, entry->handle);
> -	}
> -
> -	sg_init_one(&input, src, entry->length);
> +	zpool_pin_handle(zpool, entry->handle, input);
>  	sg_init_table(&output, 1);
>  	sg_set_folio(&output, folio, PAGE_SIZE, 0);
> -	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
> +	acomp_request_set_params(acomp_ctx->req, input, &output, entry->length, PAGE_SIZE);
>  	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
>  	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
>  
> -	if (src != acomp_ctx->buffer)
> -		zpool_unmap_handle(zpool, entry->handle);
> +	zpool_unpin_handle(zpool, entry->handle);
>  	acomp_ctx_put_unlock(acomp_ctx);
>  }
>  
> 
> Thanks,
> -- 
> Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
