Re: [PATCHv4 0/7] zswap: compressed swap caching

Hi Seth,
On Tue, 2013-01-29 at 15:40 -0600, Seth Jennings wrote:
> Sorry for the churn but just this set might be easier to review.
> The code required for the flushing is in a separate patch now
> as requested.
> 
> Changelog:
> 
> v4:
> * Added Acks (Minchan)
> * Separated flushing functionality into standalone patch
>   for easier review (Minchan)
> * fix comment on zswap enabled attribute (Minchan)
> * add TODO for dynamic mempool size (Minchan)
> * add check for NULL in zswap_free_page() (Minchan)
> * add missing zs_free() in error path (Minchan)
> * TODO: add comments for flushing/refcounting (Minchan)
> 
> NOTE: To build, read this:
> http://lkml.org/lkml/2013/1/28/586
> 
> v3:
> * Dropped the zsmalloc patches from the set, except the promotion patch
>   which has been converted to a rename patch (vs full diff).  The dropped
>   patches have been Acked and are going into Greg's staging tree soon.
> * Separated [PATCHv2 7/9] into two patches since it makes changes for two
>   different reasons (Minchan)
> * Moved ZSWAP_MAX_OUTSTANDING_FLUSHES near the top in zswap.c (Rik)
> * Rebase to v3.8-rc5. linux-next is a little volatile with the
>   swapper_space per type changes which will affect this patchset.
> * TODO: Move some stats from debugfs to sysfs. Which ones? (Rik)
> 
> v2:
> * Rename zswap_fs_* functions to zswap_frontswap_* to avoid
>   confusion with "filesystem"
> * Add comment about what the tree lock protects
> * Remove "#if 0" code (should have been done before)
> * Break out changes to existing swap code into separate patch
> * Fix blank line EOF warning on documentation file
> * Rebase to next-20130107
> 
> Zswap Overview:
> 
> Zswap is a lightweight compressed cache for swap pages. It takes
> pages that are in the process of being swapped out and attempts to
> compress them into a dynamically allocated RAM-based memory pool.
> If this process is successful, the writeback to the swap device is
> deferred and, in many cases, avoided completely.  This results in
> a significant I/O reduction and performance gains for systems that
> are swapping.
> 
> A kernel build benchmark under heavy memory pressure shows a runtime
> reduction of 53% and an I/O reduction of 76% with zswap vs normal
> swapping (see Performance section for more).
> 
> Some additional metrics on the performance improvements and I/O
> reductions that can be achieved using zswap, as measured by SPECjbb,
> are provided here:
> 
> http://ibm.co/VCgHvM
> 
> These results include runs on x86 and new results on Power7+ with
> hardware compression acceleration.
> 
> Of particular note is that zswap is able to evict pages from the compressed
> cache, on an LRU basis, to the backing swap device when the compressed pool
> reaches its size limit or the pool is unable to obtain additional pages
> from the buddy allocator.  This eviction functionality had been identified
> as a requirement in prior community discussions.
> 
> Patchset Structure:
> 1:   add atomic_t get/set to debugfs
> 2:   promote zsmalloc to /lib
> 3:   add zswap
> 4,5: changes to existing swap code for zswap
> 6:   add flushing support
> 7:   add documentation
> 
> Rationale:
> 
> Zswap provides compressed swap caching that basically trades CPU cycles
> for reduced swap I/O.  This trade-off can result in a significant
> performance improvement, as reads from/writes to the compressed
> cache are almost always faster than reading from a swap device,
> which incurs the latency of an asynchronous block I/O read.
> 
> Some potential benefits:
> * Desktop/laptop users with limited RAM capacities can mitigate the
>     performance impact of swapping.
> * Overcommitted guests that share a common I/O resource can
>     dramatically reduce their swap I/O pressure, avoiding heavy-handed
>     I/O throttling by the hypervisor.  This allows more work
>     to get done with less impact to the guest workload and guests
>     sharing the I/O subsystem.
> * Users with SSDs as swap devices can extend the life of the device by
>     drastically reducing life-shortening writes.
> 
> Compressed swap is also provided in zcache, along with page cache
> compression and RAM clustering through RAMSter. Zswap seeks to deliver
> the benefit of swap compression to users in a discrete function.
> This design decision is akin to the Unix design philosophy of doing one
> thing well; it leaves file cache compression and other features
> for separate code.
> 
> Design:
> 
> Zswap receives pages for compression through the Frontswap API and
> is able to evict pages from its own compressed pool on an LRU basis
> and write them back to the backing swap device in the case that the
> compressed pool is full or unable to secure additional pages from
> the buddy allocator.
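
To check my understanding of the eviction behaviour described above, here is a
minimal Python model (not the kernel code). The class name, the `max_entries`
limit and the `writeback` callback are all hypothetical; a real pool is limited
by pages rather than by entry count, and this sketch orders entries by store
time only.

```python
from collections import OrderedDict


class LruPool:
    """Toy model of a bounded compressed pool with LRU writeback."""

    def __init__(self, max_entries, writeback):
        self.max_entries = max_entries  # hypothetical limit, counted in entries
        self.writeback = writeback      # callable(key, data); stands in for the swap device
        self.entries = OrderedDict()    # oldest stored entry first

    def store(self, key, data):
        # Re-storing a key replaces the old copy and makes it the newest entry.
        self.entries.pop(key, None)
        # Evict the oldest entries to the backing device until there is room.
        while self.entries and len(self.entries) >= self.max_entries:
            old_key, old_data = self.entries.popitem(last=False)
            self.writeback(old_key, old_data)
        self.entries[key] = data

    def load(self, key):
        # Returns None on a miss, e.g. after the entry was written back.
        return self.entries.get(key)
```

With a limit of two entries, storing a third one pushes the oldest entry out
through `writeback`, and a later `load` of that key misses in the pool.
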
> 
> Zswap makes use of zsmalloc for managing the compressed memory
> pool.  This is because zsmalloc is specifically designed to minimize
> fragmentation on large (> PAGE_SIZE/2) allocation sizes.  Each
> allocation in zsmalloc is not directly accessible by address.
> Rather, a handle is returned by the allocation routine and that handle
> must be mapped before being accessed.  The compressed memory pool grows
> on demand and shrinks as compressed pages are freed.  The pool is
> not preallocated.
> 
> When a swap page is passed from frontswap to zswap, zswap maintains
> a mapping of the swap entry, a combination of the swap type and swap
> offset, to the zsmalloc handle that references that compressed swap
> page.  This mapping is achieved with a red-black tree per swap type.
> The swap offset is the search key for the tree nodes.
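
The mapping described above can be modelled roughly as follows. This is only a
sketch: a plain dict per swap type stands in for the red-black tree, and the
class and method names are made up for illustration, they are not the names
used in the patch.

```python
class ZswapIndex:
    """Toy model: one lookup structure per swap type, keyed by swap offset."""

    def __init__(self):
        # swap type -> {swap offset -> handle}; the dict replaces the rbtree
        self.trees = {}

    def store(self, swap_type, offset, handle):
        """Record the handle for a swap entry; return the handle it replaced, if any."""
        tree = self.trees.setdefault(swap_type, {})
        old = tree.get(offset)
        tree[offset] = handle
        return old

    def load(self, swap_type, offset):
        """Return the handle for a swap entry, or None if it is not cached."""
        return self.trees.get(swap_type, {}).get(offset)

    def invalidate(self, swap_type, offset):
        """Drop a swap entry and return its handle so the caller can free it."""
        return self.trees.get(swap_type, {}).pop(offset, None)
```

The same offset on two different swap types maps to two independent handles,
which is the reason for keeping one tree per type.
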
> 
> Zswap seeks to be simple in its policies.  Sysfs attributes allow for
> two user controlled policies:
> * max_compression_ratio - Maximum compression ratio, as a percentage,
>     for an acceptable compressed page. Any page that does not compress
>     by at least this ratio will be rejected.
> * max_pool_percent - The maximum percentage of memory that the compressed
>     pool can occupy.
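
One possible reading of the two policies, written out as a Python sketch. The
function names, the 4 KiB page size and the integer rounding are my
assumptions; the description above does not spell out the exact comparison.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def compresses_well_enough(compressed_len, max_compression_ratio):
    """True if the compressed size, as a percentage of a page, is within the limit."""
    return compressed_len * 100 // PAGE_SIZE <= max_compression_ratio


def pool_has_room(pool_pages, total_ram_pages, max_pool_percent):
    """True while the pool occupies less than its allowed share of RAM."""
    return pool_pages < total_ram_pages * max_pool_percent // 100
```

Under this reading, with a hypothetical max_compression_ratio of 80, a page
that compresses to 2048 bytes (50%) is accepted and one that only compresses
to 4000 bytes (97%) is rejected.
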
> 
> To enable zswap, the "enabled" attribute must be set to 1 at boot time.
> 
> Zswap allows the compressor to be selected at kernel boot time by
> setting the "compressor" attribute.  The default compressor is lzo.
> 
> A debugfs interface is provided for various statistics about pool size,
> number of pages stored, and various counters for the reasons pages
> are rejected.
> 
> Performance, Kernel Building:
> 
> Setup
> ========
> Gentoo w/ kernel v3.7-rc7
> Quad-core i5-2500 @ 3.3GHz
> 512MB DDR3 1600MHz (limited with mem=512m on boot)
> Filesystem and swap on 80GB HDD (about 58MB/s with hdparm -t)
> majflt are major page faults reported by the time command
> pswpin/out is the delta of pswpin/out from /proc/vmstat before and after
> the make -jN
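
For anyone wanting to collect the pswpin/pswpout deltas the same way, a small
Python sketch of the procedure described above. The helper names are
hypothetical; it assumes /proc/vmstat uses "name value" lines, and it does not
collect majflt, which comes from the time command.

```python
import subprocess


def read_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def swap_delta(before, after, keys=("pswpin", "pswpout")):
    """Return the per-counter increase between two vmstat snapshots."""
    return {k: after[k] - before[k] for k in keys}


def measure(cmd, vmstat_path="/proc/vmstat"):
    """Run cmd and report how many pages were swapped in and out meanwhile."""
    with open(vmstat_path) as f:
        before = read_vmstat(f.read())
    subprocess.run(cmd, check=True)
    with open(vmstat_path) as f:
        after = read_vmstat(f.read())
    return swap_delta(before, after)
```

Something like `measure(["make", "-j24"])` run from the kernel tree would then
give the two swap counters for one row of the table.
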
> 
> Summary
> ========
> * Zswap reduces I/O and improves performance at all swap pressure levels.
> 
> * Under heavy swapping at 24 threads, zswap reduced I/O by 76%, saving
>   over 1.5GB of I/O, and cut runtime in half.

How can I reproduce your benchmark?

> 
> Details
> ========
> I/O (in pages)
> 	base				zswap				change	change
> N	pswpin	pswpout	majflt	I/O sum	pswpin	pswpout	majflt	I/O sum	%I/O	MB
> 8	1	335	291	627	0	0	249	249	-60%	1
> 12	3688	14315	5290	23293	123	860	5954	6937	-70%	64
> 16	12711	46179	16803	75693	2936	7390	46092	56418	-25%	75
> 20	42178	133781	49898	225857	9460	28382	92951	130793	-42%	371
> 24	96079	357280	105242	558601	7719	18484	109309	135512	-76%	1653
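
The derived columns in the table above can be recomputed from the raw
counters. The sketch below assumes 4 KiB pages and that the "MB" column is in
units of 2^20 bytes; neither is stated in the table, but the rows are
consistent with that reading.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on the x86 test machine


def io_sum(pswpin, pswpout, majflt):
    """Total I/O in pages: swap-ins, swap-outs and major faults."""
    return pswpin + pswpout + majflt


def io_change(base, zswap):
    """Return (percent change, MiB saved) for two (pswpin, pswpout, majflt) tuples."""
    base_sum = io_sum(*base)
    zswap_sum = io_sum(*zswap)
    pct = round(100 * (zswap_sum - base_sum) / base_sum)
    mib = round((base_sum - zswap_sum) * PAGE_SIZE / 2**20)
    return pct, mib
```

For the N=24 row, `io_change((96079, 357280, 105242), (7719, 18484, 109309))`
gives (-76, 1653), matching the last two columns.
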
> 
> Runtime (in seconds)
> N	base	zswap	%change
> 8	107	107	0%
> 12	128	110	-14%
> 16	191	179	-6%
> 20	371	240	-35%
> 24	570	267	-53%
> 
> %CPU utilization (out of 400% on 4 cpus)
> N	base	zswap	%change
> 8	317	319	1%
> 12	267	311	16%
> 16	179	191	7%
> 20	94	143	52%
> 24	60	128	113%
> 
> 
> Seth Jennings (7):
>   debugfs: add get/set for atomic types
>   zsmalloc: promote to lib/
>   zswap: add to mm/
>   mm: break up swap_writepage() for frontswap backends
>   mm: allow for outstanding swap writeback accounting
>   zswap: add flushing support
>   zswap: add documentation
> 
>  Documentation/vm/zswap.txt                         |   73 ++
>  drivers/staging/Kconfig                            |    2 -
>  drivers/staging/Makefile                           |    1 -
>  drivers/staging/zcache/zcache-main.c               |    3 +-
>  drivers/staging/zram/zram_drv.h                    |    3 +-
>  drivers/staging/zsmalloc/Kconfig                   |   10 -
>  drivers/staging/zsmalloc/Makefile                  |    3 -
>  fs/debugfs/file.c                                  |   42 +
>  include/linux/debugfs.h                            |    2 +
>  include/linux/swap.h                               |    4 +
>  .../staging/zsmalloc => include/linux}/zsmalloc.h  |    0
>  lib/Kconfig                                        |   18 +
>  lib/Makefile                                       |    1 +
>  .../zsmalloc/zsmalloc-main.c => lib/zsmalloc.c     |    3 +-
>  mm/Kconfig                                         |   15 +
>  mm/Makefile                                        |    1 +
>  mm/page_io.c                                       |   22 +-
>  mm/swap_state.c                                    |    2 +-
>  mm/zswap.c                                         | 1073 ++++++++++++++++++++
>  19 files changed, 1250 insertions(+), 28 deletions(-)
>  create mode 100644 Documentation/vm/zswap.txt
>  delete mode 100644 drivers/staging/zsmalloc/Kconfig
>  delete mode 100644 drivers/staging/zsmalloc/Makefile
>  rename {drivers/staging/zsmalloc => include/linux}/zsmalloc.h (100%)
>  rename drivers/staging/zsmalloc/zsmalloc-main.c => lib/zsmalloc.c (99%)
>  create mode 100644 mm/zswap.c
> 

