On Thu, Jan 13, 2022 at 10:03:18PM +0800, Liang Zhang wrote:
> In the current implementation, a process's read requests will fault in
> pages with the WP flag set in their PTEs. Next, when the process issues
> a write request, it goes into do_wp_page() and copies the data from the
> old page into a newly allocated one because refcount > 1 (the page is
> mapped by the page table and sits in the swapcache), which can result
> in performance degradation. In fact, this page is exclusively owned by
> the process, and the copy from the old page to a newly allocated one is
> really unnecessary.
>
> So in this situation, these unshared pages can be reused by their owning
> process.

Let's bring Linus in on this, but I think this reintroduces all of the
mapcount problems that we've been discussing recently.

How about this as an alternative?

+++ b/mm/memory.c
@@ -3291,11 +3291,11 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		struct page *page = vmf->page;
 
 		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		if (PageKsm(page) || page_count(page) != 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
-		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1 + PageSwapCache(page)) {
 			unlock_page(page);
 			goto copy;
 		}

> Signed-off-by: Liang Zhang <zhangliang5@xxxxxxxxxx>
> ---
> This patch has been tested with the redis benchmark. Here is the test
> result.
>
> Hardware
> ========
> Memory (GB): 512
> CPU (total #): 88
> NVMe SSD (GB): 1024
>
> OS
> ==
> kernel 5.10.0
>
> Testcase
> ========
> step 1:
>   Run 16 VMs (4U8G), each running redis-server, in a cgroup limiting
>   memory.limit_in_bytes to 100G.
> step 2:
>   Run memtier_benchmark in the host with params "--threads=1 --clients=1 \
>   --pipeline=256 --data-size=2048 --requests=allkeys --key-minimum=1 \
>   --key-maximum=30000000 --key-prefix=memtier-benchmark-prefix-redistests"
>   against every VM concurrently.
>
> Working set size
> ================
> cat memory.memsw.usage_in_bytes
> 125403303936
>
> Result
> ======
> Compared with the baseline, this patch achieved 41% more Ops/sec,
> 41% more Hits/sec, 41% more Misses/sec, 30% less Latency and
> 41% more KB/sec.
>
> Index (average)    Baseline kernel    Patched kernel
> Ops/sec                     109497            155428
> Hits/sec                      8653             12283
> Misses/sec                   90889            129014
> Latency                      2.297             1.603
> KB/sec                       44569             63186
>
>
>  mm/memory.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 23f2f1300d42..fd4d868b1c2d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3291,10 +3291,16 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>  		struct page *page = vmf->page;
> 
>  		/* PageKsm() doesn't necessarily raise the page refcount */
> -		if (PageKsm(page) || page_count(page) != 1)
> +		if (PageKsm(page))
>  			goto copy;
>  		if (!trylock_page(page))
>  			goto copy;
> +
> +		/* reuse the unshared swapcache page */
> +		if (PageSwapCache(page) && reuse_swap_page(page, NULL)) {
> +			goto reuse;
> +		}
> +
>  		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
>  			unlock_page(page);
>  			goto copy;
> @@ -3304,6 +3310,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>  		 * page count reference, and the page is locked,
>  		 * it's dark out, and we're wearing sunglasses. Hit it.
>  		 */
> +reuse:
>  		unlock_page(page);
>  		wp_page_reuse(vmf);
>  		return VM_FAULT_WRITE;
> -- 
> 2.30.0
>
>
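
For reference, here is roughly what the reuse path in do_wp_page() would
end up looking like with the suggestion above applied to the quoted
context (untested sketch of the intent, not a finished patch):

		struct page *page = vmf->page;

		/*
		 * An anon page that is mapped exactly once but still in the
		 * swapcache has a refcount of 2 (one from the mapping, one
		 * from the swapcache), so compare against
		 * 1 + PageSwapCache(page) rather than dropping the refcount
		 * check entirely.
		 */
		/* PageKsm() doesn't necessarily raise the page refcount */
		if (PageKsm(page) || page_count(page) != 1 + PageSwapCache(page))
			goto copy;
		if (!trylock_page(page))
			goto copy;
		if (PageKsm(page) || page_mapcount(page) != 1 ||
		    page_count(page) != 1 + PageSwapCache(page)) {
			unlock_page(page);
			goto copy;
		}
		/*
		 * Ok, we've got the only map reference, and the only
		 * page count reference, and the page is locked,
		 * it's dark out, and we're wearing sunglasses. Hit it.
		 */
		unlock_page(page);
		wp_page_reuse(vmf);
		return VM_FAULT_WRITE;

The idea is that the swapcache reference is the only "extra" reference
allowed, so a page mapped once and sitting in the swapcache is still
exclusive to the faulting process and can be reused without touching
page_mapcount() outside the page lock or calling reuse_swap_page().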