Hi, Rik,

Thanks for the comments!

Rik van Riel <riel@xxxxxxxxxx> writes:

> On Thu, 2016-08-25 at 12:27 -0700, Huang, Ying wrote:
>> File pages use a set of radix tags (DIRTY, TOWRITE, WRITEBACK, etc.) to
>> accelerate finding the pages with a specific tag in the radix tree
>> during inode writeback.  But for anonymous pages in the swap cache,
>> there is no inode writeback.  So there is no need to find the
>> pages with some writeback tags in the radix tree.  It is not necessary
>> to touch radix tree writeback tags for pages in the swap cache.
>>
>> With this patch, the swap out bandwidth improved 22.3% (from ~1.2GB/s to
>> ~1.48GB/s) in the vm-scalability swap-w-seq test case with 8 processes.
>> The test is done on a Xeon E5 v3 system.  The swap device used is a RAM
>> simulated PMEM (persistent memory) device.  The improvement comes from
>> the reduced contention on the swap cache radix tree lock.  To test
>> sequential swapping out, the test case uses 8 processes, which
>> sequentially allocate and write to the anonymous pages until RAM and
>> part of the swap device is used up.
>>
>> Details of comparison is as follow,
>>
>>            base                base+patch
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>    1207402 ±  7%     +22.3%    1476578 ±  6%  vmstat.swap.so
>>    2506952 ±  2%     +28.1%    3212076 ±  7%  vm-scalability.throughput
>>      10.86 ± 12%     -23.4%       8.31 ± 16%  perf-profile.cycles-pp._raw_spin_lock_irq.__add_to_swap_cache.add_to_swap_cache.add_to_swap.shrink_page_list
>>      10.82 ± 13%     -33.1%       7.24 ± 14%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_zone_memcg
>>      10.36 ± 11%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.__test_set_page_writeback.bdev_write_page.__swap_writepage.swap_writepage
>>      10.52 ± 12%    -100.0%       0.00 ± -1%  perf-profile.cycles-pp._raw_spin_lock_irqsave.test_clear_page_writeback.end_page_writeback.page_endio.pmem_rw_page
>>
>> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
>> Cc: Shaohua Li <shli@xxxxxxxxxx>
>> Cc: Minchan Kim <minchan@xxxxxxxxxx>
>> Cc: Rik van Riel <riel@xxxxxxxxxx>
>> Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
>> Cc: Tejun Heo <tj@xxxxxxxxxx>
>> Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
>> Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
>> ---
>>  mm/page-writeback.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> index 82e7252..599d2f9 100644
>> --- a/mm/page-writeback.c
>> +++ b/mm/page-writeback.c
>> @@ -2728,7 +2728,8 @@ int test_clear_page_writeback(struct page *page)
>>  	int ret;
>>
>>  	lock_page_memcg(page);
>> -	if (mapping) {
>> +	/* Pages in swap cache don't use writeback tags */
>> +	if (mapping && !PageSwapCache(page)) {
>
> I wonder if that should be a mapping_uses_tags(mapping)
> macro or similar, and a per-mapping flag?
>
> I suspect there will be another case coming up soon
> where we have a page cache radix tree, but no need
> for dirty/writeback/... tags.
>
> That use case would be DAX filesystems, where we do
> use a struct page, but that struct page points at
> persistent storage, and the tags are not necessary.

I asked Dan and Ross about DAX's usage of the writeback tags.  DAX uses
these tags to flush the cache, etc.

From Dan:

"
DAX uses them to track PMEM ranges that have taken a write fault so that
we can flush/write-back those dirty ranges at fsync()/msync() time.
"

But I still think it may be a good idea to use some function or flag for
this, because it is more flexible and readable.

Best Regards,
Huang, Ying
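
To make the "function or flag" idea a bit more concrete, here is a rough
userspace sketch of a per-mapping flag plus a helper in the spirit of the
mapping_uses_tags(mapping) suggestion quoted above.  It is only a model,
not kernel code: struct mapping_model, MAPPING_NO_WRITEBACK_TAGS,
mapping_set_no_writeback_tags(), model_set_writeback() and
model_clear_writeback() are hypothetical names made up for illustration,
and the counter merely stands in for the radix tree tag update done under
the tree lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mapping flag bit; the name is illustrative only. */
enum { MAPPING_NO_WRITEBACK_TAGS = 1u << 0 };

/* Simplified stand-in for struct address_space (assumption, not the real layout). */
struct mapping_model {
	unsigned long flags;
	unsigned long tagged_writeback;	/* models pages tagged WRITEBACK in the tree */
};

/* Mark a mapping (e.g. the swap cache) as not needing writeback tags. */
static inline void mapping_set_no_writeback_tags(struct mapping_model *m)
{
	m->flags |= MAPPING_NO_WRITEBACK_TAGS;
}

/* Helper in the spirit of the suggested mapping_uses_tags(mapping). */
static inline bool mapping_uses_tags(const struct mapping_model *m)
{
	return !(m->flags & MAPPING_NO_WRITEBACK_TAGS);
}

/*
 * Models the tag-setting step of starting writeback.
 * Returns true if the (modelled) tree, and hence its lock, was touched.
 */
static inline bool model_set_writeback(struct mapping_model *m)
{
	if (m && mapping_uses_tags(m)) {
		m->tagged_writeback++;
		return true;
	}
	return false;
}

/* Models the tag-clearing step of ending writeback. */
static inline bool model_clear_writeback(struct mapping_model *m)
{
	if (m && mapping_uses_tags(m)) {
		assert(m->tagged_writeback > 0);
		m->tagged_writeback--;
		return true;
	}
	return false;
}
```

With something like this, the check in the patch would test a property of
the mapping instead of PageSwapCache(page), so another mapping that does
not need the tags could opt out by setting the flag, while DAX would
simply leave it clear.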