Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip unmap

zhiguojiang <justinjiang@xxxxxxxx> · Wed, 25 Oct 2023 23:37:53 +0800

On 2023/10/24 15:21, zhiguojiang wrote:

On 2023/10/24 15:07, David Hildenbrand wrote:
On 24.10.23 04:04, zhiguojiang wrote:

On 2023/10/23 21:01, Matthew Wilcox wrote:
On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:
On 2023/10/23 20:21, Matthew Wilcox wrote:
On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:
Are you seeing measurable changes for any workloads? It certainly seems
like you should, but it would help if you chose a test from mmtests and
showed how performance changed on your system.
In one mmtests run, the maximum times spent on a futile reclaim attempt of
a dirty folio in folio_list that does not support pageout, and that ends up
being activated in shrink_folio_list(), were: cost=51us, exe=2365us.

Calculated with the formula dirty_cost / total_cost * 100%, the reclaim
efficiency for dirty folios can be improved by 53.13% and 82.95% in the
two traces below.

So this patch can improve shrink efficiency and reduce the workload of
kswapd to a certain extent.

kswapd0-96      (     96) [005] .....   387.218548:
mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
nr_reclaimed 31 nr_dirty  1 nr_unqueued_dirty  1 nr_writeback 0
nr_activate[1]  1 nr_ref_keep  0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365

kswapd0-96      (     96) [006] .....   412.822532:
mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
nr_reclaimed  0 nr_dirty 32 nr_unqueued_dirty 32 nr_writeback 0
nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
total_cost 88 total_exe 605  dirty_cost 73 total_exe 605
I appreciate that you can put probes in and determine the cost, but do
you see improvements for a real workload? Like doing a kernel compile
-- does it speed up at all?
Could you suggest a method for testing the workload of a thread such as kswapd?
Something dirt simple like 'time make -j8'.
Two compilations were run for each configuration. Compared with the
unmodified kernel, the compilation time with the patches applied was
slightly reduced, as follows:

Compilation command:
make distclean -j8
make ARCH=x86_64 x86_64_defconfig
time make -j8

1. Unmodified compilation time:
real    2m40.276s
user    16m2.956s
sys     2m14.738s

real    2m40.136s
user    16m2.617s
sys     2m14.722s

2. Compilation time with [PATCH v2 1/2] applied:
real    2m40.067s
user    16m3.164s
sys     2m14.211s

real    2m40.123s
user    16m2.439s
sys     2m14.508s

3. Compilation time with [PATCH v2 1/2] + [PATCH v2 2/2] applied:
real    2m40.367s
user    16m3.738s
sys     2m13.662s

real    2m40.014s
user    16m3.108s
sys     2m14.096s

To get meaningful numbers, two iterations are usually not sufficient.
How much memory does your system have? Does vmscan even ever get active?
Test system memory: MemTotal: 8161608 kB. When multiple apps are
opened, vmscan does become active. I captured a lot of tracelog data
during testing, but I only posted two sets of it.
Hi, please continue reviewing this patch and let me know whether it can
be merged. Thanks.