On 2023/10/24 15:21, zhiguojiang wrote:
On 2023/10/24 15:07, David Hildenbrand wrote:
On 24.10.23 04:04, zhiguojiang wrote:
On 2023/10/23 21:01, Matthew Wilcox wrote:
On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:
On 2023/10/23 20:21, Matthew Wilcox wrote:
On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:
Are you seeing measurable changes for any workloads? It certainly seems
like you should, but it would help if you chose a test from mmtests and
showed how performance changed on your system.
In one mmtests run, the maximum time spent on an ineffective reclaim
attempt for a dirty folio on folio_list (one that does not support
pageout and ends up activated in shrink_folio_list()) was: cost=51us,
exe=2365us.
Calculated with the formula dirty_cost / total_cost * 100%, the
reclaim efficiency for dirty folios can be improved by 53.13% and 82.95%
in the two traces below.
So this patch can improve shrink efficiency and reduce the workload of
kswapd to a certain extent.
kswapd0-96 ( 96) [005] ..... 387.218548: mm_vmscan_lru_shrink_inactive: [Justin]
  nid 0 nr_scanned 32 nr_taken 32 nr_reclaimed 31 nr_dirty 1 nr_unqueued_dirty 1
  nr_writeback 0 nr_activate[1] 1 nr_ref_keep 0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
  total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365
kswapd0-96 ( 96) [006] ..... 412.822532: mm_vmscan_lru_shrink_inactive: [Justin]
  nid 0 nr_scanned 32 nr_taken 32 nr_reclaimed 0 nr_dirty 32 nr_unqueued_dirty 32
  nr_writeback 0 nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
  total_cost 88 total_exe 605 dirty_cost 73 total_exe 605
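The two percentages quoted above can be reproduced from the total_cost and dirty_cost fields of these trace lines. A minimal sketch, assuming both fields use the same time unit; the helper name `dirty_share` is hypothetical and only restates the formula dirty_cost / total_cost * 100%.

```python
# Minimal sketch: reproduce dirty_cost / total_cost * 100% from the posted traces.
# `dirty_share` is a hypothetical helper name, not part of any kernel tooling.

def dirty_share(dirty_cost: int, total_cost: int) -> float:
    """Return the share of total_cost spent on dirty folios, in percent."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return dirty_cost / total_cost * 100.0

# (total_cost, dirty_cost) pairs copied from the two trace lines above.
samples = [(96, 51), (88, 73)]
shares = [dirty_share(dirty, total) for total, dirty in samples]

for (total, dirty), share in zip(samples, shares):
    print(f"total_cost={total} dirty_cost={dirty} share={share:.2f}%")
```

The first trace gives 51 / 96 = 53.125% and the second gives 73 / 88, roughly 82.95%, which matches the figures quoted in the mail.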
I appreciate that you can put probes in and determine the cost, but do
you see improvements for a real workload? Like doing a kernel compile
-- does it speed up at all?
Could you share a method for testing the workload of a kernel thread
such as kswapd?
Something dirt simple like 'time make -j8'.
I ran two compilations for each configuration. Compared with the
unmodified kernel, the compilation time with the patches applied showed
a slight reduction, as follows:
Compilation command:
make distclean -j8
make ARCH=x86_64 x86_64_defconfig
time make -j8
1. Unmodified compilation time:
real 2m40.276s
user 16m2.956s
sys 2m14.738s
real 2m40.136s
user 16m2.617s
sys 2m14.722s
2. [Patch v2 1/2] modified compilation time:
real 2m40.067s
user 16m3.164s
sys 2m14.211s
real 2m40.123s
user 16m2.439s
sys 2m14.508s
3. [Patch v2 1/2] + [Patch v2 2/2] modified compilation time:
real 2m40.367s
user 16m3.738s
sys 2m13.662s
real 2m40.014s
user 16m3.108s
sys 2m14.096s
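With only two runs per configuration, it may help to put the mean and the run-to-run spread of the posted 'real' times side by side. A minimal sketch, assuming the `time` output format shown above (`<minutes>m<seconds>s`); the helper `to_seconds` and the dictionary labels are hypothetical names used only for illustration.

```python
import re
from statistics import mean

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '2m40.276s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# 'real' times copied from the three configurations above.
real_times = {
    "unmodified": ["2m40.276s", "2m40.136s"],
    "patch 1/2": ["2m40.067s", "2m40.123s"],
    "patch 1/2 + 2/2": ["2m40.367s", "2m40.014s"],
}

# For each configuration: (mean in seconds, spread between the two runs).
summary = {}
for label, stamps in real_times.items():
    secs = [to_seconds(s) for s in stamps]
    summary[label] = (mean(secs), max(secs) - min(secs))

for label, (avg, spread) in summary.items():
    print(f"{label:16s} mean={avg:.3f}s spread={spread:.3f}s")
```

On these figures the means of the three configurations differ by roughly 0.1 s, while two runs of the same configuration differ by up to about 0.35 s, so more iterations would be needed to separate the effect from run-to-run variation.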
To get meaningful numbers, two iterations are usually not sufficient.
How much memory does your system have? Does vmscan even ever get active?
Test system memory: MemTotal: 8161608 kB. vmscan becomes active when
multiple apps are open. I can capture a lot of trace log data during
testing, but I posted only two sets of it.
Hi, please continue reviewing this patch and draw a conclusion on
whether it can be merged. Thanks.