On 2024/10/17 23:09, Matthew Wilcox wrote:
On Thu, Oct 17, 2024 at 10:25:04PM +0800, Kefeng Wang wrote:
Directly use folio_zero_range() to cleanup code.
Are you sure there's no performance regression introduced by this?
clear_highpage() is often optimised in ways that we can't optimise for
a plain memset(). On the other hand, if the folio is large, maybe a
modern CPU will be able to do better than clear-one-page-at-a-time.
Right, I missing this, clear_page might be better than memset, I change
this one when look at the shmem_writepage(), which already convert to
use folio_zero_range() from clear_highpage(), also I grep
folio_zero_range(), there are some other to use folio_zero_range().
fs/bcachefs/fs-io-buffered.c: folio_zero_range(folio, 0,
folio_size(folio));
fs/bcachefs/fs-io-buffered.c: folio_zero_range(f,
0, folio_size(f));
fs/bcachefs/fs-io-buffered.c: folio_zero_range(f,
0, folio_size(f));
fs/libfs.c: folio_zero_range(folio, 0, folio_size(folio));
fs/ntfs3/frecord.c: folio_zero_range(folio, 0,
folio_size(folio));
mm/page_io.c: folio_zero_range(folio, 0, folio_size(folio));
mm/shmem.c: folio_zero_range(folio, 0, folio_size(folio));
IOW, what performance testing have you done with this patch?
No performance test before, but I write a testcase,
1) allocate N large folios (folio_alloc(PMD_ORDER))
2) then calculate the diff(us) when clear all N folios
clear_highpage/folio_zero_range/folio_zero_user
3) release N folios
the result(run 5 times) shown below on my machine,
N=1,
clear_highpage folio_zero_range folio_zero_user
1 69 74 177
2 57 62 168
3 54 58 234
4 54 58 157
5 56 62 148
avg 58 62.8 176.8
N=100
clear_highpage folio_zero_range folio_zero_user
1 11015 11309 32833
2 10385 11110 49751
3 10369 11056 33095
4 10332 11017 33106
5 10483 11000 49032
avg 10516.8 11098.4 39563.4
N=512
clear_highpage folio_zero_range folio_zero_user
1 55560 60055 156876
2 55485 60024 157132
3 55474 60129 156658
4 55555 59867 157259
5 55528 59932 157108
avg 55520.4 60001.4 157006.6
folio_zero_user with many cond_resched(), so time fluctuates a lot,
clear_highpage is better folio_zero_range as you said.
Maybe add a new helper to convert all folio_zero_range(folio, 0,
folio_size(folio))
to use clear_highpage + flush_dcache_folio?