On Fri, Oct 18, 2024 at 6:20 PM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote: > > > > On 2024/10/17 23:09, Matthew Wilcox wrote: > > On Thu, Oct 17, 2024 at 10:25:04PM +0800, Kefeng Wang wrote: > >> Directly use folio_zero_range() to cleanup code. > > > > Are you sure there's no performance regression introduced by this? > > clear_highpage() is often optimised in ways that we can't optimise for > > a plain memset(). On the other hand, if the folio is large, maybe a > > modern CPU will be able to do better than clear-one-page-at-a-time. > > > > Right, I missing this, clear_page might be better than memset, I change > this one when look at the shmem_writepage(), which already convert to > use folio_zero_range() from clear_highpage(), also I grep > folio_zero_range(), there are some other to use folio_zero_range(). > > fs/bcachefs/fs-io-buffered.c: folio_zero_range(folio, 0, > folio_size(folio)); > fs/bcachefs/fs-io-buffered.c: folio_zero_range(f, 0, folio_size(f)); > fs/bcachefs/fs-io-buffered.c: folio_zero_range(f, 0, folio_size(f)); > fs/libfs.c: folio_zero_range(folio, 0, folio_size(folio)); > fs/ntfs3/frecord.c: folio_zero_range(folio, 0, folio_size(folio)); > mm/page_io.c: folio_zero_range(folio, 0, folio_size(folio)); > mm/shmem.c: folio_zero_range(folio, 0, folio_size(folio)); > > > > IOW, what performance testing have you done with this patch? > > No performance test before, but I write a testcase, > > 1) allocate N large folios (folio_alloc(PMD_ORDER)) > 2) then calculate the diff(us) when clear all N folios > clear_highpage/folio_zero_range/folio_zero_user > 3) release N folios > > the result(run 5 times) shown below on my machine, > > N=1, > clear_highpage folio_zero_range folio_zero_user > 1 69 74 177 > 2 57 62 168 > 3 54 58 234 > 4 54 58 157 > 5 56 62 148 > avg 58 62.8 176.8 > > > N=100 > clear_highpage folio_zero_range folio_zero_user > 1 11015 11309 32833 > 2 10385 11110 49751 > 3 10369 11056 33095 > 4 10332 11017 33106 > 5 10483 11000 49032 > avg 10516.8 11098.4 39563.4 > > N=512 > clear_highpage folio_zero_range folio_zero_user > 1 55560 60055 156876 > 2 55485 60024 157132 > 3 55474 60129 156658 > 4 55555 59867 157259 > 5 55528 59932 157108 > avg 55520.4 60001.4 157006.6 > > > > folio_zero_user with many cond_resched(), so time fluctuates a lot, > clear_highpage is better folio_zero_range as you said. > > Maybe add a new helper to convert all folio_zero_range(folio, 0, > folio_size(folio)) > to use clear_highpage + flush_dcache_folio? If this also improves performance for other existing callers of folio_zero_range(), then that's a positive outcome. > > > > > >> if (sgp != SGP_WRITE && !folio_test_uptodate(folio)) { > >> - long i, n = folio_nr_pages(folio); > >> - > >> - for (i = 0; i < n; i++) > >> - clear_highpage(folio_page(folio, i)); > >> - flush_dcache_folio(folio); > >> + folio_zero_range(folio, 0, folio_size(folio)); > >> folio_mark_uptodate(folio); > >> } > > > > Thanks Barry