On Fri, Feb 28, 2025 at 05:44:20PM +0800, Yunsheng Lin wrote:
> As mentioned in [1], it seems odd to check NULL elements in
> the middle of page bulk allocating, and it seems the caller can
> do a better job of bulk allocating pages into a whole array
> sequentially, without checking NULL elements first, before
> doing the page bulk allocation, for most of the existing users.
>
> Analysis of the bulk allocation API usage in fs shows that the
> callers in fs/btrfs/extent_io.c and net/sunrpc/svc_xprt.c depend
> on the assumption that only NULL elements are populated, while
> erofs and xfs do not, see:
> commit 91d6ac1d62c3 ("btrfs: allocate page arrays using bulk page allocator")
> commit d6db47e571dc ("erofs: do not use pagepool in z_erofs_gbuf_growsize()")
> commit c9fa563072e1 ("xfs: use alloc_pages_bulk_array() for buffers")
> commit f6e70aab9dfe ("SUNRPC: refresh rq_pages using a bulk page allocator")
>
> Change SUNRPC and btrfs to not depend on the assumption.
> The other existing callers seem to be passing all-NULL arrays
> via memset, kzalloc, etc.
>
> Remove the assumption of populating only NULL elements and treat
> page_array as an output parameter like kmem_cache_alloc_bulk().
> Removing that assumption also enables callers to skip zeroing
> the array before calling the page bulk allocating API, which
> gives about a 1~2 ns performance improvement in the
> time_bench_page_pool03_slow() test case for page_pool on an
> x86 VM. This reduces some of the performance impact of fixing
> the DMA API misuse problem in [2]: performance improves from
> 87.886 ns to 86.429 ns.

How much slower did you make btrfs and sunrpc by adding all the
defragmenting code there?

> 1. https://lore.kernel.org/all/bd8c2f5c-464d-44ab-b607-390a87ea4cd5@xxxxxxxxxx/
> 2. https://lore.kernel.org/all/20250212092552.1779679-1-linyunsheng@xxxxxxxxxx/
>
> CC: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
> CC: Luiz Capitulino <luizcap@xxxxxxxxxx>
> CC: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> CC: Dave Chinner <david@xxxxxxxxxxxxx>
> CC: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Signed-off-by: Yunsheng Lin <linyunsheng@xxxxxxxxxx>
> Acked-by: Jeff Layton <jlayton@xxxxxxxxxx>
> ---
> V2:
> 1. Drop RFC tag and rebase on latest linux-next.
> 2. Fix a compile error for xfs.

And you still haven't tested the code changes to XFS, because this
patch is also broken.

> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 5d560e9073f4..b4e95b2dd0f0 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -319,16 +319,17 @@ xfs_buf_alloc_pages(
>  	 * least one extra page.
>  	 */
>  	for (;;) {
> -		long	last = filled;
> +		long	alloc;
>  
> -		filled = alloc_pages_bulk(gfp_mask, bp->b_page_count,
> -					  bp->b_pages);
> +		alloc = alloc_pages_bulk(gfp_mask, bp->b_page_count - filled,
> +					 bp->b_pages + filled);
> +		filled += alloc;
>  		if (filled == bp->b_page_count) {
>  			XFS_STATS_INC(bp->b_mount, xb_page_found);
>  			break;
>  		}
>  
> -		if (filled != last)
> +		if (alloc)
>  			continue;

alloc_pages_bulk() now returns the number of pages allocated in the
array. So if we ask for 4 pages and get 2, filled is now 2. Then we
loop, ask for another 2 pages, get those two pages, and it returns 4.
Now filled is 6, and we continue. Now we ask alloc_pages_bulk() for
-2 pages, which returns 4 pages...

Worse behaviour: the second time around, no page allocation succeeds,
so it returns 2 pages. Filled is now 4, which is the number of pages
we need, so we break out of the loop with only 2 pages allocated.
Kernel crashes are about to occur...

Once is a mistake, twice is completely unacceptable.

When XFS stops using alloc_pages_bulk() (probably 6.15) I won't care
anymore. But until then, please stop trying to change this code.

NACK.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx