On Wed, Mar 05, 2025 at 02:54:07PM -0800, Darrick J. Wong wrote:
> On Thu, Mar 06, 2025 at 08:20:08AM +1100, Dave Chinner wrote:
> > On Wed, Mar 05, 2025 at 07:05:27AM -0700, Christoph Hellwig wrote:
> > > The fallback buffer allocation path currently open codes a suboptimal
> > > version of vmalloc to allocate pages that are then mapped into
> > > vmalloc space. Switch to using vmalloc instead, which uses all the
> > > optimizations in the common vmalloc code, and removes the need to
> > > track the backing pages in the xfs_buf structure.
> > >
> > > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> >
> > .....
> >
> > > @@ -1500,29 +1373,43 @@ static void
> > >  xfs_buf_submit_bio(
> > >  	struct xfs_buf		*bp)
> > >  {
> > > -	unsigned int		size = BBTOB(bp->b_length);
> > > -	unsigned int		map = 0, p;
> > > +	unsigned int		map = 0;
> > >  	struct blk_plug		plug;
> > >  	struct bio		*bio;
> > >
> > > -	bio = bio_alloc(bp->b_target->bt_bdev, bp->b_page_count,
> > > -			xfs_buf_bio_op(bp), GFP_NOIO);
> > > -	bio->bi_private = bp;
> > > -	bio->bi_end_io = xfs_buf_bio_end_io;
> > > +	if (is_vmalloc_addr(bp->b_addr)) {
> > > +		unsigned int	size = BBTOB(bp->b_length);
> > > +		unsigned int	alloc_size = roundup(size, PAGE_SIZE);
> > > +		void		*data = bp->b_addr;
> > >
> > > -	if (bp->b_page_count == 1) {
> > > -		__bio_add_page(bio, virt_to_page(bp->b_addr), size,
> > > -				offset_in_page(bp->b_addr));
> > > -	} else {
> > > -		for (p = 0; p < bp->b_page_count; p++)
> > > -			__bio_add_page(bio, bp->b_pages[p], PAGE_SIZE, 0);
> > > -		bio->bi_iter.bi_size = size; /* limit to the actual size used */
> > > +		bio = bio_alloc(bp->b_target->bt_bdev, alloc_size >> PAGE_SHIFT,
> > > +				xfs_buf_bio_op(bp), GFP_NOIO);
> > > +
> > > +		do {
> > > +			unsigned int	len = min(size, PAGE_SIZE);
> > >
> > > -		if (is_vmalloc_addr(bp->b_addr))
> > > -			flush_kernel_vmap_range(bp->b_addr,
> > > -					xfs_buf_vmap_len(bp));
> > > +			ASSERT(offset_in_page(data) == 0);
> > > +			__bio_add_page(bio, vmalloc_to_page(data), len, 0);
> > > +			data += len;
> > > +			size -= len;
> > > +		} while (size);
> > > +
> > > +		flush_kernel_vmap_range(bp->b_addr, alloc_size);
> > > +	} else {
> > > +		/*
> > > +		 * Single folio or slab allocation. Must be contiguous and thus
> > > +		 * only a single bvec is needed.
> > > +		 */
> > > +		bio = bio_alloc(bp->b_target->bt_bdev, 1, xfs_buf_bio_op(bp),
> > > +				GFP_NOIO);
> > > +		__bio_add_page(bio, virt_to_page(bp->b_addr),
> > > +				BBTOB(bp->b_length),
> > > +				offset_in_page(bp->b_addr));
> > >  	}
> >
> > How does offset_in_page() work with a high order folio? It can only
> > return a value between 0 and (PAGE_SIZE - 1). i.e. shouldn't this
> > be:
> >
> > 	folio = kmem_to_folio(bp->b_addr);
> >
> > 	bio_add_folio_nofail(bio, folio, BBTOB(bp->b_length),
> > 			offset_in_folio(folio, bp->b_addr));
>
> I think offset_in_folio() returns 0 in the !kmem && !vmalloc case
> because we allocate the folio and set b_addr to folio_address(folio);
> and we never call the kmem alloc code for sizes greater than PAGE_SIZE.

Yes, but that misses my point: this is a folio conversion, whilst this
code still treats the folio as a page.

We're trying to get rid of exactly this sort of page/folio type
confusion (i.e. questions like "does offset_in_page() work correctly
on large folios?"). New code shouldn't be adding new issues like
these, especially when there are existing folio-based APIs that are
guaranteed to work correctly and won't need fixing in future when
pages and folios are fully separated.
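i.e. a completely untested sketch of what I mean for the contiguous
branch, using virt_to_folio() for the address-to-folio lookup:

	} else {
		/*
		 * Single folio or slab allocation. Must be contiguous
		 * and thus only a single bvec is needed.
		 */
		struct folio	*folio = virt_to_folio(bp->b_addr);

		bio = bio_alloc(bp->b_target->bt_bdev, 1, xfs_buf_bio_op(bp),
				GFP_NOIO);

		/*
		 * bio_add_folio_nofail() takes the offset within the
		 * folio, so a high order folio just works here - no
		 * assumptions about PAGE_SIZE offsets anywhere.
		 */
		bio_add_folio_nofail(bio, folio, BBTOB(bp->b_length),
				offset_in_folio(folio, bp->b_addr));
	}

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx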