On Tue, Mar 19, 2024 at 10:48:19AM -0700, Darrick J. Wong wrote:
> On Tue, Mar 19, 2024 at 09:45:59AM +1100, Dave Chinner wrote:
> > From: Christoph Hellwig <hch@xxxxxx>
> >
> > Instead of allocating the folios manually using the bulk page
> > allocator and then using vm_map_page just use vmalloc to allocate
> > the entire buffer - vmalloc will use the bulk allocator internally
> > if it fits.
> >
> > With this the b_folios array can go away as well as nothing uses it.
> >
> > [dchinner: port to folio based buffers.]
> >
> > Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_buf.c     | 164 ++++++++++++-------------------------------
> >  fs/xfs/xfs_buf.h     |   2 -
> >  fs/xfs/xfs_buf_mem.c |   9 +--
> >  3 files changed, 45 insertions(+), 130 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index 303945554415..6d6bad80722e 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -282,29 +282,6 @@ _xfs_buf_alloc(
> >  	return 0;
> >  }
> >
> > -static void
> > -xfs_buf_free_folios(
> > -	struct xfs_buf	*bp)
> > -{
> > -	uint		i;
> > -
> > -	ASSERT(bp->b_flags & _XBF_FOLIOS);
> > -
> > -	if (xfs_buf_is_vmapped(bp))
> > -		vm_unmap_ram(bp->b_addr, bp->b_folio_count);
> > -
> > -	for (i = 0; i < bp->b_folio_count; i++) {
> > -		if (bp->b_folios[i])
> > -			__folio_put(bp->b_folios[i]);
> > -	}
> > -	mm_account_reclaimed_pages(bp->b_folio_count);
> > -
> > -	if (bp->b_folios != bp->b_folio_array)
> > -		kfree(bp->b_folios);
> > -	bp->b_folios = NULL;
> > -	bp->b_flags &= ~_XBF_FOLIOS;
> > -}
> > -
> >  static void
> >  xfs_buf_free_callback(
> >  	struct callback_head	*cb)
> > @@ -323,13 +300,22 @@ xfs_buf_free(
> >
> >  	ASSERT(list_empty(&bp->b_lru));
> >
> > -	if (xfs_buftarg_is_mem(bp->b_target))
> > +	if (xfs_buftarg_is_mem(bp->b_target)) {
> >  		xmbuf_unmap_folio(bp);
> > -	else if (bp->b_flags & _XBF_FOLIOS)
> > -		xfs_buf_free_folios(bp);
> > -	else if (bp->b_flags & _XBF_KMEM)
> > -		kfree(bp->b_addr);
> > +		goto free;
> > +	}
> >
> > +	if (!(bp->b_flags & _XBF_KMEM))
> > +		mm_account_reclaimed_pages(bp->b_folio_count);
>
> Echoing hch's statement about the argument being passed to
> mm_account_reclaimed_pages needing to be fed units of base pages, not
> folios.

> > +
> > +	if (bp->b_flags & _XBF_FOLIOS)
> > +		__folio_put(kmem_to_folio(bp->b_addr));
>
> Is it necessary to use folio_put instead of the __ version like hch said
> earlier?

Both fixed.

> > +	else
> > +		kvfree(bp->b_addr);
> > +
> > +	bp->b_flags &= _XBF_KMEM | _XBF_FOLIOS;
>
> Shouldn't this be:
>
> 	bp->b_flags &= ~(_XBF_KMEM | _XBF_FOLIOS); ?

Yes. Good catch.

> > @@ -377,14 +361,15 @@ xfs_buf_alloc_folio(
> >  	struct xfs_buf	*bp,
> >  	gfp_t		gfp_mask)
> >  {
> > +	struct folio	*folio;
> >  	int		length = BBTOB(bp->b_length);
> >  	int		order = get_order(length);
> >
> > -	bp->b_folio_array[0] = folio_alloc(gfp_mask, order);
> > -	if (!bp->b_folio_array[0])
> > +	folio = folio_alloc(gfp_mask, order);
> > +	if (!folio)
> >  		return false;
> >
> > -	bp->b_folios = bp->b_folio_array;
> > +	bp->b_addr = folio_address(folio);
> >  	bp->b_folio_count = 1;
> >  	bp->b_flags |= _XBF_FOLIOS;
> >  	return true;
> > @@ -400,15 +385,11 @@ xfs_buf_alloc_folio(
> >   * contiguous memory region that we don't have to map and unmap to access the
> >   * data directly.
> >   *
> > - * The second type of buffer is the multi-folio buffer. These are *always* made
> > - * up of single page folios so that they can be fed to vmap_ram() to return a
> > - * contiguous memory region we can access the data through.
> > - *
> > - * We don't use high order folios for this second type of buffer (yet) because
> > - * having variable size folios makes offset-to-folio indexing and iteration of
> > - * the data range more complex than if they are fixed size. This case should now
> > - * be the slow path, though, so unless we regularly fail to allocate high order
> > - * folios, there should be little need to optimise this path.
> > + * The second type of buffer is the vmalloc()d buffer. This provides the buffer
> > + * with the required contiguous memory region but backed by discontiguous
> > + * physical pages. vmalloc() typically doesn't fail, but it can and so we may
> > + * need to wrap the allocation in a loop to prevent low memory failures and
> > + * shutdowns.
>
> Where's the loop now? Is that buried under __vmalloc somewhere?

I thought I'd added __GFP_NOFAIL to the __vmalloc() gfp mask to make it
loop. I suspect I lost it at some point when rebasing either this or
the (now merged) kmem.[ch] removal patchset.

Well spotted, I'll fix that up.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx