Re: [PATCH 14/20] xfs: use shmem_get_folio in xfile_obj_store

Hugh Dickins <hughd@xxxxxxxxxx> · Thu, 15 Feb 2024 23:40:41 -0800 (PST)

On Mon, 29 Jan 2024, Christoph Hellwig wrote:

> Switch to using shmem_get_folio and manually dirtying the page instead
> of abusing aops->write_begin and aops->write_end in xfile_get_page.
> 
> This simplifies the code by not doing indirect calls of not actually
> exported interfaces that don't really fit the use case very well, and
> happens to get us large folio support for free.
> 
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/scrub/xfile.c | 73 ++++++++++++++++----------------------------
>  1 file changed, 27 insertions(+), 46 deletions(-)
> 
> diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
> index a669ebbbc02d1d..2b4819902b4cc3 100644
> --- a/fs/xfs/scrub/xfile.c
> +++ b/fs/xfs/scrub/xfile.c
> @@ -183,11 +183,7 @@ xfile_store(
>  	loff_t			pos)
>  {
>  	struct inode		*inode = file_inode(xf->file);
> -	struct address_space	*mapping = inode->i_mapping;
> -	const struct address_space_operations *aops = mapping->a_ops;
> -	struct page		*page = NULL;
>  	unsigned int		pflags;
> -	int			error = 0;
>  
>  	if (count > MAX_RW_COUNT)
>  		return -ENOMEM;
> @@ -196,60 +192,45 @@ xfile_store(
>  
>  	trace_xfile_store(xf, pos, count);
>  
> +	/*
> +	 * Increase the file size first so that shmem_get_folio(..., SGP_CACHE),
> +	 * actually allocates a folio instead of erroring out.
> +	 */
> +	if (pos + count > i_size_read(inode))
> +		i_size_write(inode, pos + count);
> +
>  	pflags = memalloc_nofs_save();
>  	while (count > 0) {
> -		void		*fsdata = NULL;
> -		void		*p, *kaddr;
> +		struct folio	*folio;
>  		unsigned int	len;
> -		int		ret;
> +		unsigned int	offset;
>  
> -		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
> -
> -		/*
> -		 * We call write_begin directly here to avoid all the freezer
> -		 * protection lock-taking that happens in the normal path.
> -		 * shmem doesn't support fs freeze, but lockdep doesn't know
> -		 * that and will trip over that.
> -		 */
> -		error = aops->write_begin(NULL, mapping, pos, len, &page,
> -				&fsdata);
> -		if (error) {
> -			error = -ENOMEM;
> +		if (shmem_get_folio(inode, pos >> PAGE_SHIFT, &folio,
> +				SGP_CACHE) < 0)o

SGP_CACHE is the safest choice, yes.  It will tend to do an unnecessary
clear_highpage() which you immediately overwrite with the actual data;
but saves calculating exactly what needs to be zeroed above and below
the data - not worth bothering with, unless shaving off CPU cycles.

>  			break;
> -		}
> -
> -		/*
> -		 * xfile pages must never be mapped into userspace, so we skip
> -		 * the dcache flush.  If the page is not uptodate, zero it
> -		 * before writing data.
> -		 */
> -		kaddr = page_address(page);
> -		if (!PageUptodate(page)) {
> -			memset(kaddr, 0, PAGE_SIZE);
> -			SetPageUptodate(page);
> -		}
> -		p = kaddr + offset_in_page(pos);
> -		memcpy(p, buf, len);
> -
> -		ret = aops->write_end(NULL, mapping, pos, len, len, page,
> -				fsdata);
> -		if (ret < 0) {
> -			error = -ENOMEM;
> +		if (filemap_check_wb_err(inode->i_mapping, 0)) {

Hah.  I was sceptical what that could ever achieve on shmem (the "wb"
is misleading); but it's an ingenious suggestion from Matthew, to avoid
our current horrid folio+page HWPoison handling.  I followed it up a bit,
and it does look as if this filemap_check_wb_err() technique should work
for that; but it's not something I've tried at all.

And that's all I've got to say on the series (I read on, but certainly
did not delve into the folio sorting stuff): looks good,  but the
VM_NORESERVE question probably needs attention (and that docbook
comment to mention "locked").

XFS tree stil seems to me the right home for it all.

Hugh

> +			folio_unlock(folio);
> +			folio_put(folio);
>  			break;
>  		}
>  
> -		if (ret != len) {
> -			error = -ENOMEM;
> -			break;
> -		}
> +		offset = offset_in_folio(folio, pos);
> +		len = min_t(ssize_t, count, folio_size(folio) - offset);
> +		memcpy(folio_address(folio) + offset, buf, len);
> +
> +		folio_mark_dirty(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
>  
> -		count -= ret;
> -		pos += ret;
> -		buf += ret;
> +		count -= len;
> +		pos += len;
> +		buf += len;
>  	}
>  	memalloc_nofs_restore(pflags);
>  
> -	return error;
> +	if (count)
> +		return -ENOMEM;
> +	return 0;
>  }
>  
>  /* Find the next written area in the xfile data for a given offset. */
> -- 
> 2.39.2
> 
>