Re: [PATCH 3/3] nfs: Block on write congestion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2024-06-13 at 10:28 +0200, Jan Kara wrote:
> Commit 6df25e58532b ("nfs: remove reliance on bdi congestion")
> introduced NFS-private solution for limiting number of writes
> outstanding against a particular server. Unlike previous bdi congestion
> this algorithm actually works and limits number of outstanding writeback
> pages to nfs_congestion_kb which scales with amount of client's memory
> and is capped at 256 MB. As a result some workloads such as random
> buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
> fio command to reproduce is:
> 

That 256M cap was set back in 2007. I wonder if we ought to reconsider
that. It might be worth experimenting these days with a higher cap on
larger machines. Might that improve throughput even more?

> fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
>   --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
>   --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1
> 
> This happens because the client sends ~256 MB worth of dirty pages to
> the server and any further background writeback request is ignored until
> the number of writeback pages gets below the threshold of 192 MB. By the
> time this happens and clients decides to trigger another round of
> writeback, the server often has no pages to write and the disk is idle.
> 
> To fix this problem and make the client react faster to eased congestion
> of the server by blocking waiting for congestion to resolve instead of
> aborting writeback. This improves the random 4k buffered write
> throughput to 184 MB/s.
> 
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  fs/nfs/client.c           |  1 +
>  fs/nfs/write.c            | 12 ++++++++----
>  include/linux/nfs_fs_sb.h |  1 +
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 3b252dceebf5..8286edd6062d 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -994,6 +994,7 @@ struct nfs_server *nfs_alloc_server(void)
>  
>  	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
>  
> +	init_waitqueue_head(&server->write_congestion_wait);
>  	atomic_long_set(&server->writeback, 0);
>  
>  	ida_init(&server->openowner_id);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index c6255d7edd3c..21a5f1e90859 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -423,8 +423,10 @@ static void nfs_folio_end_writeback(struct folio *folio)
>  
>  	folio_end_writeback(folio);
>  	if (atomic_long_dec_return(&nfss->writeback) <
> -	    NFS_CONGESTION_OFF_THRESH)
> +	    NFS_CONGESTION_OFF_THRESH) {
>  		nfss->write_congested = 0;
> +		wake_up_all(&nfss->write_congestion_wait);
> +	}
>  }
>  
>  static void nfs_page_end_writeback(struct nfs_page *req)
> @@ -698,12 +700,14 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
>  	struct nfs_pageio_descriptor pgio;
>  	struct nfs_io_completion *ioc = NULL;
>  	unsigned int mntflags = NFS_SERVER(inode)->flags;
> +	struct nfs_server *nfss = NFS_SERVER(inode);
>  	int priority = 0;
>  	int err;
>  
> -	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    NFS_SERVER(inode)->write_congested)
> -		return 0;
> +	/* Wait with writeback until write congestion eases */
> +	if (wbc->sync_mode == WB_SYNC_NONE && nfss->write_congested)
> +		wait_event(nfss->write_congestion_wait,
> +			   nfss->write_congested == 0);
>  

It seems odd to block here, but if that helps throughput then so be it.

>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
>  
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 92de074e63b9..38a128796a76 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -140,6 +140,7 @@ struct nfs_server {
>  	struct rpc_clnt *	client_acl;	/* ACL RPC client handle */
>  	struct nlm_host		*nlm_host;	/* NLM client handle */
>  	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
> +	wait_queue_head_t	write_congestion_wait;	/* wait until write congestion eases */
>  	atomic_long_t		writeback;	/* number of writeback pages */
>  	unsigned int		write_congested;/* flag set when writeback gets too high */
>  	unsigned int		flags;		/* various flags */

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux