On Mon, 2019-06-03 at 17:07 +0200, Mkrtchyan, Tigran wrote:
> Dear NFS fellows,
>
> though this is not directly an NFS issue, I post this question here as
> we are mostly affected by NFS clients (and you have enough kernel
> connections to route it to the right people).
>
> We have 25 new data processing nodes with 32 cores, 256 GB RAM and a
> 25 Gb/s NIC. They run CentOS 7 (but this is irrelevant, I think).
>
> When each node runs 24 parallel write-intensive (75% write, 25% read)
> workloads, we see a spike of IO errors on close. The client runs into
> timeouts due to a slow network or IO starvation on the NFS servers.
> It stumbles, disconnects, establishes a new connection and stumbles
> again...

You can adjust the pNFS timeout behaviour using the 'dataserver_timeo'
and 'dataserver_retrans' module parameters on both the files and
flexfiles pNFS driver modules.

> As the default values for dirty pages are
>
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_ratio = 30
>
> the first data only get sent once at least 25 GB of dirty data has
> accumulated.
>
> To make the full deployment more responsive, we have reduced the
> defaults to something more reasonable:
>
> vm.dirty_background_ratio = 0
> vm.dirty_ratio = 0
> vm.dirty_background_bytes = 67108864
> vm.dirty_bytes = 536870912
>
> IOW, we force the client to start sending data as soon as 64 MB has
> been written. The question is how to find optimal values for these
> knobs, and how to make them file system/mount point specific.

The memory management system knows nothing about mount points, and the
filesystems know nothing about the memory management limits. That is by
design.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
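
[Editor's note: the 'dataserver_timeo' / 'dataserver_retrans' parameters
mentioned above can be set persistently through a modprobe configuration
file. A minimal sketch follows; the module names and the filename are
assumptions (verify with `modinfo` on your kernel), and the values shown
are the usual defaults, not tuning recommendations:]

    # /etc/modprobe.d/pnfs-timeouts.conf  (hypothetical filename)
    # dataserver_timeo is in tenths of a second; dataserver_retrans is the
    # number of retransmissions before the DS connection is declared dead.
    options nfs_layout_nfsv41_files dataserver_timeo=600 dataserver_retrans=5
    options nfs_layout_flexfiles dataserver_timeo=600 dataserver_retrans=5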
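
[Editor's note: the reduced byte-based limits from the quoted message can
be applied persistently via sysctl.d; the filename below is hypothetical.
Note that writing a *_bytes knob automatically zeroes the corresponding
*_ratio knob, so the two ratio lines from the message are redundant:]

    # /etc/sysctl.d/90-nfs-writeback.conf  (hypothetical filename)
    # Values taken from the message above: start background writeback at
    # 64 MB of dirty data, block writers at 512 MB.
    vm.dirty_background_bytes = 67108864
    vm.dirty_bytes = 536870912

[Apply with `sysctl --system` or at the next boot.]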
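
[Editor's note: the "25 GB" figure in the quoted message follows directly
from the defaults: with vm.dirty_background_ratio = 10 on a 256 GB node,
background writeback only starts at roughly a tenth of RAM. A quick
back-of-the-envelope sketch (the kernel actually applies the ratio to
*dirtyable* memory, not total RAM, so the real threshold is a bit lower):]

```shell
# Estimate the background writeback threshold on a 256 GiB node
# with the default vm.dirty_background_ratio = 10.
ram_bytes=$((256 * 1024 * 1024 * 1024))
dirty_background_ratio=10
threshold_bytes=$((ram_bytes * dirty_background_ratio / 100))
echo "background writeback starts around $((threshold_bytes / 1024 / 1024 / 1024)) GiB dirty"
# prints: background writeback starts around 25 GiB dirty
```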