Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 19 Sep 2019 20:05:55 +0000

On Thu, 2019-09-19 at 22:57 +0300, Alkis Georgopoulos wrote:
> On 9/19/19 10:51 PM, Trond Myklebust wrote:
> > I don't understand why klibc would default to supplying a timeo=7
> > argument at all. It would be MUCH better if it just let the kernel
> > set the default, which in the case of TCP is timeo=600.
> >
> > I agree with your argument that replaying requests every 0.7
> > seconds is just going to cause congestion. TCP provides for reliable
> > delivery of RPC messages to the server, which is why the kernel
> > default is a full minute.
> >
> > So please ask the klibc developers to change libmount to let the
> > kernel decide the default mount options. Their current setting is
> > just plain wrong.
> 
> This was what I asked in my first message to their mailing list,
> https://lists.zytor.com/archives/klibc/2019-September/004234.html
> 
> Then I realized that timeo=600 just hides the real problem,
> which is rsize=1M.
> 
> NFS defaults: timeo=600,rsize=1M => lag
> nfsmount defaults: timeo=7,rsize=1M => lag AND dmesg errors
> 
> My proposal: timeo=whatever,rsize=32K => all fine
> 
> If more benchmarks are needed from me to document the
> "NFS defaults: timeo=600,rsize=1M => lag"
> I can surely provide them.

There are plenty of operations that can take longer than 700 ms to
complete. Synchronous writes to disk are one, but COMMIT (i.e. the NFS
equivalent of fsync()) can often take much longer even though it has no
payload.
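For reference, the 700 ms above is timeo=7 in seconds: nfs(5) documents timeo in deciseconds and describes linear backoff, where the timeout grows by timeo after each retransmission up to a 600-second cap. The sketch below is a simplified model of that documented schedule only; it ignores reconnects and major-timeout handling in the real kernel RPC client, and retry_schedule() plus the retrans=2 value are hypothetical choices for illustration.

```python
# Simplified model of the linear backoff described in nfs(5).
# Assumption: this ignores TCP reconnects and major-timeout recovery.

MAX_TIMEOUT_DS = 6000  # 600 s cap from nfs(5), expressed in deciseconds


def retry_schedule(timeo_ds: int, retrans: int) -> list[float]:
    """Seconds each attempt waits before the next retry (hypothetical helper).

    timeo_ds: the timeo= mount option, in deciseconds.
    retrans:  the retrans= mount option (retries before recovery action).
    """
    waits = []
    # The original transmission plus `retrans` retransmissions.
    for attempt in range(retrans + 1):
        # Timeout increases by timeo after each retransmission, capped.
        wait_ds = min(timeo_ds * (attempt + 1), MAX_TIMEOUT_DS)
        waits.append(wait_ds / 10)  # convert deciseconds to seconds
    return waits


if __name__ == "__main__":
    for timeo in (7, 600):
        print(f"timeo={timeo}:", retry_schedule(timeo, retrans=2))
```

With timeo=7 the client gives up on an attempt after 0.7 s, then 1.4 s, then 2.1 s, whereas timeo=600 waits 60 s before the first retransmission, which leaves room for a slow COMMIT to finish.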

So the problem is not the size of the WRITE payload. The real problem
is the timeout.
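A rough back-of-the-envelope comparison helps show why both settings interact. The sketch below computes only the time needed to put a single READ/WRITE payload on the wire at 10 and 100 Mbps and compares it with the two timeouts discussed in this thread. It deliberately ignores RPC/TCP/IP overhead, server latency and competing traffic, so real figures will be worse; wire_time_s() is a hypothetical helper.

```python
# Raw serialisation time for one payload, compared with timeo=7 and timeo=600.
# Assumption: no protocol overhead, no server latency, an otherwise idle link.

TIMEO_7_S = 0.7     # timeo=7 (deciseconds) in seconds
TIMEO_600_S = 60.0  # timeo=600, the kernel's TCP default


def wire_time_s(payload_bytes: int, link_bits_per_s: float) -> float:
    """Seconds needed just to transmit payload_bytes over the link."""
    return payload_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    for label, size in (("32K", 32 * 1024), ("1M", 1024 * 1024)):
        for mbps in (10, 100):
            t = wire_time_s(size, mbps * 1_000_000)
            print(f"{label} @ {mbps} Mbps: {t:.3f} s "
                  f"(over timeo=7: {t > TIMEO_7_S}, "
                  f"over timeo=600: {t > TIMEO_600_S})")
```

A 1 MiB payload at 10 Mbps needs about 0.839 s on the wire alone, already past a 0.7 s timeout but far below 60 s. Shrinking rsize/wsize to 32K brings that down to about 0.026 s, yet a COMMIT with no payload can still exceed 0.7 s on the server side, which is why the timeout remains the underlying issue.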

The bottom line is that if you want to keep timeo=7 as a mount option
for TCP, then you are on your own.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx