Re: rsize,wsize=1M causes severe lags in 10/100 Mbps

Alkis Georgopoulos <alkisg@xxxxxxxxx> · Fri, 20 Sep 2019 12:48:04 +0300

On 9/20/19 12:25 PM, Alkis Georgopoulos wrote:
This is likely not the exact issue I'm experiencing, as I'm testing e.g.
with glibc 2.27-3ubuntu1 on Ubuntu 18.04 and kernel 5.0.

New benchmark, measuring the boot time of a netbooted client,
from right after the kernel is loaded to the display manager screen:

1) On 10 Mbps:
a) tcp,timeo=600,rsize=32K: 304 secs
b) tcp,timeo=600,rsize=1M: 618 secs

2) On 100 Mbps:
a) tcp,timeo=600,rsize=32K: 40 secs
b) tcp,timeo=600,rsize=1M: 84 secs

3) On 1000 Mbps:
a) tcp,timeo=600,rsize=32K: 20 secs
b) tcp,timeo=600,rsize=1M: 24 secs

32K is always faster, even on full gigabit.
Disk access on gigabit was *significantly* faster to result in 4 seconds 
lower boot time. In the 10/100 cases, rsize=1M is pretty much unusable.
There are no writes involved, they go in a local tmpfs/overlayfs.
Would it make sense for me to measure the *boot bandwidth* in each case, 
to see if more things (readahead) are downloaded with rsize=1M?

I did test the boot bandwidth.
On ext4-over-NFS, with tmpfs-and-overlayfs to make root writable:

2) On 100 Mbps:
a) tcp,timeo=600,rsize=32K: 471MB
b) tcp,timeo=600,rsize=1M: 1250MB

So it is indeed slower because it's transferring more things that the 
client doesn't need.
Maybe it is a different or a new aspect of the readahead issue that you 
guys mentioned above.
Is it possible that NFS is always sending 1MB chunks even when the 
actual data inside them is lower?

If you want me to test more things, I can;
if you consider it a problem with glibc etc that shouldn't involve this 
mailing list, I can try to report it there...

Thank you,
Alkis Georgopoulos