Hi everybody! I am very happy to be writing my first email to a Linux mailing list. I have read the FAQ and I know this list is not a user help desk, but I am seeing strange behaviour with memory writeback and NFS, and maybe someone can help me. I am sorry if this is not the right "forum".

I ran three simple tests writing to the same NFS filesystem, and the CPU and memory behaviour is melting my brain.

The environment:

- Linux Red Hat 8.6, 2 vCPUs (VMware VM) and 8 GB RAM (same behaviour on Red Hat 7.9)
- The same NFS filesystem mounted twice, with and without the sync option:

1x.1x.2xx.1xx:/test_fs on /mnt/test_fs_with_sync type nfs (rw,relatime,sync,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1x.1x.2xx.1xx,mountvers=3,mountport=2050,mountproto=udp,local_lock=none,addr=1x.1x.2xx.1xx)

1x.1x.2xx.1xx:/test_fs on /mnt/test_fs_without_sync type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1x.1x.2xx.1xx,mountvers=3,mountport=2050,mountproto=udp,local_lock=none,addr=1x.1x.2xx.1xx)

- The link between the NFS client and the NFS server is 10 Gb (fiber), and iperf3 shows the link running at full speed. No problems here.

I know there are NFS options such as nconnect to improve performance, but here I am interested in the Linux kernel internals.

The tests:

1.- dd in /mnt/test_fs_without_sync

dd if=/dev/zero of=test.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 21.4122 s, 245 MB/s

* High iowait
* High NFS latency
* Writeback in use

Evidence:
https://zerobin.net/?43f9bea1953ed7aa#TaUk+K0GDhxjPq1EgJ2aAHgEyhntQ0NQzeFF51d9qI0=
https://i.stack.imgur.com/pTong.png

2.- dd in /mnt/test_fs_with_sync

dd if=/dev/zero of=test.out bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 35.6462 s, 147 MB/s

* High iowait
* Low NFS latency
* No writeback

Evidence:
https://zerobin.net/?0ce52c5c5d946d7a#ZeyjHFIp7B+K+65DX2RzEGlp+Oq9rCidAKL8RpKpDJ8=
https://i.stack.imgur.com/Pf1xS.png

3.- dd in /mnt/test_fs_with_sync with oflag=direct

dd if=/dev/zero of=test.out bs=1M oflag=direct count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB, 4.9 GiB) copied, 34.6491 s, 151 MB/s

* Low iowait
* Low NFS latency
* No writeback

Evidence:
https://zerobin.net/?03c4aa040a7a5323#bScEK36+Sdcz18VwKnBXNbOsi/qFt/O+qFyNj5FUs8k=
https://i.stack.imgur.com/Qs6y5.png

The questions:

I know writeback is an old issue in Linux, and it seems to be the problem here. I played with vm.dirty_background_bytes/vm.dirty_bytes and vm.dirty_background_ratio/vm.dirty_ratio (I know only the bytes or the ratio variant of each pair can be active at a time), but whatever values I put in these tunables I always see iowait, except with dd oflag=direct. A sketch of what I tried is in the P.S. below.

- In test 2, how can there be low NFS latency but high iowait?
- In test 2, how can the code path be almost the same as in test 1? Test 2 writes to an NFS filesystem mounted with the sync option, yet it still seems to use the page cache code path (see the flame graph).
- In test 1, why does the iowait behaviour not change when the vm.dirty_* tunables are changed? (I have tested a lot of combinations.)

I am happy to rerun anything or collect more data; a couple of command sketches are in the P.S. below.

Thank you very much!

Best regards.
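
P.S. For reference, this is roughly how I have been playing with the dirty-page tunables and watching writeback while dd runs. The values below are only examples; I tried many combinations:

# switch to byte-based thresholds (writing a *_bytes tunable zeroes its *_ratio counterpart)
sysctl -w vm.dirty_background_bytes=$((64*1024*1024))
sysctl -w vm.dirty_bytes=$((256*1024*1024))

# watch dirty and writeback pages while dd runs in another terminal
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'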
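
If more latency data is useful, I can also sample the per-op NFS round-trip times with nfsiostat (from nfs-utils) while each dd runs; the "avg RTT (ms)" and "avg exe (ms)" columns for the WRITE op show the per-op latency:

# 5-second samples, 3 reports, for one mount point
nfsiostat 5 3 /mnt/test_fs_without_sync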
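
And if it helps to separate the sync mount option from the page cache question, I can rerun test 1 with synchronous writes requested from userspace instead of via the mount option:

# O_SYNC writes on the mount *without* the sync option
dd if=/dev/zero of=/mnt/test_fs_without_sync/test.out bs=1M count=5000 oflag=sync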