On Sun, 2022-09-11 at 20:58 +0200, Isak wrote:
> Hi everybody!
>
> I am very happy writing my first email to one of the Linux mailing lists.
>
> I have read the FAQ and I know this mailing list is not a user help
> desk, but I am seeing strange behaviour with memory writeback and NFS.
> Maybe someone can help me. I am so sorry if this is not the right
> "forum".
>
> I did three simple tests writing to the same NFS filesystem, and the
> behavior of the cpu and memory is puzzling me.
>
> The environment:
>
> - Linux Red Hat 8.6, 2 vCPUs (VMware VM) and 8 GB RAM (but same
> behavior with Red Hat 7.9)
>
> - One NFS filesystem mounted both with and without sync:
>
> 1x.1x.2xx.1xx:/test_fs on /mnt/test_fs_with_sync type nfs
> (rw,relatime,sync,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1x.1x.2xx.1xx,mountvers=3,mountport=2050,mountproto=udp,local_lock=none,addr=1x.1x.2xx.1xx)
>
> 1x.1x.2xx.1xx:/test_fs on /mnt/test_fs_without_sync type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=1x.1x.2xx.1xx,mountvers=3,mountport=2050,mountproto=udp,local_lock=none,addr=1x.1x.2xx.1xx)
>
> - The link between the NFS client and NFS server is 10Gb (fiber), and
> iperf3 data show the link works at maximum speed. No problems here.
> I know there are NFS options like nconnect to improve performance,
> but I am interested in Linux kernel internals.
>
> The tests:
>
> 1. dd in /mnt/test_fs_without_sync
>
> dd if=/dev/zero of=test.out bs=1M count=5000
> 5000+0 records in
> 5000+0 records out
> 5242880000 bytes (5.2 GB, 4.9 GiB) copied, 21.4122 s, 245 MB/s
>
> * High cpuwait
> * High nfs latency
> * Writeback in use
>
> Evidence:
> https://zerobin.net/?43f9bea1953ed7aa#TaUk+K0GDhxjPq1EgJ2aAHgEyhntQ0NQzeFF51d9qI0=
> https://i.stack.imgur.com/pTong.png
>
> 2. dd in /mnt/test_fs_with_sync
>
> dd if=/dev/zero of=test.out bs=1M count=5000
> 5000+0 records in
> 5000+0 records out
> 5242880000 bytes (5.2 GB, 4.9 GiB) copied, 35.6462 s, 147 MB/s
>
> * High cpuwait
> * Low nfs latency
> * No writeback
>
> Evidence:
> https://zerobin.net/?0ce52c5c5d946d7a#ZeyjHFIp7B+K+65DX2RzEGlp+Oq9rCidAKL8RpKpDJ8=
> https://i.stack.imgur.com/Pf1xS.png
>
> 3. dd in /mnt/test_fs_with_sync, with oflag=direct
>
> dd if=/dev/zero of=test.out bs=1M oflag=direct count=5000
> 5000+0 records in
> 5000+0 records out
> 5242880000 bytes (5.2 GB, 4.9 GiB) copied, 34.6491 s, 151 MB/s
>
> * Low cpuwait
> * Low nfs latency
> * No writeback
>
> Evidence:
> https://zerobin.net/?03c4aa040a7a5323#bScEK36+Sdcz18VwKnBXNbOsi/qFt/O+qFyNj5FUs8k=
> https://i.stack.imgur.com/Qs6y5.png
>
> The questions:
>
> I know writeback is an old issue in Linux, and it seems to be the
> problem here. I played with vm.dirty_background_bytes/vm.dirty_background_ratio
> and vm.dirty_bytes/vm.dirty_ratio (I know only one of each pair is
> valid), but whatever values I put in these tunables I always have
> iowait (except for dd with oflag=direct).
>
> - In test number 2, how is it possible that it has no nfs latency but
> has a high cpu wait?
>
> - In test number 2, how is it possible that it has almost the same
> code path as test number 1? Test number 2 uses an NFS filesystem
> mounted with the sync option but seems to use the pagecache codepath
> (see the flame graph).

"sync" just means that the write codepaths do an implicit fsync of the
written range after every write. The data still goes through the
pagecache in that case. It just does a (synchronous) flush of the data
to the server and a commit after every 1M (in your case).
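If you want to see that from userspace, here's a rough comparison you could
run on the non-sync mount (untested here; paths are from your mount output
above, and oflag=sync/conv=fsync are standard GNU dd flags):

  # O_SYNC: each 1M write is flushed to the server before the next one
  # starts -- this should look a lot like your sync-mounted test 2
  dd if=/dev/zero of=/mnt/test_fs_without_sync/test.out bs=1M count=5000 oflag=sync

  # buffered writes with a single flush at the end -- close to test 1,
  # except that dd doesn't exit until the data has been written back
  dd if=/dev/zero of=/mnt/test_fs_without_sync/test.out bs=1M count=5000 conv=fsync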
>
> - In test number 1, why isn't there a change in cpuwait behavior when
> the vm.dirty tunables are changed? (I have tested a lot of
> combinations.)

Depends on which tunables you're twiddling, but you have 8G of RAM and
are writing a 5G file. All of that should fit in the pagecache without
needing to flush anything before all the writes are done. I imagine the
vm.dirty tunables don't really come into play in these tests, other than
maybe the background ones, and those shouldn't really affect your
buffered write throughput.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
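P.S. If you want to watch what the VM is actually doing during one of these
runs, a quick sketch (assuming the usual sysctl/procps tools on RHEL; adjust
to taste) is to check the thresholds and then watch the dirty/writeback
counters from a second shell:

  # which thresholds are in effect (only one of each bytes/ratio pair is used)
  sysctl vm.dirty_ratio vm.dirty_bytes vm.dirty_background_ratio vm.dirty_background_bytes

  # dirty and writeback pages, sampled once a second while dd runs
  watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'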