Hello,

I often take snapshots in order to move KVM VMs from one NFS share to another while they are running, or to take backups. Some of my VMs are very large (1.1 TB) and take a long time (40 minutes to 2 hours) to back up or move. They also write between 20 and 60 GB of data while being backed up or moved.

Once the backup or move is done, the dirty snapshot data needs to be merged into the parent disk. While this happens I often see I/O stalls within the VMs in the range of 1 - 20 seconds, sometimes worse. But I have some very latency-sensitive VMs which crash or misbehave after an I/O stall of 15 seconds.

So I would like to know whether there is some tuning I can do to make these I/O stalls shorter.

- I already tried setting vm.dirty_expire_centisecs=100, which appears to improve things, but does not keep the stalls under 15 seconds.

Ideally, I/O stalls would last no more than 1 second.

This is how you can reproduce the issue:

- NFS Server:
mkdir /ssd
apt install -y nfs-kernel-server
echo '/ssd 0.0.0.0/0.0.0.0(rw,no_root_squash,no_subtree_check,sync)' > /etc/exports
exportfs -ra

- NFS Client / KVM Host:
mount server:/ssd /mnt
# Put a VM on /mnt and start it.
# Create a snapshot:
virsh snapshot-create-as --domain testy guest-state1 --diskspec vda,file=/mnt/overlay.qcow2 --disk-only --atomic --no-metadata

- In the VM:
# Write some data (in my case 6 GB of data are written in 60 seconds
# because the NFS client is connected with a 1 Gbit/s link)
fio --ioengine=libaio --filesize=32G --ramp_time=2s --runtime=1m --numjobs=1 --direct=1 --verify=0 --randrepeat=0 --group_reporting --directory=/mnt --name=write --blocksize=1m --iodepth=1 --readwrite=write --unlink=1

# Do some synchronous I/O
while true; do date | tee -a date.log; sync; sleep 1; done

- On the NFS Client / KVM host:
# Merge the snapshot into the parent disk
time virsh blockcommit testy vda --active --pivot --delete
Successfully pivoted

real 1m4.666s
user 0m0.017s
sys 0m0.007s

I exported the NFS share with sync on purpose because I often use DRBD in sync mode (protocol C) to replicate the data on the NFS server to a site which is 200 km away using a 10 Gbit/s link.

The result is:

(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 12:53:36 CEST 2024
Sun May 5 12:53:37 CEST 2024
Sun May 5 12:53:38 CEST 2024
Sun May 5 12:53:39 CEST 2024
Sun May 5 12:53:40 CEST 2024
Sun May 5 12:53:41 CEST 2024 < here I started virsh blockcommit
Sun May 5 12:53:45 CEST 2024
Sun May 5 12:53:50 CEST 2024
Sun May 5 12:53:59 CEST 2024
Sun May 5 12:54:04 CEST 2024
Sun May 5 12:54:22 CEST 2024
Sun May 5 12:54:23 CEST 2024
Sun May 5 12:54:27 CEST 2024
Sun May 5 12:54:32 CEST 2024
Sun May 5 12:54:40 CEST 2024
Sun May 5 12:54:42 CEST 2024
Sun May 5 12:54:45 CEST 2024
Sun May 5 12:54:46 CEST 2024
Sun May 5 12:54:47 CEST 2024
Sun May 5 12:54:48 CEST 2024
Sun May 5 12:54:49 CEST 2024

This is with 'vm.dirty_expire_centisecs=100'. With the default value 'vm.dirty_expire_centisecs=3000' it is worse.
I/O stalls:
- 4 seconds
- 9 seconds
- 5 seconds
- 18 seconds
- 4 seconds
- 5 seconds
- 8 seconds
- 2 seconds
- 3 seconds

With the default vm.dirty_expire_centisecs=3000 I get something like this:

(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 11:51:33 CEST 2024
Sun May 5 11:51:34 CEST 2024
Sun May 5 11:51:35 CEST 2024
Sun May 5 11:51:37 CEST 2024
Sun May 5 11:51:38 CEST 2024
Sun May 5 11:51:39 CEST 2024
Sun May 5 11:51:40 CEST 2024 << virsh blockcommit
Sun May 5 11:51:49 CEST 2024
Sun May 5 11:52:07 CEST 2024
Sun May 5 11:52:08 CEST 2024
Sun May 5 11:52:27 CEST 2024
Sun May 5 11:52:45 CEST 2024
Sun May 5 11:52:47 CEST 2024
Sun May 5 11:52:48 CEST 2024
Sun May 5 11:52:49 CEST 2024

I/O stalls:
- 9 seconds
- 18 seconds
- 19 seconds
- 18 seconds
- 1 second

I'm open to any suggestions which improve the situation. I often have a 10 Gbit/s network and a lot of dirty buffer cache, but at the same time I often replicate synchronously to a second site 200 km away, which only gives me around 100 MB/s write performance.

With vm.dirty_expire_centisecs=10 it is even worse:

(testy) [~] while true; do date | tee -a date.log; sync; sleep 1; done
Sun May 5 13:25:31 CEST 2024
Sun May 5 13:25:32 CEST 2024
Sun May 5 13:25:33 CEST 2024
Sun May 5 13:25:34 CEST 2024
Sun May 5 13:25:35 CEST 2024
Sun May 5 13:25:36 CEST 2024
Sun May 5 13:25:37 CEST 2024 < virsh blockcommit
Sun May 5 13:26:00 CEST 2024
Sun May 5 13:26:01 CEST 2024
Sun May 5 13:26:06 CEST 2024
Sun May 5 13:26:11 CEST 2024
Sun May 5 13:26:40 CEST 2024
Sun May 5 13:26:42 CEST 2024
Sun May 5 13:26:43 CEST 2024
Sun May 5 13:26:44 CEST 2024

I/O stalls:
- 23 seconds
- 5 seconds
- 5 seconds
- 29 seconds
- 1 second

Cheers,
Thomas
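P.S. For anyone reproducing this, the gaps in date.log can be extracted mechanically instead of being read off by eye. The following is only a minimal sketch: the function name list_stalls and its optional threshold argument are made up for illustration, and it assumes the default `date` output layout shown above, where the fourth field is HH:MM:SS. It only looks at the time of day, so it handles a single midnight rollover but not gaps longer than a day.

```shell
#!/bin/sh
# list_stalls [MIN_GAP]
# Hypothetical helper: reads `date` output lines on stdin and prints every
# gap between consecutive timestamps that is larger than MIN_GAP seconds
# (default 1, matching the `sleep 1` in the test loop).
list_stalls() {
    awk -v min_gap="${1:-1}" '
        # Skip lines whose 4th field is not a HH:MM:SS time.
        $4 !~ /^[0-9]+:[0-9]+:[0-9]+$/ { next }
        {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                gap = now - prev
                if (gap < 0) gap += 86400    # log crossed midnight
                if (gap > min_gap)
                    printf "%s -> %s: %d seconds\n", prev_ts, $4, gap
            }
            prev = now
            prev_ts = $4
            seen = 1
        }'
}
```

Inside the VM this would be run as `list_stalls < date.log`, or as `list_stalls 14 < date.log` to show only the gaps of 15 seconds or more that are relevant for the latency-sensitive VMs.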