On Mon, Aug 14, 2017 at 08:42:24 +0200, Dominik Psenner wrote:
> Hi,

Hi,

>
> a small update on this. We have migrated the virtualized host to use the
> virtio drivers and now the drive performance is improved so that we can see
> a constant transfer rate. Before it used to be the same rate but regularly
> dropped to a few bytes/sec for a few seconds and then was fast again.
>
> However we still observe that the following fails regularly:
>
> $ virsh snapshot-create-as --domain domain --name backup --no-metadata
> --atomic --disk-only --diskspec hda,snapshot=external
> $ virsh blockcommit domain hda --active --pivot
> error: failed to pivot job for disk hda
> error: block copy still active: disk 'hda' not ready for pivot yet
> Could not merge changes for disk hda of domain. VM may be in invalid state.

Since this thread was renamed, please re-state the version of libvirt you
are using. I don't really want to dig through the old thread.

> Then running the following in the morning succeeds and successfully pivots
> the snapshot into the base image while the vm is live:
>
> $ virsh blockjob domain hda --abort
> $ virsh blockcommit domain hda --active --pivot
> Successfully pivoted
>
> We run the backup process every day once and it failed on the following
> days:
>
> 2017-07-07
> 2017-07-20
> 2017-07-27
> 2017-08-12
> 2017-08-14
>
> Looking at this it roughly happens once a week and the guest from then on
> writes into the snapshot backlog. That snapshot backlog file grows about
> 8gb every day and thus the issue always needs immediate attention.
>
> Any ideas what could cause this issue? Is this a bug (race condition) of
> `virsh blockcommit` that sometimes fails because it is invoked at the wrong
> time?

So the 'virsh blockcommit domain hda --active --pivot' operation consists of
3 parts:

1) virsh blockcommit domain hda --active
2) waiting until the block job finishes
3) virsh blockjob --pivot domain hda

The problem is that sometimes step 2) finishes too soon and then step 3)
fails. This should not happen any more, since there's code in virsh [1]
which waits for the completion event from libvirtd, which is fired only when
the job is actually ready to be pivoted. This code has a lot of fallback
options, for example in case libvirtd is old.

At any rate, pivoting manually later should help. Updating to a more recent
version probably will too. In case you are using a fairly recent version,
it's possible that there are still bugs though.

Peter

[1]:
commit 7408403560f7d054da75acaab855a95c51a92e2b
Author: Peter Krempa <pkrempa@xxxxxxxxxx>
Date:   Mon Jul 13 17:04:49 2015 +0200

    virsh: Refactor block job waiting in cmdBlockCommit

    Reuse the vshBlockJobWait infrastructure to refactor cmdBlockCommit to
    use the common code. This additionally fixes a bug when working with new
    qemus, where when doing an active commit with --pivot the pivoting would
    fail, since qemu reaches 100% completion but the job doesn't switch to
    synchronized phase right away.

$ git describe --contains 7408403560f7d054da75acaab855a95c51a92e2b
v1.2.18-rc1~33
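
For a virsh that predates the fix in [1], the three steps can also be run separately from a backup script, retrying the pivot until the job is ready. The following is only a rough sketch of that workaround, not what virsh does internally (virsh waits for the event from libvirtd rather than polling). The function name commit_and_pivot, the VIRSH variable, and the retry count and delay defaults are made up for illustration. It assumes blockcommit without --wait returns once the job has been started.

```shell
#!/bin/sh
# Hypothetical override so a different virsh binary (or a stub) can be used.
VIRSH=${VIRSH:-virsh}

# commit_and_pivot DOMAIN DISK [RETRIES] [DELAY_SECONDS]
# Starts an active block commit, then retries the pivot until it succeeds
# or the retry budget is used up. Returns 0 on a successful pivot, 1 otherwise.
commit_and_pivot() {
    cap_domain=$1
    cap_disk=$2
    cap_retries=${3:-30}   # illustrative default, tune for your disk sizes
    cap_delay=${4:-2}      # seconds between pivot attempts

    # Step 1: start the active commit; the job keeps running in the background.
    "$VIRSH" blockcommit "$cap_domain" "$cap_disk" --active || return 1

    # Steps 2 and 3: a pivot attempt fails while the job is not yet ready,
    # so keep trying instead of giving up after the first error.
    cap_i=0
    while [ "$cap_i" -lt "$cap_retries" ]; do
        if "$VIRSH" blockjob "$cap_domain" "$cap_disk" --pivot; then
            return 0
        fi
        cap_i=$((cap_i + 1))
        sleep "$cap_delay"
    done

    # Still not pivoted: the job is left as it is for manual inspection.
    return 1
}
```

A backup script would call it as `commit_and_pivot domain hda` and alert on a non-zero exit status, since the guest keeps writing into the snapshot file until the pivot has happened.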
_______________________________________________ libvirt-users mailing list libvirt-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvirt-users