Re: [RFC v3] external (pull) backup API

Eric Blake <eblake@xxxxxxxxxx> · Fri, 8 Jun 2018 16:40:09 -0500

On 05/17/2018 05:43 PM, Eric Blake wrote:
> Here's my updated counterproposal for a backup API.
>
> /**
>  * virDomainBackupBegin:
>  *
>  * There are two fundamental backup approaches.  The first, called a
>  * push model, instructs the hypervisor to copy the state of the guest
>  * disk to the designated storage destination (which may be on the
>  * local file system or a network device); in this mode, the
>  * hypervisor writes the content of the guest disk to the destination,
>  * then emits VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 when the backup is
>  * either complete or failed (the backup image is invalid if the job
>  * is ended prior to the event being emitted).
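As a rough illustration of the quoted push-model rule, here is a hypothetical C model in which the destination image only counts as valid after a single completion notification that covers every disk in the job, and ending the job before that notification leaves the image invalid. push_job and its functions are invented for this sketch; they are not libvirt API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a push-mode backup job (not libvirt API). */
typedef enum {
    PUSH_RUNNING,    /* hypervisor still copying disk contents */
    PUSH_COMPLETED,  /* completion reported, every disk copied fine */
    PUSH_FAILED,     /* completion reported, at least one disk failed */
    PUSH_ABORTED     /* job ended before any completion was reported */
} push_state;

#define MAX_DISKS 8

typedef struct {
    size_t ndisks;
    bool disk_done[MAX_DISKS];  /* per-disk copy finished */
    bool disk_error;            /* any disk reported an error */
    push_state state;
} push_job;

void push_job_init(push_job *job, size_t ndisks)
{
    assert(ndisks <= MAX_DISKS);
    job->ndisks = ndisks;
    for (size_t i = 0; i < MAX_DISKS; i++)
        job->disk_done[i] = false;
    job->disk_error = false;
    job->state = PUSH_RUNNING;
}

/* Record that one disk finished copying.  Returns true exactly when
 * the single, job-wide completion fires, i.e. once the last disk of
 * the transaction is done. */
bool push_job_disk_finished(push_job *job, size_t disk, bool ok)
{
    assert(disk < job->ndisks);
    if (job->state != PUSH_RUNNING)
        return false;
    job->disk_done[disk] = true;
    if (!ok)
        job->disk_error = true;
    for (size_t i = 0; i < job->ndisks; i++)
        if (!job->disk_done[i])
            return false;   /* some disk still copying */
    job->state = job->disk_error ? PUSH_FAILED : PUSH_COMPLETED;
    return true;
}

/* Ending the job early means no completion is ever reported. */
void push_job_abort(push_job *job)
{
    if (job->state == PUSH_RUNNING)
        job->state = PUSH_ABORTED;
}

/* Destination images are usable only after a successful completion. */
bool push_job_image_valid(const push_job *job)
{
    return job->state == PUSH_COMPLETED;
}
```

Because completion is reported once for the whole set of disks, a per-disk notification would not by itself tell the caller when the backup as a whole is usable.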

A better choice is VIR_DOMAIN_EVENT_ID_JOB_COMPLETED: BLOCK_JOB can only
report status about one disk, while this is intended to report on
multiple disks handled in a single transaction.  I'm a bit depressed at
our technical debt in this area: virDomainGetJobStats() and
virDomainAbortJob() don't take a job id and only operate on the most
recently started job.  That said, I did mention elsewhere in my plans:

> I think that it should be possible to run multiple backup operations
> in parallel in the long run.  But in the interest of getting a
> proof-of-concept implementation out quickly, it's easier to state that
> for the initial implementation, libvirt supports at most one backup
> operation at a time (to do another backup, you have to wait for the
> current one to complete, or else abort and abandon the current
> one).  As there is only one backup job running at a time, the existing
> virDomainGetJobInfo()/virDomainGetJobStats() will be able to report
> statistics about the job (insofar as such statistics are available).
> But in preparation for the future, when libvirt does add parallel job
> support, starting a backup job will return a job id; and presumably
> we'd add a new virDomainGetJobStatsByID() for grabbing statistics of
> an arbitrary (rather than the most-recently-started) job.

> Since live migration also acts as a job visible through
> virDomainGetJobStats(), I'm going to treat an active backup job and
> live migration as mutually exclusive.  This is particularly true when
> we have a pull model backup ongoing: if qemu on the source is acting
> as an NBD server, you can't migrate away from that qemu and tell the
> NBD client to reconnect to the NBD server on the migration
> destination.  So, to perform a migration, you have to cancel any
> pending backup operations.  Conversely, if a migration job is
> underway, it will not be possible to start a new backup job until
> migration completes.  However, we DO need to modify migration to
> ensure that any persistent bitmaps are migrated.
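The mutual-exclusion rule between backup and migration can be sketched with a single active-job field per domain. This is a hypothetical model (job_kind, domain_jobs and the helpers are invented names), and it does not cover the separate requirement that persistent bitmaps travel with the migration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one active job per domain, so a backup and a
 * live migration can never overlap. */
typedef enum {
    JOB_NONE,
    JOB_BACKUP,
    JOB_MIGRATION
} job_kind;

typedef struct {
    job_kind active;
} domain_jobs;

/* Returns true if the job started, false if another job (backup or
 * migration) is still active. */
bool job_start(domain_jobs *d, job_kind kind)
{
    assert(kind != JOB_NONE);
    if (d->active != JOB_NONE)
        return false;
    d->active = kind;
    return true;
}

/* Complete or cancel whatever job is running. */
void job_end(domain_jobs *d)
{
    d->active = JOB_NONE;
}

/* Migrating while a backup is pending requires cancelling the backup
 * first; a pending migration is left alone and the call fails. */
bool migrate_after_cancelling_backup(domain_jobs *d)
{
    if (d->active == JOB_BACKUP)
        job_end(d);
    return job_start(d, JOB_MIGRATION);
}
```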

Yes, this means that virDomainBackupEnd() (which takes a job id) and
virDomainAbortJob() (which does not) can initially both do the work of
aborting a backup job.  The missing id does not matter until we support
parallel backup jobs, or a mix of backup and migration at once.
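To show why the missing job id is harmless for now, here is a hypothetical model in which backup_end() (standing in for the proposed virDomainBackupEnd()) and abort_job() (standing in for virDomainAbortJob()) share one cancellation path. The signatures are invented for illustration and do not match the real or proposed libvirt prototypes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state: at most one backup job per domain. */
typedef struct {
    bool active;
    int id;
} backup_job;

/* Shared worker used by both abort paths. */
static void cancel_backup(backup_job *job)
{
    assert(job->active);
    job->active = false;
}

/* Id-based path: the caller names the job it wants to end. */
int backup_end(backup_job *job, int id)
{
    if (!job->active || job->id != id)
        return -1;   /* no such job */
    cancel_backup(job);
    return 0;
}

/* Id-less path: acts on the current job.  With a single backup and no
 * concurrent migration there is no ambiguity about which job that is. */
int abort_job(backup_job *job)
{
    if (!job->active)
        return -1;   /* nothing to abort */
    cancel_backup(job);
    return 0;
}
```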

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list