Hi David,
On 07/01/2014 08:12 PM, Dr. David Alan Gilbert wrote:
* Hongyang Yang (yanghy@xxxxxxxxxxxxxx) wrote:
Hi Yang,
Background:
The COLO HA project is a high-availability solution. The primary
VM (PVM) and the secondary VM (SVM) run in parallel. They receive the
same requests from the client and generate responses in parallel too.
If the response packets from the PVM and the SVM are identical, they
are released immediately. Otherwise, a VM checkpoint (on demand) is
performed. The idea was presented at Xen Summit 2012 and 2013, and in
an academic paper at SOCC 2013. It was also presented at KVM Forum
2013:
http://www.linux-kvm.org/wiki/images/1/1d/Kvm-forum-2013-COLO.pdf
Please refer to the above document for detailed information.
Yes, I remember that talk - very interesting.
I didn't quite understand a couple of things though; perhaps you
can explain:
1) If we ignore the TCP sequence number problem, in an SMP machine
don't we get other sources of randomness - e.g. which core completes
something first, or who wins lock contention - so the output stream
might not be identical. Do those normal bits of randomness cause the
machines to be flagged as out-of-sync?
This is about the COLO Agent. CCing Congyang; he can give a detailed
explanation.
2) If the PVM has decided that the SVM is out of sync (due to 1) and
the PVM fails at about the same point - can we switch over to the SVM?
Yes, we can switch over; we have some mechanisms to ensure the SVM's
state is consistent:
- Memory cache
The memory cache is initially identical to the PVM's memory. At a
checkpoint, we cache the PVM's dirty memory while it is being
transported, and write the cached memory into the SVM once all of the
PVM's memory has been received (we only need to write pages that were
dirty on both the PVM and the SVM since the last checkpoint). This
addresses problem 2) you mentioned above: if the PVM fails while
checkpointing, the SVM discards the cached memory and continues to run
and provide service as it is. A rough sketch of this apply-or-discard
flow follows after this list.
- COLO Disk Manager
Like the memory cache, the COLO Disk Manager caches the PVM's disk
modifications and writes them to the SVM's disk at checkpoint time. If
the PVM fails while checkpointing, the SVM discards the cached disk
modifications.
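To make the apply-or-discard behaviour concrete, here is a minimal
sketch of the secondary-side memory cache under illustrative names
(colo_cache_page(), colo_cache_commit(), colo_cache_discard() and the
structures are my own, not the actual COLO code):

/* Sketch of the SVM-side memory cache described above.  Pages received
 * from the PVM during a checkpoint are only staged; they are copied
 * into the SVM's RAM after the whole checkpoint has arrived.  If the
 * PVM dies mid-checkpoint, the cache is simply dropped and the SVM
 * keeps running on its own state.  (Allocation/sizing is omitted, and
 * the real design only needs to overwrite pages that were dirty on
 * both sides since the last checkpoint.)
 */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

struct cached_page {
    uint64_t gpa;              /* guest physical address of the page */
    uint8_t  data[PAGE_SIZE];  /* page contents received from the PVM */
};

struct colo_cache {
    struct cached_page *pages; /* pages staged during this checkpoint */
    size_t              nr_pages;
};

/* Called for every dirty page streamed from the PVM: stage it only. */
void colo_cache_page(struct colo_cache *c, uint64_t gpa, const void *buf)
{
    struct cached_page *p = &c->pages[c->nr_pages++];
    p->gpa = gpa;
    memcpy(p->data, buf, PAGE_SIZE);
}

/* Called once the complete checkpoint has been received. */
void colo_cache_commit(struct colo_cache *c, uint8_t *svm_ram)
{
    for (size_t i = 0; i < c->nr_pages; i++)
        memcpy(svm_ram + c->pages[i].gpa, c->pages[i].data, PAGE_SIZE);
    c->nr_pages = 0;
}

/* Called if the PVM fails before the checkpoint completes: nothing was
 * written to the SVM's RAM yet, so it keeps running as it is. */
void colo_cache_discard(struct colo_cache *c)
{
    c->nr_pages = 0;
}

The point is simply that nothing touches the SVM's RAM until the full
checkpoint has arrived, so a PVM failure mid-transfer leaves the SVM
state untouched.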
I'm worried that due to (1) there are periods where the system is
out of sync and a failure of the PVM is not protected. Does that
happen? If so, how often?
The attachment shows the architecture of kvm-COLO that we proposed.
- COLO Manager: requires modifications to QEMU
- COLO Controller
The COLO Controller includes modifications to the save/restore flow,
just like MC (microcheckpointing), a memory cache on the secondary VM
which caches the dirty pages of the primary VM, and a failover module
which provides APIs to communicate with an external heartbeat module.
- COLO Disk Manager
When the PVM writes data into its image, the COLO Disk Manager captures
this data and sends it to the COLO Disk Manager on the secondary side,
which makes sure the content of the SVM's image is consistent with the
content of the PVM's image.
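Illustratively, the primary-side capture path could look roughly like
the following sketch; the names and the pending-write queue are
hypothetical, not the real COLO Disk Manager interface:

/* Sketch of the write capture described above: each write the PVM
 * issues to its image is copied into a record and queued for transfer
 * to the disk manager on the secondary side, which later applies it so
 * the SVM's image content stays consistent with the PVM's image (or
 * drops it if the PVM fails mid-checkpoint).
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct disk_write {
    uint64_t offset;          /* byte offset into the image */
    uint32_t len;             /* length of the write */
    uint8_t *data;            /* copy of the data the PVM wrote */
    struct disk_write *next;  /* queue of writes awaiting transfer */
};

static struct disk_write *pending; /* not yet sent to the secondary side */

/* Called on every PVM image write, alongside the real write. */
void colo_disk_capture(uint64_t offset, const void *buf, uint32_t len)
{
    struct disk_write *w = malloc(sizeof(*w));
    w->offset = offset;
    w->len = len;
    w->data = malloc(len);
    memcpy(w->data, buf, len);
    w->next = pending;
    pending = w;              /* queued for transfer to the secondary */
}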
I wonder if there is any way to coordinate this between COLO, Michael
Hines' microcheckpointing, and the two separate reverse-execution
projects that also need to do similar things.
Are there any standard APIs for the heartbeat thing we can already
tie into?
Sadly, we have checked MC; it does not have heartbeat support for now.
- COLO Agent("Proxy module" in the arch picture)
We need an agent to compare the packets returned by
Primary VM and Secondary VM, and decide whether to start a
checkpoint according to some rules. It is a linux kernel
module for host.
Why is that a kernel module, and how does it communicate the state
to the QEMU instance?
The reason we made this a kernel module is to gain better performance:
we can easily hook the packets in a kernel module; a minimal sketch of
such a hook is shown below. The QEMU instance uses ioctl() to
communicate with the COLO Agent.
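For illustration only, hooking guest traffic from a host kernel module
could look roughly like the netfilter sketch below (written against a
recent kernel's nf_register_net_hook() API; the actual COLO Agent, its
comparison rules, and its ioctl interface are not shown):

/* Minimal sketch of packet interception on the host.  The real agent
 * would queue PVM packets here, match them against the corresponding
 * SVM packets, and tell QEMU (through its ioctl interface) when they
 * diverge so that a checkpoint can be started.
 */
#include <linux/module.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <linux/skbuff.h>
#include <net/net_namespace.h>

static unsigned int colo_hook(void *priv, struct sk_buff *skb,
                              const struct nf_hook_state *state)
{
    /* Comparison logic would go here. */
    return NF_ACCEPT;                    /* sketch: pass everything */
}

static struct nf_hook_ops colo_ops = {
    .hook     = colo_hook,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_POST_ROUTING,    /* outbound guest traffic */
    .priority = NF_IP_PRI_FIRST,
};

static int __init colo_agent_init(void)
{
    return nf_register_net_hook(&init_net, &colo_ops);
}

static void __exit colo_agent_exit(void)
{
    nf_unregister_net_hook(&init_net, &colo_ops);
}

module_init(colo_agent_init);
module_exit(colo_agent_exit);
MODULE_LICENSE("GPL");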
- Other minor modifications
We may need other modifications for better performance.
Dave
P.S. I'm starting to look at fault-tolerance stuff, but haven't
got very far yet, so I'm starting to try and understand the details
of COLO, microcheckpointing, etc.
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
--
Thanks,
Yang.