Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

Anthony Liguori wrote:
On 04/22/2010 08:16 AM, Yoshiaki Tamura wrote:
2010/4/22 Dor Laor<dlaor@xxxxxxxxxx>:
On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:
Dor Laor wrote:
On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:
Hi all,

We have been implementing a prototype of Kemari for KVM, and we're
sending this message to share what we have now along with our TODO
lists. We would like to get early feedback to keep us headed in the
right direction. Although the advanced approaches in the TODO lists are
fascinating, we would like to run this project step by step while
absorbing comments from the community. The current code is based on
qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.

For those who are new to Kemari for KVM, please take a look at the
following RFC which we posted last year.

http://www.mail-archive.com/kvm@xxxxxxxxxxxxxxx/msg25022.html

The transmission/transaction protocol and most of the control logic
are implemented in QEMU. However, we needed a hack in KVM to prevent
rip from proceeding before synchronizing VMs. It may also need some
plumbing on the kernel side, for example to guarantee replayability of
certain events and instructions, to integrate the RAS capabilities of
newer x86 hardware with the HA stack, and for optimization purposes.
[snip]

The rest of this message describes TODO lists grouped by each topic.

=== event tapping ===

Event tapping is the core component of Kemari, and it decides on which
event the primary should synchronize with the secondary. The basic
assumption here is that outgoing I/O operations are idempotent, which
is usually true for disk I/O and reliable network protocols such as
TCP.
IMO any type of network event should be stalled too. What if the VM
runs a non-TCP protocol, and a packet that the master node sent reached
some remote client, and the master failed before the sync to the slave?

In the current implementation, it is actually stalling any type of
network traffic that goes through virtio-net.

However, if the application was using unreliable protocols, it should
have its own recovery mechanism, or it should be completely stateless.

Why do you treat tcp differently? You can damage the entire VM this
way - think of a dhcp request that was dropped at the moment you
switched between the master and the slave?

I'm not trying to say that we should treat tcp differently; it's just
that it's severe.
In the case of a dhcp request, the client would have a chance to retry
after failover, correct?
BTW, in current implementation,

I'm slightly confused about the current implementation vs. my
recollection of the original paper with Xen. I had thought that all disk
and network I/O was buffered in such a way that at each checkpoint, the
I/O operations would be released in a burst. Otherwise, you would have
to synchronize after every I/O operation which is what it seems the
current implementation does.
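
To make the two models in this paragraph concrete, here is a toy
sketch in Python. It is not QEMU or Kemari code; the class and method
names (CheckpointBuffer, SyncPerIO, submit, checkpoint) are invented
for illustration. The first class holds outgoing I/O until a
checkpoint and then releases it in a burst; the second synchronizes
once per I/O operation.

```python
class CheckpointBuffer:
    """Hold outgoing I/O and release it in a burst at each checkpoint."""

    def __init__(self):
        self.pending = []     # I/O issued by the guest, not yet visible outside
        self.released = []    # I/O that has left the host
        self.checkpoints = 0  # number of synchronizations with the secondary

    def submit(self, op):
        # Nothing leaves the host until the next checkpoint.
        self.pending.append(op)

    def checkpoint(self):
        # A real system would transfer VM state to the secondary here.
        self.checkpoints += 1
        burst, self.pending = self.pending, []
        self.released.extend(burst)
        return burst


class SyncPerIO:
    """Synchronize with the secondary before every single I/O operation."""

    def __init__(self):
        self.released = []
        self.checkpoints = 0

    def submit(self, op):
        self.checkpoints += 1       # one synchronization per operation
        self.released.append(op)    # the operation proceeds immediately after
```

With three operations, the buffered model performs one synchronization
and delays all three; the per-I/O model performs three synchronizations
and adds no buffering delay beyond the synchronization itself.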

Yes, you're almost right.
It synchronizes before QEMU starts emulating I/O in each device model.
It was originally designed that way to avoid the complexity of introducing a buffering mechanism and the additional I/O latency that buffering would add.

I'm not sure how that is accomplished
atomically though since you could have a completed I/O operation
duplicated on the slave node provided it didn't notify completion prior
to failure.
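
A small illustration of the duplication case, as a toy Python sketch
rather than anything taken from the patches. All names here
(write_sector, send_datagram, run) are made up. It assumes the primary
completes an operation, fails before notifying completion, and the
secondary then re-executes the same operations from the last
synchronized state.

```python
def write_sector(disk, sector, data):
    """Idempotent: repeating the same write leaves the same final state."""
    disk[sector] = data


def send_datagram(wire, payload):
    """Not idempotent: the peer sees a repeat as a second packet."""
    wire.append(payload)


def run(ops, disk, wire):
    """Execute a list of ('write', sector, data) or ('send', payload) ops."""
    for op in ops:
        if op[0] == "write":
            write_sector(disk, op[1], op[2])
        else:
            send_datagram(wire, op[1])


ops = [("write", 7, b"A"), ("send", b"hello")]
disk, wire = {}, []

run(ops, disk, wire)   # primary performs the I/O, then fails
once = dict(disk)      # disk contents after the first execution
run(ops, disk, wire)   # secondary replays from the last synchronized state
```

After the replay the disk contents are unchanged, while the datagram
has gone out twice. That is the sense in which the idempotence
assumption covers disk writes but not arbitrary network traffic.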

That's exactly the point I wanted to discuss.
Currently, we call vm_stop(0), qemu_aio_flush() and bdrv_flush_all() before qemu_save_state_all() in ft_tranx_ready(), to ensure that outstanding I/O is complete. I mimicked what the existing live migration code does.
Is that not enough?
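
The intent of that ordering can be sketched with a toy model in
Python. This is not QEMU code: ToyVM and its methods are hypothetical
stand-ins that only mirror the order of the calls named above, and the
comments describe the assumed role of each step, not the actual
implementation.

```python
class ToyVM:
    def __init__(self):
        self.running = True
        self.in_flight = []  # async writes submitted but not yet completed
        self.cache = {}      # completed writes not yet on stable storage
        self.disk = {}       # stable storage
        self.log = []        # order in which the steps ran

    def submit_write(self, sector, data):
        if not self.running:
            raise RuntimeError("guest is stopped")
        self.in_flight.append((sector, data))

    def stop(self):          # stands in for vm_stop(0)
        self.running = False
        self.log.append("stop")

    def aio_flush(self):     # stands in for qemu_aio_flush()
        for sector, data in self.in_flight:
            self.cache[sector] = data
        self.in_flight.clear()
        self.log.append("aio_flush")

    def flush_all(self):     # stands in for bdrv_flush_all()
        self.disk.update(self.cache)
        self.cache.clear()
        self.log.append("flush_all")

    def save_state(self):    # stands in for qemu_save_state_all()
        self.log.append("save_state")
        return {
            "in_flight": list(self.in_flight),
            "dirty": dict(self.cache),
            "disk": dict(self.disk),
        }


def transaction_ready(vm):
    """Stop the guest, drain and flush I/O, then snapshot the state."""
    vm.stop()
    vm.aio_flush()
    vm.flush_all()
    return vm.save_state()
```

In this model the saved state never contains in-flight or unflushed
writes. It says nothing about an operation that completed on the
primary before the state reached the secondary, which is the case
being asked about.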

Is there another kemari component that somehow handles buffering I/O
that is not obvious from these patches?

No, I'm not hiding anything, and I'm happy to share any information about Kemari so that we can develop it in this community :-)

Thanks,

Yoshi


Regards,

Anthony Liguori


