RE: [Qemu-devel] [RFC] COLO HA Project proposal

> > Thanks Dave:
> > 	Whatever randomness in values/branches/code paths the PVM and SVM
> > may have, it is only a performance issue. COLO never assumes the PVM
> > and SVM have the same internal machine state. From a correctness
> > p.o.v., as long as the PVM and SVM generate identical responses, we
> > can view the SVM as a valid replica of the PVM, and the SVM can take
> > over when the PVM suffers a hardware failure. We can view the client
> > as talking to the SVM all along, without any notion of the PVM. Of
> > course, if the SVM dies, we can regenerate a copy of the PVM with a
> > new checkpoint too.
> > 	The SOCC paper has the detailed recovery model :)
> 
> I've had a read; I think the bit I was asking about was what you
> labelled 'D' in that paper's fig. 4 - so I think that does explain it
> for me.

Very good :)
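
To make that model concrete, below is a minimal sketch of the
output-comparison loop described above. It is illustrative only - not the
actual QEMU/COLO code - and the pvm_next_output()/svm_next_output()/
release_to_client()/take_checkpoint() hooks are hypothetical:

/*
 * Sketch of COLO's output-comparison model (illustrative only).
 * Outbound packets from the primary VM (PVM) are held until the
 * secondary VM (SVM) produces its corresponding output.  If the two
 * match, the PVM's packet is released to the client; if they diverge,
 * a checkpoint first makes the SVM a fresh replica of the PVM.
 * Clients therefore only ever observe PVM output.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct packet {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical hooks into the hypervisor / proxy. */
extern struct packet *pvm_next_output(void);   /* blocks until output */
extern struct packet *svm_next_output(void);
extern void release_to_client(struct packet *p);
extern void take_checkpoint(void);             /* copy PVM state to SVM */

static bool outputs_match(const struct packet *a, const struct packet *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

void colo_compare_loop(void)
{
    for (;;) {
        struct packet *p = pvm_next_output();
        struct packet *s = svm_next_output();

        if (!outputs_match(p, s)) {
            /* Divergence: internal states may differ, and that is fine;
             * only the externally visible output matters.  Resync. */
            take_checkpoint();
        }
        release_to_client(p);
    }
}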

> But I also have some more questions:
> 
>   1) 5.3.3 Web server
>     a) In fig 11 it shows Remus's performance dropping off with the
>        number of threads - why is that? Is it just an increase in the
>        amount of memory changes in each snapshot?

I didn't dig into the details; I just documented the throughput we observed.
It felt a bit strange to me too - the dirty memory page set may be larger
than in the small-connection case - but I am not sure, and that is the data
we saw :(

>     b) Is fig 11/12 measured with all of the TCP optimisations shown in
>        fig 13 on?

Yes.

> 
>   2) Did you manage to overcome the degradation issue shown in 5.6 with
>      newer guest kernels - could you just fall back to micro
>      checkpointing if the guests diverge too quickly?

In general, I would say the COLO performance for these two workloads is
pretty good; I actually didn't include subsection 5.6 initially. It was the
conference shepherd who asked me to add that paragraph to make the paper
balanced :)

In summary, COLO can deliver very good MP-guest performance compared with
Remus, at the cost of potential optimization/modification effort in the
guest TCP/IP stack. One solution may not work for all workloads, but this
leaves plenty of room for OSVs to provide customized solutions for specific
usages -- which I think is very good for the open source business model:
making money through consulting. Huawei Technologies Ltd. announced support
for COLO in their cloud OS, probably for a specific usage too.

> 
>   3) Was the link between the two servers for synchronisation a
>      low-latency dedicated connection?

We used a 10 Gbps NIC in the paper, and yes, it was a dedicated link, but
the solution itself doesn't require one.

> 
>   4) Did you try an ftp PUT benchmark using external storage - i.e. that
>      wouldn't have the local disc synchronisation overhead?

Not yet.
External network-shared storage works, but today the performance may not be
that good, because our optimization so far is still very limited; it was
just an initial effort to make these two common workloads happy. We believe
there is plenty of room ahead to make the responses of the TCP/IP stack more
predictable. Once the basic COLO work is ready for production and accepted
by the industry, we may be able to influence the TCP community to keep this
kind of predictability in mind for future protocol development, which would
greatly help performance.
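
To illustrate what "predictable responses" buys, here is a sketch of the
opposite tactic a comparison engine could take: tolerating known-benign
TCP nondeterminism by masking the checksum and the timestamp option before
comparing. This is illustrative only and is not the approach in the paper
(which made the guest stack itself more predictable); it assumes bare TCP
segments with no IP header, and it destructively edits the buffers it is
given:

/* Mask fields that legitimately differ between PVM and SVM output. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { TCP_OPT_EOL = 0, TCP_OPT_NOP = 1, TCP_OPT_TIMESTAMP = 8 };

static void mask_nondeterministic_fields(uint8_t *seg, size_t len)
{
    if (len < 20)
        return;
    seg[16] = seg[17] = 0;                /* TCP checksum */
    size_t hdr_len = (seg[12] >> 4) * 4;  /* data offset, 32-bit words */
    if (hdr_len < 20 || hdr_len > len)
        return;
    size_t off = 20;                      /* options start here */
    while (off < hdr_len) {
        uint8_t kind = seg[off];
        if (kind == TCP_OPT_EOL)
            break;
        if (kind == TCP_OPT_NOP) {
            off++;
            continue;
        }
        if (off + 1 >= hdr_len)
            break;
        uint8_t optlen = seg[off + 1];
        if (optlen < 2 || off + optlen > hdr_len)
            break;
        if (kind == TCP_OPT_TIMESTAMP)
            memset(&seg[off + 2], 0, optlen - 2);  /* zero TSval/TSecr */
        off += optlen;
    }
}

/* Equality test used in place of a raw memcmp of the two outputs. */
static bool tcp_outputs_equivalent(uint8_t *a, size_t alen,
                                   uint8_t *b, size_t blen)
{
    if (alen != blen)
        return false;
    mask_nondeterministic_fields(a, alen);
    mask_nondeterministic_fields(b, blen);
    return memcmp(a, b, alen) == 0;
}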


Thx Eddie



