* Dong, Eddie (eddie.dong@xxxxxxxxx) wrote:
> > >
> > > I didn't quite understand a couple of things though, perhaps you can
> > > explain:
> > >   1) If we ignore the TCP sequence number problem, in an SMP machine
> > > don't we get other randomnesses - e.g. which core completes something
> > > first, or who wins a lock contention, so the output stream might not
> > > be identical - so do those normal bits of randomness cause the
> > > machines to flag as out-of-sync?
> >
> > It's about the COLO agent. CCing Congyang, he can give the detailed
> > explanation.
> >
>
> Let me clarify this issue. COLO doesn't ignore the TCP sequence number, but uses a
> new implementation to make the sequence number identical, on a best-effort basis,
> between the primary VM (PVM) and the secondary VM (SVM). Likely, the VMM has to synchronize
> the emulation of the random number generation mechanism between the
> PVM and SVM, like the lock-stepping mechanism does.
>
> Furthermore, for long TCP connections, we can rely on the (on-demand) VM checkpoint to get
> identical sequence numbers in both the PVM and SVM.

That wasn't really my question; I was worrying about other forms of
randomness, such as winners of lock contention, and other SMP
non-determinisms. I'm also worried about what proportion of the time the
system can't recover from a failure because it is unable to distinguish
an SVM failure from a randomness issue.

Dave

>
>
> Thanks, Eddie
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK