On Fri, Dec 11, 2020 at 02:30:34PM +0100, Thomas Gleixner wrote:
> On Thu, Dec 10 2020 at 21:27, Marcelo Tosatti wrote:
> > On Thu, Dec 10, 2020 at 10:48:10PM +0100, Thomas Gleixner wrote:
> >> You really all live in a seperate universe creating your own rules how
> >> things which other people work hard on to get it correct can be screwed
> >> over.
> >
> > 1. T = read timestamp.
> > 2. migrate (VM stops for a certain period).
> > 3. use timestamp T.
>
> This is exactly the problem. Time stops at pause and continues where it
> stopped on resume.
>
> But CLOCK_REALTIME and CLOCK_TAI advanced in reality. So up to the point
> where NTP fixes this - if there is NTP at all - the guest CLOCK_REALTIME
> and CLOCK_TAI are off by tpause.
>
> Now the application gets a packet from the outside world with a
> CLOCK_REALTIME timestamp which is suddenly ahead of the value it reads
> from clock_gettime(CLOCK_REALTIME) by tpause. So what is it supposed to
> do with that? Make stupid assumptions that the other end screwed up
> timekeeping, throw an error that the system it is running on screwed up
> timekeeping? And a second later when NTP catched up it gets the next
> surprise because the systems CLOCK_REALTIME jumped forward unexpectedly
> or if there is no NTP it's confused forever.

This can happen even with a "perfect" solution that syncs time
instantly on the migration destination. See steps 1, 2, 3.

Unless you notify applications to invalidate their time reads,
I can't see a way to fix this.

Therefore, if you use VM migration in the first place, a certain amount of
timestamp accuracy error must be tolerated.

> How can you even assume that this is correct?

As noted above, even without a window of unsynchronized time (due to
the delay for NTP to sync time), time reads can be stale.

> It is exactly the same problem as we had many years ago with hardware
> clocks suddenly stopping to tick which caused quite some stuff to go
> belly up.

Customers complained when it was 5 seconds off, now it's 0.1ms (and people
seem happy).

> In a proper suspend/resume scenario CLOCK_REALTIME/TAI are advanced
> (with a certain degree of accuracy) to compensate for the sleep time, so
> the other end of a communication is at least in the same ballpark, but
> not 50 seconds off.

It's 100ms off with migration, and can be reduced further (customers
complained about 5 seconds but seem happy with 0.1ms).

> >> This features first, correctness later frenzy is insane and it better
> >> stops now before you pile even more crap on the existing steaming pile
> >> of insanities.
> >
> > Sure.
>
> I wish that would be true. OS people - you should know that - are
> fighting forever with hardware people over feature madness and the
> attitude of 'we can fix that in software' which turns often enough out
> to be wrong.
>
> Now sadly enough people who suffered from that madness work on
> virtualization and instead of trying to avoid the same problem they go
> off and make it even worse.

So you think it's important to reduce the 100ms offset?

> It's the same problem again as with hardware people. Not talking to the
> other people _before_ making uninformed assumptions and decisions.
>
> We did it that way because big customer asked for it is not a
> justification for inflicting this on everybody else and thereby
> violating correctness. Works for me and my big customer is not a proof
> of correctness either.
>
> It's another proof that this industry just "works" by chance.
>
> Thanks,
>
>         tglx

OK, makes sense, then reducing the 0.1ms window even further
is a useful thing to do. What would be an acceptable
CLOCK_REALTIME accuracy error, on migration?
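
To make the stale-read point from steps 1, 2, 3 concrete, here is a toy
model. It is only a sketch, not kernel or hypervisor code: GuestClock,
stale_read_error() and the pause length are hypothetical names and values
made up for illustration. It compares a timestamp T held across the pause
with a fresh read taken after the pause, with and without an instant
resync on the destination.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GuestClock:
    """Toy model of a guest CLOCK_REALTIME in seconds (hypothetical, not a real API)."""
    real_time: float = 0.0  # what a reference clock outside the guest reads
    offset: float = 0.0     # how far the guest clock lags behind real time

    def pause(self, seconds: float) -> None:
        """Guest is stopped (migration): only real time advances."""
        self.real_time += seconds
        self.offset += seconds

    def resync(self) -> None:
        """Assumed 'perfect' instant sync of the guest clock on the destination."""
        self.offset = 0.0

    def read(self) -> float:
        """Stand-in for clock_gettime(CLOCK_REALTIME) inside the guest."""
        return self.real_time - self.offset


def stale_read_error(pause_s: float, resync: bool) -> Tuple[float, float]:
    """Return (error of held timestamp T, error of a fresh read) after a pause."""
    clock = GuestClock(real_time=1000.0)
    t = clock.read()        # 1. T = read timestamp
    clock.pause(pause_s)    # 2. migrate (VM stops for pause_s seconds)
    if resync:
        clock.resync()
    held_error = clock.real_time - t              # 3. use timestamp T
    fresh_error = clock.real_time - clock.read()  # error of a new read
    return held_error, fresh_error


if __name__ == "__main__":
    for resync in (True, False):
        held, fresh = stale_read_error(0.1, resync=resync)
        print(f"resync={resync}: held T off by {held:.3f}s, fresh read off by {fresh:.3f}s")
```

In this model the held timestamp T is off by the pause length in both
cases; the resync only removes the error of reads taken after the pause.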