* Hongyang Yang (yanghy@xxxxxxxxxxxxxx) wrote:
>
>
> On 09/12/2014 07:17 PM, Dr. David Alan Gilbert wrote:
> >* Hongyang Yang (yanghy@xxxxxxxxxxxxxx) wrote:
> >>
> >>
> >>On 08/01/2014 11:03 PM, Dr. David Alan Gilbert wrote:
> >>>* Yang Hongyang (yanghy@xxxxxxxxxxxxxx) wrote:
> >
> ><snip>
> >
> >>>>+static int do_colo_transaction(MigrationState *s, QEMUFile *control,
> >>>>+                               QEMUFile *trans)
> >>>>+{
> >>>>+    int ret;
> >>>>+
> >>>>+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> >>>>+    if (ret) {
> >>>>+        goto out;
> >>>>+    }
> >>>>+
> >>>>+    ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
> >>>
> >>>What happens at this point if the slave just doesn't respond?
> >>>(i.e. the socket doesn't drop - you just don't get the byte).
> >>
> >>If the socket returns bytes that were not expected, exit. If the
> >>socket returns an error, do some cleanup and quit the COLO process.
> >>Refer to colo_ctl_get() and colo_ctl_get_value().
> >
> >But what happens if the slave just doesn't respond at all; e.g.
> >if the slave host loses power, it'll take a while (many seconds)
> >before the socket will time out.
>
> It will wait until the call returns a timeout error, and then do some
> cleanup and quit the COLO process.

If it were to wait here for ~30 seconds for the timeout, what would
happen to the primary? Would it be stopped from sending any network
traffic for those 30 seconds? I think that's too long to fail over.

> There may be a better way to handle this?

In postcopy I always take reads coming back from the destination in a
separate thread, because then a blocked read can't stall the main
thread's outgoing traffic (I originally did that using async reads, but
the thread is nicer). You could also use something like poll() with a
timeout set to however long you are happy for COLO to wait before it
declares a failure.

Dave
--
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
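As an illustration of the poll()-with-a-timeout suggestion, here is a minimal sketch in plain POSIX C. It assumes the control channel is an ordinary file descriptor and that one byte is one control message. wait_ctl_byte() is a hypothetical helper, not part of the COLO patch or of QEMU's QEMUFile API, and the timeout value is whatever failover budget the caller chooses.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT QEMU's colo_ctl_get()): wait at most
 * timeout_ms milliseconds for one control byte on fd.
 * Returns 0 and stores the byte in *out on success,
 * -ETIMEDOUT if nothing arrived in time (e.g. the slave lost power),
 * -EPIPE if the peer closed the connection,
 * or another negative errno value on any other error.
 */
static int wait_ctl_byte(int fd, int timeout_ms, uint8_t *out)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ret = poll(&pfd, 1, timeout_ms);
        if (ret < 0) {
            if (errno == EINTR) {
                continue;   /* signal: retry (note this restarts the full timeout) */
            }
            return -errno;
        }
        if (ret == 0) {
            return -ETIMEDOUT;  /* peer silent too long: caller can start failover */
        }
        break;  /* readable, hung up or in error: let read() report which */
    }

    for (;;) {
        ssize_t n = read(fd, out, 1);
        if (n == 1) {
            return 0;
        }
        if (n == 0) {
            return -EPIPE;      /* orderly close by the peer */
        }
        if (errno == EINTR) {
            continue;
        }
        return -errno;
    }
}
```

A caller could treat -ETIMEDOUT exactly like a socket error and begin failover. The separate-reader-thread approach used in postcopy avoids blocking the main thread at all, at the cost of extra synchronisation between the threads.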