Re: [Discussion] How do we think about time out mechanism?

Martin Kletzander <mkletzan@xxxxxxxxxx> · Tue, 5 Aug 2014 11:13:41 +0200

On Tue, Aug 05, 2014 at 03:15:18PM +0800, James wrote:
> In fact, to deal with this kind of situation, we added some timeout code to libvirtd, in the remote_dispatch process.
> The mechanism works like this:
> 1. When we call an API, we start a timer thread. When the timer expires, it sets a timeout flag for the API
>    and returns a timeout result to the libvirt client.
> 2. When the API returns to the remote_dispatch level, it checks the timeout flag to decide what to do next.
>    If the call timed out, we perform a rollback action, for example detaching a device that we attached earlier.
>
> This solution has some problems. First, we have to figure out suitable rollback actions. Second, I'm
> not sure it's the best way to solve this kind of blocking problem; it's not very elegant.
>
> What do you think about it?

I'm not sure what you want to know.  Yes, there are problems like
"what rollback actions to do", which would depend on where the call
got stuck, and "what timeout should be set", which depends
on thousands of factors.  I can't think of any elegant solution that
would properly prevent a call from getting stuck.  Mainly because this
is literally the halting problem [1] plus a bit more.
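
To make those two problems concrete, here is a minimal Python sketch
of the scheme you describe.  It is not libvirt code: TimedCall,
add_undo and attach_device are names made up for illustration, and
the real remote_dispatch path in libvirtd is C and looks different.
Each step registers its own undo action, so the rollback depends on
how far the call got.

```python
import threading
import time


class TimedCall:
    """Hypothetical helper: a timeout flag plus undo steps for one call."""

    def __init__(self, timeout_s):
        self.timed_out = threading.Event()  # the "timeout flag"
        self._undo = []                     # rollback actions, newest last
        # The timer thread only sets the flag here; in the scheme described
        # above it would also send the timeout reply to the client.
        self._timer = threading.Timer(timeout_s, self.timed_out.set)

    def add_undo(self, action):
        # Record how to revert a step that has just completed.
        self._undo.append(action)

    def run(self, api):
        self._timer.start()
        try:
            result = api(self)              # may block for a long time
        finally:
            self._timer.cancel()
        if self.timed_out.is_set():
            # The client was already told "timeout", so revert whatever
            # the call managed to do, in reverse order.
            while self._undo:
                self._undo.pop()()
            return None
        return result


def attach_device(call, devices, delay_s):
    # Hypothetical API: the sleep stands in for a slow hypervisor operation.
    time.sleep(delay_s)
    devices.append("disk0")
    call.add_undo(lambda: devices.remove("disk0"))
    return "attached"


if __name__ == "__main__":
    devices = []
    # Finishes well within the timeout: the device stays attached.
    print(TimedCall(1.0).run(lambda c: attach_device(c, devices, 0.0)), devices)
    # Takes longer than the timeout: the attach is rolled back afterwards.
    print(TimedCall(0.05).run(lambda c: attach_device(c, devices, 0.2)), devices)
```

Note that the rollback only runs once the API call actually returns.
If the call never comes back, the flag is set but nothing is undone,
and picking timeout_s is still guesswork.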

I'd say that whatever works for you in this situation is OK, but will
(most probably) work only for your particular scenario.

Martin

[1] https://en.wikipedia.org/wiki/Halting_problem