On Mon, Jul 16, 2018 at 10:18:59 +0100, Daniel Berrange wrote: > On Thu, Jul 12, 2018 at 03:34:00PM -0400, John Ferlan wrote: > > On 07/09/2018 03:11 AM, Peter Krempa wrote: > > > On Sat, Jul 07, 2018 at 08:11:05 -0400, John Ferlan wrote: > > I think the key thing to bear in mind is what is the benefit of what we > are trying todo. ie why do we need todo a clean shutdown instead of just > immediately exiting. The kernel will clean up all resources associated > with the process when it exits. So we only need care about a clean shutdown > if there is something needing cleanup that the kernel won't handle, or if > we are trying to debug with valgrind & similar tools. > > I tend to think we put way too much time into worrying about getting perfect > clean shutdown, for little real world benefit. > > IMHO, we should make a moderate effort to shutdown cleanly, but if there's > something stuck after 15 seconds, we should just uncoditionally call exit(). I'm not really worried about the uncleannes of the shutdown but rather desynchronisation of qemu-libvirt state. If we exit during a modify job prior to writing the status XML to disk we might end up in an intermediate state which will not be recognized by libvirt afterwards. Just exit()-ing is not a systematic solution here because of the above. In my opinion a better solution is to close the monitor connections after some time and only in cases where we know it's safe. E.g. in QUERY type jobs. Some modify-type jobs may get stuck as well, but only a very few commands create this risk so annotating them and not allowing to break the monitor there should be way better. Currently I'm mostly worried about the 'transaction' command which is used for snapshots, as losing track of the new files is not recoverable. Other block jobs which modify the backing graph would be a problem normally but since we don't track the bakcing chain fully yet it's okay except for finishing step of the active block commit. For the blockjobs it will be possible to work this around by the new APIs in qemu which allow the jobs to linger until we retrieve the state. That means that the only problematic job probably will be 'transaction'/snapshot creation.
Attachment:
signature.asc
Description: PGP signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list