On 06/12/2018 12:36 AM, Eduardo Habkost wrote: > CCing libvir-list. > > On Mon, Jun 11, 2018 at 11:29:24PM +0200, Igor Mammedov wrote: >> On Mon, 11 Jun 2018 16:06:07 -0300 >> Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote: >> >>> On Mon, Jun 11, 2018 at 03:16:25PM +0200, Igor Mammedov wrote: >>>> On Fri, 8 Jun 2018 10:21:05 -0300 >>>> Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote: >>>> >>>>> On Thu, Jun 07, 2018 at 02:00:09PM +0200, Igor Mammedov wrote: >>>>>> When using --daemonize, the initial lead process will fork a child and >>>>>> then wait to be notified that setup is complete via a pipe, before it >>>>>> exits. When using --preconfig there is an extra call to main_loop() >>>>>> before the notification is done from os_setup_post(). Thus the parent >>>>>> process won't exit until the mgmt application connects to the monitor >>>>>> and tells QEMU to leave the RUN_STATE_PRECONFIG. The mgmt application >>>>>> won't connect to the monitor until daemonizing has completed though. >>>>>> >>>>>> This is a chicken and egg problem, leading to deadlock at startup. >>>>>> >>>>>> The only viable way to fix this is to call os_setup_post() before >>>>>> the early main_loop() call when --preconfig is used. This has the >>>>>> downside that any errors from this point onwards won't be handled >>>>>> well by the mgmt application, because it will think QEMU has started >>>>>> successfully, so not be expecting an abrupt exit. Moving as much user >>>>>> input validation as possible to before the main_loop() call might help, >>>>>> but mgmt application should stop assuming that QEMU has started >>>>>> successfuly and use other means to collect errors from QEMU (logfile). >>>>>> >>>>>> Signed-off-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> >>>>>> Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx> >>>>>> --- >>>>>> v5: >>>>>> * use original Daniel's patch [1], but addapt it to apply on top of >>>>>> "[PATCH v3 1/2] cli: Don't run early event loop if no --preconfig was specified" >>>>>> with extra comment and massage commit message a little bit. >>>>>> v6: >>>>>> * hide os_setup_post_done flag inside of os_setup_post() as it was in v4 >>>>>> >>>>>> CC: berrange@xxxxxxxxxx >>>>>> CC: mreitz@xxxxxxxxxx >>>>>> CC: pbonzini@xxxxxxxxxx >>>>>> CC: ehabkost@xxxxxxxxxx >>>>>> CC: ldoktor@xxxxxxxxxx >>>>>> CC: eblake@xxxxxxxxxx >>>>>> --- >>>>>> os-posix.c | 6 ++++++ >>>>>> vl.c | 6 ++++++ >>>>>> 2 files changed, 12 insertions(+) >>>>>> >>>>>> diff --git a/os-posix.c b/os-posix.c >>>>>> index 9ce6f74..0246195 100644 >>>>>> --- a/os-posix.c >>>>>> +++ b/os-posix.c >>>>>> @@ -309,8 +309,14 @@ void os_daemonize(void) >>>>>> >>>>>> void os_setup_post(void) >>>>>> { >>>>>> + static bool os_setup_post_done; >>>>>> int fd = 0; >>>>>> >>>>>> + if (os_setup_post_done) { >>>>>> + return; >>>>>> + } >>>>>> + os_setup_post_done = true; >>>>>> + >>>>>> if (daemonize) { >>>>>> if (chdir("/")) { >>>>>> error_report("not able to chdir to /: %s", strerror(errno)); >>>>>> diff --git a/vl.c b/vl.c >>>>>> index fa44138..457ff2a 100644 >>>>>> --- a/vl.c >>>>>> +++ b/vl.c >>>>>> @@ -4578,6 +4578,12 @@ int main(int argc, char **argv, char **envp) >>>>>> parse_numa_opts(current_machine); >>>>>> >>>>>> /* do monitor/qmp handling at preconfig state if requested */ >>>>>> + if (!preconfig_exit_requested && is_daemonized()) { >>>>>> + /* signal parent QEMU to exit, libvirt treats it as a sign >>>>>> + * that monitor socket is ready to accept connections >>>>>> + */ >>>>>> + os_setup_post(); >>>>>> + } >>>>> >>>>> I was looking at the daemonize logic, and noticed it we have a >>>>> huge amount of code between this line and the next >>>>> os_setup_post() call that could either: >>>>> >>>>> * call exit() and/or error_report(); or >>>> logging would work to the extent mentioned in commit message, >>>> i.e. it' would work fine when log file is used otherwise it >>>> errors will go to /dev/null >>>> >>>> so it should be more or less fine on this point >>> >>> My worry is that most users of error_report() involve an exit() >>> call too. >>> >>> Once we have an active monitor, we must never call exit() >>> directly. Even qmp_quit() doesn't call exit() directly. >> Is there any reason why exit() can't be called? > > QMP clients don't expect the QMP socket to be closed except when > using the 'quit' command. Libvirt views HANGUP on monitor socket as qemu process dying unexpectedly so it starts cleanup (which involves 'kill -9' of qemu). > >> >>>>> * be unable to finish machine initialization because of >>>>> chdir("/"), change_root(), or change_process_uid(). >>>> this one really no go. >>>> I see 2 options here, >>>> >>>> * move init code that opens files to early stage (before preconfig monitor) >>>> or split it to open files early. >>>> (I've spotted several obvious places fwcfg/vnc/replay_start/migration) >>>> but there might be code somewhere in callbacks that would do it too, >>>> so it rather risky to go this route. >>>> (I'd do this anyways one place at the time using sanitizing >>>> initialization sequence pretext.) >>> >>> We might have QMP commands that take file paths as input, so is >>> this really an option? >> I'd think that in future we would want to enable object_add in preconfig >> to create backends at runtime, so yes we can't do chroot at this point >> >> >>>> * split out signaling part that tells parent process to exit into >>>> separate helper that's called once before/from main_loop(). >>>> This option seems low risk and additionally error output to >>>> stderr will work as it does currently (until os_setup_post()) >>> >>> My assumption is that separating the chdir()/stdout/stderr logic >>> from the fork/daemonize/exit steps wouldn't be possible without >>> breaking expectations about -daemonize. >> it's already separated and that's what creates one side of problem. > > Is it? Right now '$QEMU -daemonize' will never call exit(0) > before the child process it spawned did the > chdir()/stdout/stderr/etc trick. > > >> What I suggest is to leave it as is and move out only >> len = write(daemon_pipe, &status, 1); >> part of os_setup_post() to sync with parent process. That shouldn't >> affect daemonizing flow on QEMU side and would let libvirt reuse parent's >> exit as sync point to detect moment when monitor is available. >> (patch is in testing, I'll post it tomorrow if it doesn't break tests) > > This will affect the daemonizing flow, won't it? It will make > QEMU exit(0) before the child does the chdir()/stderr/stdout/etc > cleanup. A well-behaved daemon shouldn't do this. > > This is probably not a problem for libvirt (which only uses > -daemonize as a sync point for QMP), but possibly a problem for > other users of -daemonize. I think the proper behaviour is to exit(0) after chdir()/stderr/... AND monitor is set up. Then, if any caller started qemu with -preconfig they know they can start talking on monitor, set up whatever it is they want and issue 'cont' finally (or what is the right command to exit preconfig state). This way nothing changes for callers not using -preconfig. > >> >> In worst case if we can't do the later in QEMU, mgmt would have to cope with >> monitor in preconfig mode without relying on parent exit(0) sync point. >> (a typical daemon would fork/chroot and co in one place and clients would use >> other means to detect socket availability other than watching parent process >> exiting) > > Do we really need to make -daemonize and -preconfig work > together? libvirt uses -daemonize only for its initial > capability probing, which shouldn't require -preconfig at all. > > Even on that case, I wonder why libvirt doesn't simply create a > server socket and waits for QEMU to connect instead of using > -daemonize as a sync point. > because libvirt views qemu as well behaved daemon. Should anything go wrong libvirt reads qemu's stderr and reports error to upper layers. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list