Re: [PATCH kvmtool 2/2] Add detach feature

Dave Martin <Dave.Martin@xxxxxxx> · Mon, 8 Jul 2019 12:24:09 +0100

On Fri, Jul 05, 2019 at 03:29:58PM +0100, Andre Przywara wrote:
> On Fri, 5 Jul 2019 12:04:41 +0100
> Dave Martin <Dave.Martin@xxxxxxx> wrote:
> 
> Hi Dave,
> 
> thanks for having a look!
> 
> > On Fri, Jul 05, 2019 at 10:59:14AM +0100, Andre Przywara wrote:
> > > At the moment a kvmtool process started on a terminal has no way of
> > > detaching from the terminal without killing the guest. Existing
> > > workarounds are starting kvmtool in a screen/tmux session or using a
> > > pseudoterminal (--tty 0), both of which have to be done upon guest
> > > creation.
> > > 
> > > Introduce a terminal command to create a pseudoterminal during the
> > > guest's runtime and redirect the console output to that. This will be
> > > triggered by typing the letter 'd' after the kvmtool escape sequence
> > > (default Ctrl+a). This also daemonises kvmtool, so gives back the shell
> > > prompt, and the user can log out without affecting the guest.
> > > 
> > > Naively daemonising kvmtool at that point doesn't work, though, as the
> > > fork() doesn't inherit the threads, so they keep running in the
> > > grandparent process and would be killed by its exit.
> > > The trick used here is to do the double fork() already right at the
> > > beginning of kvmtool's runtime, before spawning the first thread.
> > > We then don't end the parent and grandparent processes yet, instead let
> > > them block until the user actually requests the detach.
> > > This will let all the threads be created in the grandchild process, but
> > > keeps kvmtool still attached to the terminal until the user requests a
> > > detach.
> > > 
> > > Signed-off-by: Andre Przywara <andre.przywara@xxxxxxx>
> > > ---
> > >  builtin-run.c     |  3 ++
> > >  include/kvm/kvm.h |  2 ++
> > >  term.c            | 91 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 96 insertions(+)
> > > 
> > > diff --git a/builtin-run.c b/builtin-run.c
> > > index f8dc6c72..fa391419 100644
> > > --- a/builtin-run.c
> > > +++ b/builtin-run.c
> > > @@ -592,6 +592,9 @@ static struct kvm *kvm_cmd_run_init(int argc, const char **argv)
> > >  		}
> > >  	}
> > >  
> > > +	/* Fork twice already now to create the threads in the right process. */
> > > +	kvm__prepare_daemonize();
> > > +
> > >  	if (!kvm->cfg.guest_name) {
> > >  		if (kvm->cfg.custom_rootfs) {
> > >  			kvm->cfg.guest_name = kvm->cfg.custom_rootfs_name;
> > > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > > index 7a738183..801f9474 100644
> > > --- a/include/kvm/kvm.h
> > > +++ b/include/kvm/kvm.h
> > > @@ -90,6 +90,8 @@ struct kvm {
> > >  void kvm__set_dir(const char *fmt, ...);
> > >  const char *kvm__get_dir(void);
> > >  
> > > +int kvm__prepare_daemonize(void);
> > > +
> > >  int kvm__init(struct kvm *kvm);
> > >  struct kvm *kvm__new(void);
> > >  int kvm__recommended_cpus(struct kvm *kvm);
> > > diff --git a/term.c b/term.c
> > > index 7fbd98c6..df8328f8 100644
> > > --- a/term.c
> > > +++ b/term.c
> > > @@ -22,6 +22,7 @@ static struct termios	orig_term;
> > >  
> > >  static int term_fds[TERM_MAX_DEVS][2];
> > >  
> > > +static int daemon_fd;
> > >  static int inotify_fd;
> > >  
> > >  static pthread_t term_poll_thread;
> > > @@ -29,6 +30,92 @@ static pthread_t term_poll_thread;
> > >  /* ctrl-a is used for escape */
> > >  #define term_escape_char	0x01
> > >  
> > > +/* This needs to be called *before* we create any threads. */
> > > +int kvm__prepare_daemonize(void)
> > > +{
> > > +	pid_t pid;
> > > +	char dummy;
> > > +	int child_pipe[2], parent_pipe[2];
> > > +
> > > +	if (pipe(parent_pipe))
> > > +		return -1;
> > > +
> > > +	pid = fork();
> > > +	if (pid < 0)
> > > +		return pid;
> > > +	if (pid > 0) {			/* parent process */
> > > +
> > > +		close(parent_pipe[1]);
> > > +
> > > +		/* Block until we are told to exit. */
> > > +		if (read(parent_pipe[0], &dummy, 1) != 1)
> > > +			perror("reading exit pipe");
> > > +
> > > +		exit(0);  
> > 
> > Can we have the child write some status value instead?
> > 
> > Right now, if the child goes wrong after forking we will just exit with
> > status 0 regardless.
> 
> Sure, but what would we do in this case and why would we care? The child
> is just a "trick", so if this somehow fails, it does early and we are
> screwed anyways.

It's preferable to return with failure status from the parent if the
payload (wiether a child process or not) failed to start up properly.

I haven't looked at the bigger picture here though: if we have already
fired up the VM by this point, there's not interesting that can go
wrong, I guess.

> > Instead, we could have the child send us the exit status... or if
> > the child disappears we'll just get EOF and can return failure
> > appropriately.
> > 
> > Also, how does the invoker kill the guest now that kvmtool has exited?
> 
> Either by using "lkvm stop -n ..." or by exiting from within the guest.
> This would be the same as one would create a guest with --tty and put that
> in the background right now.

OK, I guess lkvm stop works sufficiently for that, then.

> > > +	}
> > > +
> > > +	close(parent_pipe[0]);
> > > +	if (pipe(child_pipe))
> > > +		return -1;
> > > +	daemon_fd = child_pipe[1];
> > > +
> > > +	/* Become a session leader. */
> > > +	setsid();
> > > +	pid = fork();
> > > +	if (pid > 0) {
> > > +		close(child_pipe[1]);
> > > +
> > > +		/* Block until we are told to exit. */
> > > +		if (read(child_pipe[0], &dummy, 1) != 1)
> > > +			perror("reading child exit pipe");
> > > +
> > > +		if (write(parent_pipe[1], &dummy, 1) != 1)
> > > +			pr_warning("could not kill daemon's parent");
> > > +
> > > +		exit(0);
> > > +	}
> > > +	close(child_pipe[0]);
> > > +	close(parent_pipe[1]);  
> > 
> > Why do we need to fork twice?  The extra process seems redundant.
> 
> This is the ~50 year old textbook method of creating a UNIX daemon: By
> forking twice and killing the child (the "middle" process), the grandchild
> becomes an orphan, and will be picked up by PID 1 (init). This prevents
> any interactions with controlling terminals in the future. Dr. Google or a
> local library should have more information.

Hmmm, I vaguely remember that now.

I'm not sure there's much actual reason to do that except to ensure that
the new process isn't a session leader, because session leaders with no
terminal may acquire any tty they open as a controlling tty, unless they
pass O_NOCTTY all the time to open() and friends.

Running ps -Aj suggests that not all daemons do this, but if we might
open stuff after forking, I guess it's best to be safe.

> > > +	/* Only the grandchild returns here, to do the actual work. */
> > > +	return 0;
> > > +}
> > > +
> > > +static void term_set_tty(int term);
> > > +
> > > +static void detach_terminal(int term)
> > > +{
> > > +	char dummy = 0;
> > > +
> > > +	/* Detaching only make sense if we use the process' terminal. */
> > > +	if (term_fds[term][TERM_FD_IN] != STDIN_FILENO)
> > > +		return;
> > > +
> > > +	/* Clean up just this terminal, leave the others alone. */
> > > +	tcsetattr(term_fds[term][TERM_FD_IN], TCSANOW, &orig_term);
> > > +
> > > +	/* Redirect this terminal to a PTY */
> > > +	term_set_tty(term);
> > > +
> > > +	/*
> > > +	 * Replace STDIN/STDOUT with this new PTY. This will automatically
> > > +	 * transfer all the other serial terminals over.
> > > +	 */
> > > +	dup2(term_fds[term][TERM_FD_IN], STDIN_FILENO);
> > > +	dup2(term_fds[term][TERM_FD_OUT], STDOUT_FILENO);  
> > 
> > Output to a pty master before the slave is opened just disappears.
> 
> Not always, it seems. When I do:
> $ lkvm run -k Image --tty 0
> then wait for a bit and open the pseudo-terminal, I get the full boot log
> "replayed". Not sure who in the chain is actually buffering this, though?

Hmmm, dunno.  Last time I experimented, I didn't see this.

But I can't see what could be buffering it unless the kernel does it or
the writes just block until the slave is opened.  Or we simply don't
consider the pty master writable until the slave is opened, resulting
in kvmtool buffering the output as a side-effect.

Maybe the kernel's behaviour on this point has changed over time.

> > Possibly it would be useful to buffer it somewhere.
> 
> Sounds useful, but it's more a story for another time, I guess, since we
> have the same issue with --tty already.
> One related problem is the gap between starting/detaching a guest and
> connecting to the terminal for the first time, as one cannot do this fast
> and reliable enough without loosing characters, I guess. Maybe it's worth
> to introduce something like "start as paused", so we can connect to the
> terminal and then do a "lkvm resume".
> 
> > I wonder whether it would make sense to redirect to a notional dummy
> > terminal (which might or might not buffer output), and start talking to
> > a pty or socket only once opened by a client.
> 
> Mmmh, sounds like that would fit in nicely since with patch 1/2 we now
> know about connects and possibly even disconnects.

OK, I guess this comes under future ehancements (if we need to do it at
all).

Cheers
---Dave