Re: [PATCH 25/26] rpc: don't let systemd shutdown daemon while saving VMs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 03, 2025 at 02:17:37PM +0100, Peter Krempa wrote:
> On Wed, Jan 08, 2025 at 19:42:58 +0000, Daniel P. Berrangé wrote:
> > The service unit "TimeoutStopSec" setting controls how long systemd
> > waits for a service to stop before aggressively killing it, defaulting
> > to 30 seconds if not set.
> > 
> > When we're processing shutdown of VMs in response to OS shutdown, we
> > very likely need more than 30 seconds to complete this job, and can
> > not stop the daemon during this time.
> > 
> > To avoid being prematurely killed, setup a timer that repeatedly
> > extends the "TimeoutStopSec" value while stop of running VMs is
> > arranged.
> > 
> > This does mean if libvirt hangs while stoppping VMs, systemd won't
> > get to kill the libvirt daemon, but this is considered less harmful
> > that forcefully killing running VMs.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@xxxxxxxxxx>
> > ---
> >  src/rpc/virnetdaemon.c | 62 ++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 62 insertions(+)
> > 
> > diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c
> > index 8cc7af1182..3ddb9b5404 100644
> > --- a/src/rpc/virnetdaemon.c
> > +++ b/src/rpc/virnetdaemon.c
> > @@ -78,6 +78,9 @@ struct _virNetDaemon {
> >      virNetDaemonShutdownCallback shutdownPrepareCb;
> >      virNetDaemonShutdownCallback shutdownWaitCb;
> >      virThread *shutdownPreserveThread;
> > +    unsigned long long preserveStart;
> > +    unsigned int preserveExtended;
> > +    int preserveTimer;
> >      int quitTimer;
> >      virNetDaemonQuitPhase quit;
> >      bool graceful;
> > @@ -93,6 +96,14 @@ struct _virNetDaemon {
> >  
> >  static virClass *virNetDaemonClass;
> >  
> > +/*
> > + * The minimum additional shutdown time (secs) we should ask
> > + * systemd to allow, while state preservation operations
> > + * are running. A timer will run every 5 seconds, and
> > + * ensure at least this much extra time is requested
> > + */
> > +#define VIR_NET_DAEMON_PRESERVE_MIN_TIME 30
> > +
> >  static int
> >  daemonServerClose(void *payload,
> >                    const char *key G_GNUC_UNUSED,
> > @@ -162,6 +173,7 @@ virNetDaemonNew(void)
> >      if (virEventRegisterDefaultImpl() < 0)
> >          goto error;
> >  
> > +    dmn->preserveTimer = -1;
> >      dmn->autoShutdownTimerID = -1;
> >  
> >  #ifndef WIN32
> > @@ -727,6 +739,42 @@ daemonShutdownWait(void *opaque)
> >      }
> >  }
> >  
> > +static void
> > +virNetDaemonPreserveTimer(int timerid G_GNUC_UNUSED,
> > +                          void *opaque)
> > +{
> > +    virNetDaemon *dmn = opaque;
> > +    VIR_LOCK_GUARD lock = virObjectLockGuard(dmn);
> > +    unsigned long long now = g_get_monotonic_time();
> > +    unsigned long long delta;
> > +
> > +    if (dmn->quit != VIR_NET_DAEMON_QUIT_PRESERVING)
> > +        return;
> > +
> > +    VIR_DEBUG("Started at %llu now %llu extended %u",
> > +              dmn->preserveStart, now, dmn->preserveExtended);
> > +
> > +    /* Time since start of preserving state in usec */
> > +    delta = now - dmn->preserveStart;
> > +    /* Converts to secs */
> > +    delta /= (1000ull * 1000ull);
> > +
> > +    /* Want extra seconds grace to ensure this timer fires
> > +     * again before system timeout expires, under high
> > +     * load conditions */
> > +    delta += VIR_NET_DAEMON_PRESERVE_MIN_TIME;
> > +
> > +    /* Deduct any extension we've previously asked for */
> > +    delta -= dmn->preserveExtended;
> > +
> > +    /* Tell systemd how much more we need to extend by */
> > +    virSystemdNotifyExtendTimeout(delta);
> > +    dmn->preserveExtended += delta;
> > +
> > +    VIR_DEBUG("Extended by %llu", delta);
> > +}
> 
> Assume:
> 
> preserveStart = 0;
> preserveExtend = 30; /* from the initialization */
> 
> Calls:
> 
> by assumption of starting time above above (all in seconds) the notify
> timeout simplifies to:
> 
> notify(delta) = now + 30 - preserveExtend
> 
> Iteration 0 is the direct call, all others are via timer
> 
> iter now()   notify(delta)   preserveExtend
>  0    0         30 (hardcoded)      30
>  1    5          5                  35
>  2   10          5                  40
>  3   15          5                  45
>  4   20          5                  50
> 
> [...]
> 
> 
> 'man sd_notify' says for EXTEND_TIMEOUT_USEC=...
> 
>    The value specified is a time in microseconds during which the
>    service must send a new message.

This is the bit I interpreted wrong. I was thinking that the
EXTEND_TIMEOUT_USEC= is added to the default service stop
timeout, and we must send another message before that overall
timeout is hit.

The actual semantics you point out are in fact much more
useful and easier to deal with :-)


> 
> So this means:
>  - initial iteration indeed extends by 30 seconds
>  - any subsequent iteration extends only by 5
>  - the timer is also 5 seconds, so it's likely to instantly timeout on
>    second iteration
>  - the complicated algorithm doesn't seem to be necessary, all you want
>    is a constant which is 'timer_interval+tolerance'

I'll just use EXTEND_TIMEOUT_USEC set to 10 seconds, and
update it every 5 seconds.

> 
> >  static void
> >  virNetDaemonQuitTimer(int timerid G_GNUC_UNUSED,
> >                        void *opaque)
> > @@ -781,11 +829,21 @@ virNetDaemonRun(virNetDaemon *dmn)
> >  
> >          if (dmn->quit == VIR_NET_DAEMON_QUIT_REQUESTED) {
> >              VIR_DEBUG("Process quit request");
> > +            virSystemdNotifyStopping();
> >              virHashForEach(dmn->servers, daemonServerClose, NULL);
> >  
> >              if (dmn->shutdownPreserveThread) {
> >                  VIR_DEBUG("Shutdown preserve thread running");
> >                  dmn->quit = VIR_NET_DAEMON_QUIT_PRESERVING;
> > +                dmn->preserveStart = g_get_monotonic_time();
> > +                dmn->preserveExtended = VIR_NET_DAEMON_PRESERVE_MIN_TIME;
> 
> ^^^ assumptions
> 
> > +                virSystemdNotifyExtendTimeout(dmn->preserveExtended);
> 
> iter 0
> 
> > +                if ((dmn->preserveTimer = virEventAddTimeout(5 * 1000,
> > +                                                             virNetDaemonPreserveTimer,
> > +                                                             dmn, NULL)) < 0) {
> 
> subsequent iters via timer
> 
> > +                    VIR_WARN("Failed to register preservation timer");
> > +                    /* hope for the best */
> > +                }
> >              } else {
> >                  VIR_DEBUG("Ready to shutdown");
> >                  dmn->quit = VIR_NET_DAEMON_QUIT_READY;
> > @@ -866,6 +924,10 @@ static void virNetDaemonPreserveWorker(void *opaque)
> >              dmn->quit = VIR_NET_DAEMON_QUIT_READY;
> >          }
> >          g_clear_pointer(&dmn->shutdownPreserveThread, g_free);
> > +        if (dmn->preserveTimer != -1) {
> > +            virEventRemoveTimeout(dmn->preserveTimer);
> > +            dmn->preserveTimer = -1;
> > +        }
> >      }
> >  
> >      VIR_DEBUG("End preserve dmn=%p", dmn);
> > -- 
> > 2.47.1
> > 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux