Re: Smooth upgrades for socket activated services

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Thu, 2 Mar 2023 17:05:54 +0100

On Mon, 20.02.23 11:05, Mike Hearn (mike@hydraulic.software) wrote:

> Hi,
>
> I'm exploring socket activation as part of work on a tool that makes
> systemd-controlled servers easier to deploy and use. Given a config
> file the tool builds a package that contains the app and systemd
> units, uploads it, installs it with dependency resolution, the
> postinst scripts start the service etc. It's sort of a Docker
> alternative that's more classically Linux-y, designed for a world
> where really big machines are really cheap and thus many apps don't
> need to be cattle-ized. Pets are sometimes OK.
>
> As part of this I'm looking at how to make upgrades smooth. Socket
> activation already allows you to shut down, upgrade and restart a
> service without dropping connections because systemd will hold the
> connections until the service comes back but there are a couple of
> aspects that weren't really clear to me from reading the excellent
> "pid eins" blog post series. Could we maybe get a new blog post
> exploring these issues?
>
> 1. How exactly should you stop a service that's socket activated so it
> won't be re-activated during the upgrade but new connections won't be
> lost, e.g. in package scripts that are executed across upgrades.
> Currently the scripts stop the service before the upgrade happens,
> then restart afterwards.

There's currently no mechanism for that. File an RFE issue.

In the "Portable Services" concept we currently assume you update the
disk image ("DDI") the service is on, and then simply restart the
service while leaving the socket around.

I can see though that if you operate without disk images, then you
might want an explicit synchronization step.

We currently implement a "freeze" concept for services (which uses the
cgroup freezer underneath). Maybe we should extend this to socket
units, to mean that we keep the sockets open but don't act on incoming
traffic anymore. You'd then issue "systemctl freeze foobar.socket"
before you do your upgrade and "systemctl thaw foobar.socket"
afterwards.

> 2. Is it possible to run two versions of a service unit at once such
> that the old version finishes handling connections and then shuts
> down, whilst new connections are being handled by the new version?

Currently not.

We have been discussing this scenario many times, and we could
certainly add something for this, but this kinda conflicts with the
goal to provide a pristine execution context for services: if we'd
restart a service like this and leave old processes around then the
cgroup of the service would of course still contain "legacy"
processes, which contradicts the rule that we always start with a
pristine execution environment.

So, there are two conflicting goals: the goal of guaranteeing clean
invocation and the goal of allowing old stuff to "passivate".

Inside of Microsoft we mostly settled on a different approach: instead
of leaving processes around during such restarts, serialize all state
of ongoing connections and upload their sockets to the fdstore (see
the FileDescriptorStoreMax= docs), along with a memfd of the
serialized state. The benefit of this approach is that you solve the
problem properly and fully: after the restart only new code is in
place, and all old code is flushed out.

But of course such an approach requires that services are written in a
way that makes this possible, i.e. that they are capable of serializing
their full state for all ongoing connections along with the socket fds
to the fdstore, and then deserializing all that when initializing
again. This is not hard but also not exactly trivial.

Lennart

--
Lennart Poettering, Berlin