Re: Question about the killing spree during the transition from the initrd to the root file system.

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Mon, 8 Jul 2024 13:25:00 +0200

On Do, 04.07.24 21:48, Zheng Chuan (zhengchuan@xxxxxxxxxx) wrote:

> >> I have some processes in my initrd that need to be excluded from the killing spree
> >> during switch-root and need to continue to run in the root file system. I read
> >> the ROOT_STORAGE_DAEMONS.md and the source code of killall.c, and I've learned
> >> that there are methods to exclude the processes from the killing spree, such as
> >> setting `@` to `argv[0][0]`.
> >>
> >> However, I'm not sure if this is without potential consequences. For example, could
> >> it be that even though my processes survive, some resources that the processes
> >> depend on are discarded after switch-root, such as file
> >> descriptors?
> >
> > No, these belong to your process, systemd couldn't really reach into
> > your processes to close them, even if it wanted to.
> >
> > But do note that any files you keep open or mapped at the moment of transition
> > will remain pinned in memory, and cannot be released by the
> > kernel. This means that even though during the tmpfs→host transition
> > we generally destroy the initrd's tmpfs' contents, the stuff you keep
> > pinned will stick around.
> >
> Yes, tmpfs will release all its memory and may leave fds marked "(deleted)" that belong to
> the remaining process. What's more, tmpfs cannot do things like setcap.
> To solve this, we want to change the tmpfs into initramfs and keep the memory, at the cost of
> some memory waste. Is that OK?

Well, keeping the initrd's tmpfs populated sucks of course, we
generally avoid that. But I don't know your scenario. Whether it's OK
to pin resources from the initrd tmpfs forever you need to figure out
in your case.
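
To illustrate what "pinned" means here, a minimal Python sketch (not
systemd code; the function name is made up for this example): a file
that has been unlinked stays fully readable through an fd that is still
open, so the kernel cannot free its backing memory until that fd is
closed.

```python
import os
import tempfile


def read_after_unlink(payload=b"still here"):
    """Write payload to a temp file, unlink it, then read it back via the open fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)  # the name is gone, the inode is still pinned by fd

        # On Linux, /proc/self/fd/<n> typically shows the old path
        # with a " (deleted)" suffix for such files.
        try:
            link = os.readlink(f"/proc/self/fd/{fd}")
        except OSError:
            link = None  # no procfs available

        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
        return data, link
    finally:
        os.close(fd)  # only now can the storage be released


if __name__ == "__main__":
    data, link = read_after_unlink()
    print(data, link)
```

The same applies to files that a surviving process has open or mapped
from the initrd's tmpfs at the moment of the transition.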

> > Generally, only special purpose software should be left around that
> > way, if it is carefully written to handle this. For example it is not
> > allowed to dlopen() anything (and hence no NSS either! No
> > gethostbyname() or getpwnam() or so), because you'd otherwise end up
> > with a weird mix and match of shared libs from the initrd and the host.
> >
> Yes, we keep all software versions, including shared libs, the same as on the host.
> In our scenario, we want to kexec from the old OS to the newer one, and we want to
> bring up the processes we care about as soon as possible, before we do switch-root and other
> slow stuff like scanning disks and probing drivers, etc.

Are you aware of systemd's "soft reboot" logic?

It provides a very quick way to reboot, without replacing the
kernel. It even allows you to keep specific services around during
reboot.

I know of companies that deploy this together with kernel live
patching to make OS updates work without downtimes.

That said, in your initrd scenario: consider starting up your service
in the initrd, getting things running, and then terminating before the
initrd transition while passing your open fds to PID 1 via the
"fdstore" logic. Then, after the transition start the service anew,
and you'll get the fds passed back in, so that the service can
continue doing what it needs to do, without any sockets and similar
resources being released in between.
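
A minimal sketch of what the service side of this could look like, in
Python and speaking the sd_notify() wire protocol directly. The helper
names (`store_fds`, `inherited_fds`) and the fd name "mydata" are made
up for this example. It assumes the unit sets FileDescriptorStoreMax=
to a non-zero value and accepts notifications from the sending process;
whether stored fds are kept while the service is stopped depends on the
systemd version and unit settings (newer versions have
FileDescriptorStorePreserve= for this), so check systemd.service(5) for
your version.

```python
import array
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd number systemd passes to a service


def notify_socket_address(path):
    """Map $NOTIFY_SOCKET to an AF_UNIX address ('@' prefix = abstract namespace)."""
    return "\0" + path[1:] if path.startswith("@") else path


def store_fds(fds, name, notify_path=None):
    """Push fds into the service manager's fd store (FDSTORE=1 plus SCM_RIGHTS)."""
    notify_path = notify_path or os.environ["NOTIFY_SOCKET"]
    msg = f"FDSTORE=1\nFDNAME={name}".encode()
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.connect(notify_socket_address(notify_path))
        s.sendmsg([msg], ancillary)


def inherited_fds(environ=None, pid=None):
    """Return [(name, fd), ...] for fds handed back via $LISTEN_FDS/$LISTEN_FDNAMES."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    if environ.get("LISTEN_PID") != str(pid):
        return []  # the variables are not meant for this process
    count = int(environ.get("LISTEN_FDS", "0"))
    raw_names = environ.get("LISTEN_FDNAMES", "")
    names = raw_names.split(":") if raw_names else []
    result = []
    for i in range(count):
        name = names[i] if i < len(names) else "unknown"
        result.append((name, SD_LISTEN_FDS_START + i))
    return result


if __name__ == "__main__":
    restored = inherited_fds()
    if restored:
        print("resuming with", restored)
    elif "NOTIFY_SOCKET" in os.environ:
        # First start: create the resource, then park it with PID 1
        # before exiting ahead of the initrd transition.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        store_fds([sock.fileno()], "mydata")
```

On the second start the service looks for an entry named "mydata" in
the list returned by `inherited_fds()` and wraps that fd again, e.g.
with `socket.socket(fileno=fd)`, instead of creating a fresh socket.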

> >> Question 2:
> >> If my processes are excluded from the killing spree during switch-root and continue to run in
> >> the root file system, what are the potential consequences?
> >
> > You are running processes from a different context, pinning files
> > from an emptied file system.
> >
> Is that OK if
> i. we keep all initrd files by initramfs without releasing them
> ii. keep all software versions the same between initrd and host
> iii. reopen some files like logs in case of OOM
>
> Otherwise, I am not sure if it is OK under some safety features like SELinux; we will test
> them later.

Well, I can't tell you what is OK in your specific scenario, but I
personally find arrangements like that icky.

Lennart

--
Lennart Poettering, Berlin