On Wed, Feb 20, 2019 at 10:46:24AM +0800, Ian Kent wrote:
> On Fri, 2019-02-15 at 16:07 +0000, David Howells wrote:
> > Implement a kernel container object such that it contains the following
> > things:
> >
> >  (1) Namespaces.
> >
> >  (2) A root directory.
> >
> >  (3) A set of processes, including one designated as the 'init' process.
>
> Yeah, I think a name other than init needs to be used for this
> process.
>
> The problem being that there is no requirement for container
> process 1 to behave in any way like an "init" process is
> expected to behave and that leads to confusion (at least
> it certainly did for me).

If you look at the documentation in pid_namespaces(7) you can see that
pid 1 inside a pid namespace is expected to behave like an init process:

- "The first process created in a new namespace [...] has the PID 1,
  and is the "init" process for the namespace (see init(1))."
- "[...] child process that is orphaned within the namespace will be
  reparented to this process rather than init(1) [...]"
- "If the "init" process of a PID namespace terminates, the kernel
  terminates all of the processes in the namespace via a SIGKILL
  signal. This behavior reflects the fact that the "init" process is
  essential for the correct operation of a PID namespace."
- "Only signals for which the "init" process has established a signal
  handler can be sent to the "init" process by other members of the
  PID namespace."
- "[...] the reboot(2) system call causes a signal to be sent to the
  namespace "init" process."

This is one of the reasons why all major container runtimes now, after
years of failing to realize this, run a stub init process that mimics a
dumb init. Sure, you can get away with not having an init that behaves
like an init, but this is inherently broken, or at least goes against
the way pid namespaces were designed.
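
To make the "stub init" point concrete, here is a minimal sketch of the duties such a process takes on: start the real workload, install signal handlers so that signals can reach pid 1 at all, forward them, and reap whatever gets reparented to it. This is only an illustration and not the code of any particular runtime; the names run_stub_init and exit_code are made up for the example, and the set of forwarded signals is an arbitrary choice.

```python
import os
import signal


def exit_code(status):
    """Map a waitpid() status to a shell-style exit code."""
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def run_stub_init(argv):
    """Run argv as the workload, reap children, return the workload's exit code.

    Hypothetical helper for illustration; meant to run as pid 1 of a
    pid namespace.
    """
    child = os.fork()
    if child == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # pid 1 in a pid namespace only receives signals it has a handler for,
    # so establish handlers and pass the signal on to the workload.
    def forward(signum, _frame):
        try:
            os.kill(child, signum)
        except ProcessLookupError:
            pass  # workload already gone

    for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGHUP):
        signal.signal(sig, forward)

    while True:
        # Blocks until any child exits; this also collects orphans that
        # were reparented to us, so they do not linger as zombies.
        pid, status = os.waitpid(-1, 0)
        if pid == child:
            # Returning here ends the stub; when pid 1 exits, the kernel
            # SIGKILLs whatever is left in the namespace.
            return exit_code(status)
```

Assuming it is started as the first process of the namespace, the caller would pass the workload's command line, e.g. run_stub_init(["sh", "-c", "exit 7"]), and exit with the returned code.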