On Thu, Mar 05, 2020 at 01:09:10PM -0800, James Bottomley wrote:
> On Thu, 2020-03-05 at 19:35 +0000, Ignat Korchagin wrote:
> > The main need for this is to support container runtimes on stateless
> > Linux system (pivot_root system call from initramfs).
> >
> > Normally, the task of initramfs is to mount and switch to a "real"
> > root filesystem. However, on stateless systems (booting over the
> > network) it is just convenient to have your "real" filesystem as
> > initramfs from the start.
> >
> > This, however, breaks different container runtimes, because they
> > usually use pivot_root system call after creating their mount
> > namespace. But pivot_root does not work from initramfs, because
> > initramfs runs from rootfs, which is the root of the mount tree and
> > can't be unmounted.
>
> Can you say more about why this is a problem? We use pivot_root to
> pivot from the initramfs rootfs to the newly discovered and mounted
> real root ... the same mechanism should work for a container (mount
> namespace) running from initramfs ... why doesn't it?

Not sure how it interacts with mount namespaces, but we don't use
pivot_root to go from rootfs to the real root. We use switch_root,
which moves the new root onto the old / using mount with MS_MOVE and
then chroots to it.

https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

>
> The sequence usually looks like: create and enter a mount namespace,
> build a tmpfs for the container in some $root directory then do
>
>
> cd $root
> mkdir old-root
> pivot_root . old-root
> mount --make-rprivate /old-root
> umount -l /old-root
> rmdir /old-root
>
> Once that's done you're disconnected from the initramfs root. The
> sequence is really no accident because it's what the initramfs would
> have done to pivot to the new root anyway (that's where container
> people got it from).
>
>
> James
>
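
For comparison with the pivot_root sequence quoted above, here is a rough
sketch of the switch_root approach (move the new root over / with MS_MOVE,
then chroot). It assumes a mount that supports --move and a chroot binary
in the initramfs. The function name switch_root_sketch and the /sbin/init
default are made up for illustration. Real implementations typically also
free the initramfs contents and move /dev, /proc and /sys, which is left
out here.

```shell
# Hypothetical helper: switch_root_sketch NEWROOT [INIT]
# Intended to run as PID 1 from an initramfs, with the real root
# filesystem already mounted at NEWROOT.
switch_root_sketch() {
    newroot=$1
    init=${2:-/sbin/init}   # assumed default, adjust as needed

    # Enter the new root first; bail out before touching any mounts
    # if it is not a directory we can change into.
    cd "$newroot" 2>/dev/null || return 1

    # Move the mount at NEWROOT on top of the old "/"
    # (this is mount(2) with MS_MOVE under the hood).
    mount --move "$newroot" / || return 1

    # Our working directory is still inside the moved mount, so
    # chroot to "." and hand control over to the real init.
    exec chroot . "$init"
}
```

Something like "switch_root_sketch /sysroot /sbin/init" would be the last
step of an init script. Unlike pivot_root, nothing gets unmounted: rootfs
stays at the bottom of the mount tree and is simply covered by the moved
mount.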