Re: [RFC PATCH 0/4] namespacefs: Proof-of-Concept

On Fri, 19 Nov 2021 07:45:01 -0500
James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 2021-11-18 at 14:24 -0500, Steven Rostedt wrote:
> > On Thu, 18 Nov 2021 12:55:07 -0600
> > ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:
> >   
> > > It is not correct to use inode numbers as the actual names for
> > > namespaces.
> > > 
> > > I can not see anything else you can possibly use as names for
> > > namespaces.  
> > 
> > This is why we used inode numbers.
> >   
> > > To allow container migration between machines and similar things
> > > you wind up needing a namespace for your names of namespaces.  
> > 
> > Is this why you say inode numbers are incorrect?  
> 
> The problem is you seem to have picked on one orchestration system
> without considering all the uses of namespaces and how this would
> impact them.  So let me explain why inode numbers are incorrect and it
> will possibly illuminate some of the cans of worms you're opening.
> 
> We have a container checkpoint/restore system called CRIU that can be
> used to snapshot the state of a pid subtree and restore it.  It can be
> used for the entire system or piece of it.  It is also used by some
> orchestration systems to live migrate containers.  Any property of a
> container system that has meaning must be saved and restored by CRIU.
> 
> The inode number is simply a semi-random number assigned to the
> namespace.  It shows up in /proc/<pid>/ns but nowhere else and isn't
> used by anything.  When CRIU migrates or restores containers, all the
> namespaces that compose them get different inode values on the restore.
> If you want to make the inode number equivalent to the container name,
> they'd have to restore to the previous number because you've made it a
> property of the namespace.  The way everything is set up now, that's
> just not possible and never will be.  Inode numbers are a 32-bit space
> and can't be globally unique.  If you want a container name, it will
> have to be something like a new UUID and that's the first problem you
> should tackle.

So everyone seems to be upset about using inode numbers. We could do
what Kirill suggested and just create a random UUID and use that as the
directory name. The directory could then contain a file called inode that
holds the inode number (as that's what both docker and podman use to
identify their containers, and it's nice to have something to map back to
them).
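
To make the layout concrete, here is a minimal userspace sketch in Python.
It is only a mock-up of the idea and not the namespacefs code: the helper
names (namespace_inode, create_entry, read_inode) and the base directory
are hypothetical, and a plain directory stands in for the filesystem. The
one real interface it relies on is stat() of /proc/<pid>/ns/<type>, which
reports the namespace inode number.

```python
import os
import uuid


def namespace_inode(pid="self", ns="pid", proc_root="/proc"):
    """Return the inode number behind /proc/<pid>/ns/<ns> (Linux only)."""
    # os.stat() follows the symlink, so st_ino is the namespace inode.
    return os.stat(os.path.join(proc_root, str(pid), "ns", ns)).st_ino


def create_entry(base_dir, inode):
    """Create <base_dir>/<random UUID>/inode (hypothetical layout)."""
    name = str(uuid.uuid4())
    entry = os.path.join(base_dir, name)
    os.mkdir(entry)
    with open(os.path.join(entry, "inode"), "w") as f:
        f.write(f"{inode}\n")
    return name


def read_inode(base_dir, name):
    """Map a UUID directory back to the inode number tools already use."""
    with open(os.path.join(base_dir, name, "inode")) as f:
        return int(f.read())
```

On a Linux box, create_entry(base, namespace_inode()) would give a
UUID-named directory whose inode file still maps back to what docker and
podman report.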

On checkpoint restore, only the directories that represent the container
that migrated matter, so as Kirill said, make sure they get the old UUID
name, and expose that as the directory.

If a container is looking at the directories of other containers on the
system and then gets migrated to another system, it should be treated as
though those directories were deleted out from under it.
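
As a rough illustration of the restore semantics described above, here is
a small Python sketch. It models the directories as a dict from UUID name
to inode number; the function name restore_on_destination and its
parameters are hypothetical and exist only for this example. Directories
of the migrated container keep their old UUID names but report whatever
inode the destination assigned, and every other directory is simply gone.

```python
def restore_on_destination(source_entries, migrated, new_inodes):
    """Return the UUID -> inode view seen after migration.

    source_entries: dict of UUID name -> inode number on the source host
    migrated:       set of UUID names that belong to the migrated container
    new_inodes:     dict of UUID name -> inode assigned on the destination
    """
    restored = {}
    for name in source_entries:
        if name in migrated:
            # Same UUID name as before, new inode number after restore.
            restored[name] = new_inodes[name]
        # Entries of other containers are dropped, as if deleted.
    return restored
```

The point is that the name survives the move while the inode number is
free to change.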

I still do not see what the issue is here.

-- Steve





