Re: [RFC][PATCH] ns: Syscalls for better namespace sharing control.

Eric W. Biederman wrote:
Daniel Lezcano <daniel.lezcano@xxxxxxx> writes:

Eric W. Biederman wrote:

If the normal rules of parentage apply, that means pid 0 has to wait for its child.
In the scenario with pid 0 and its child pid 1234, if we kill pid 1 of
the pid namespace, I suppose pid 1234 will be killed too.
Pid 0 will stay in the pid namespace and will be able to fork a new pid 1
again.

I think Serge already reported that...

That sounds good :)

I expect zap_pid_ns_processes should also arrange so we cannot allocate any
more processes.  We certainly need to do something explicit or pid 1 won't
be allocated.  It might make sense to resurrect a pid namespace after its
death but it is definitely weird.
Mmh, yes. But that was just an idea, maybe a bit outside the scope you are aiming for.

In a lot of ways I like this idea of sys_hijack/sys_cloneat, and I
don't think anything I am doing fundamentally undermines it.  The use
case of doing things in fork is that there is automatic inheritance of
everything.  All of the namespaces and all of the control groups, and
possibly also the parent process.
And also the rootfs for executing the command inside the container
(e.g. shutdown), the uid/gid (if there is a user namespace), the mount points,
...
But I suppose we can do the same with setns for all the namespaces and by
chrooting into the container rootfs.
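
To make the "setns for all the namespaces, then chroot" idea concrete, here is a minimal sketch. It assumes the caller already holds one open file descriptor per namespace to join, and it uses the setns(fd, nstype) call as glibc exposes it today, which is not necessarily the nsfd interface of the patchset under discussion. The helper name enter_container and its parameters are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>      /* setns() */
#include <stddef.h>     /* size_t */
#include <unistd.h>     /* chroot(), chdir() */

/*
 * Hypothetical helper: join each namespace referred to by ns_fds[0..n-1],
 * then confine the caller to the container's root filesystem.
 * Returns 0 on success, -1 on the first failure (errno is left set).
 */
int enter_container(const int *ns_fds, size_t n, const char *rootfs)
{
    for (size_t i = 0; i < n; i++) {
        /* nstype 0: accept whatever namespace type the fd refers to */
        if (setns(ns_fds[i], 0) == -1)
            return -1;
    }

    if (chroot(rootfs) == -1)
        return -1;

    /* chroot() does not change the working directory, so do it by hand */
    return chdir("/");
}
```

Both setns() and chroot() normally need privileges, so an unprivileged caller should expect -1 here. The uid/gid and tty questions raised above are not addressed by this sketch.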

What I see is a problem with the tty. For example, we cloneat the init process
of the container which is usually /sbin/init but this one has its tty mapped to
/dev/console, so the output of the exec'ed command will go to the console.

My original thinking was that the fd's would come from the caller of sys_cloneat....
Oh, ok :s

Overall it sounds like the semantics I have proposed with
unshare(CLONE_NEWPID) are workable, and simple to implement.  The
extra fork is a bit surprising but it certainly does not
look like a show stopper for implementing a pid namespace join.
I agree, it's some kind of "ghost" process.
IMO, with a bit of userspace code it would be possible to enter a container,
or exec a command inside one, using nsfd and setns.
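
A rough sketch of the unshare(CLONE_NEWPID) semantics and the "extra fork" mentioned above, assuming the behaviour found in kernels that support unsharing the pid namespace: the caller keeps its own pid, and only the first child it forks afterwards lands in the new namespace, as pid 1. The helper name first_child_is_pid1 is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_NEWPID */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>     /* fork(), getpid(), _exit() */

/*
 * Hypothetical helper: unshare the pid namespace, then fork once.
 * Returns 1 if the child saw itself as pid 1, 0 if it did not,
 * and -1 if unshare(), fork() or waitpid() failed.
 */
int first_child_is_pid1(void)
{
    /* Typically needs CAP_SYS_ADMIN; the caller's own pid is unchanged. */
    if (unshare(CLONE_NEWPID) == -1)
        return -1;

    pid_t child = fork();           /* the "extra fork" */
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(getpid() == 1 ? 0 : 1);

    int status;
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Without the needed capability the unshare() call fails and the helper returns -1. As far as I know, on kernels that implement this, once that first child has exited the same caller can no longer fork into the dead namespace (fork() fails with ENOMEM), which matches the "cannot allocate any more processes" point made earlier in the thread.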

+1 to test your patchset Eric :)

I will see about reposting sometime soon.
Great! Thanks.

Just an offhand suggestion: wouldn't the syscall names "nsopen" / "nsattach"
be clearer?

Not bad suggestions.

I am going to explore a bit more.  Given that nsfd is using the same
permission checks as a proc file, I think I can just make it a proc
file.  Something like "/proc/<pid>/ns/net".  With a little luck that
won't suck too badly.
Ah, yes! Good idea.
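
For illustration, a minimal sketch of what userspace could look like if the namespace handle is just a proc file, assuming the "/proc/<pid>/ns/net" layout floated above and the setns(fd, nstype) call as glibc exposes it today. The helper names ns_path, open_ns_fd and join_ns are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>      /* errno */
#include <fcntl.h>      /* open(), O_RDONLY, O_CLOEXEC */
#include <sched.h>      /* setns() */
#include <stdio.h>      /* snprintf() */
#include <sys/types.h>  /* pid_t */
#include <unistd.h>     /* close() */

/* Format "/proc/<pid>/ns/<name>" into buf.
 * Returns the path length, or -1 if it does not fit. */
int ns_path(char *buf, size_t len, pid_t pid, const char *name)
{
    int n = snprintf(buf, len, "/proc/%ld/ns/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Open the namespace file; the usual proc permission checks apply.
 * Returns a file descriptor, or -1 on failure. */
int open_ns_fd(pid_t pid, const char *name)
{
    char path[64];

    if (ns_path(path, sizeof path, pid, name) == -1)
        return -1;
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Join the named namespace of process pid. Returns 0 or -1. */
int join_ns(pid_t pid, const char *name)
{
    int fd = open_ns_fd(pid, name);
    if (fd == -1)
        return -1;

    int rc = setns(fd, 0);      /* 0: do not restrict the namespace type */
    int saved = errno;
    close(fd);                  /* the fd is not needed once joined */
    errno = saved;              /* keep setns()'s errno for the caller */
    return rc;
}
```

A call such as join_ns(1234, "net") would then be the whole userspace side; the pid 1234 is only an example value, and the call still needs whatever privileges setns() itself requires.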



