Daniel Lezcano wrote:
> Eric W. Biederman wrote:
>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>>
>>> Eric W. Biederman wrote:
>>>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>>>>
>>>>> Eric W. Biederman wrote:
>>>>>> Pavel Emelyanov <xemul@xxxxxxxxxxxxx> writes:
>>>>>>
>>>>>>> Thanks. What's the problem with setns?
>>>>>> joining a preexisting namespace is roughly the same problem as
>>>>>> unsharing a namespace. We simply haven't figured out how to do it
>>>>>> safely for the pid and the uid namespaces.
>>>>> The pid may change after this for sure. What problems do you know
>>>>> about it? What if we try to allocate the same PID in a new space
>>>>> or return -EBUSY? This will be a good starting point. If we manage
>>>>> to fix it later this will not break the API at all.
>>>> Parentage. The pid is the identity of a process and all kinds of things
>>>> make assumptions in all kinds of strange places. I don't see how
>>>> waitpid can work if you change the pid.
>>> Agree. But what if we enter a pid space which is a subnamespace of the current
>>> one? In that case the parent will still see the task by its old pid. We can restrict
>>> the first version of entering with this rule as well, and this restriction will not
>>> block us in the typical use case (I mean entering a container from a host).
>> When I was thinking about pid namespaces and unshare last time, the idea I came
>> to was that unshare of the pid namespace should only affect which pid namespace
>> your children are in.
>>
>> I remember that to do that there were a few cases where you would have to access
>> task->pid->pid_ns instead of task->nsproxy->pid_ns, but essentially it was pretty
>> simple.
>>
>>>> glibc doesn't cope if you change someone's pid.
>>> OK, but what if we try to allocate the same pid, returning -EBUSY on failure?
>>>
>>> My aim is to provide even a restricted enter. For most of the cases this
>>> should work and make our lives easier. So two restrictions currently:
>>> a) enter a sub namespace
>>> b) allocate the same pid as we have now
>>>
>>> Hm? :)
>> Replacing struct pid is guaranteed to do all kinds of nasty things with
>> signal handling and the like; de_thread is nasty enough and you are talking
>> about something worse. So if we can change pid namespaces without changing
>> the pid I am for it.
>
> I agree with all the points you and Pavel talked about, but I don't
> feel comfortable having the current process switch the pid namespace
> because of the process tree hierarchy (what will be the parent of the
> process when you enter the pid namespace, for example). What is the
> difference with the sys_bindns or the sys_hijack proposed a couple of
> years ago?
>
> I made a suggestion some weeks ago about a new syscall 'cloneat' where
> the child process becomes the child of the targeted process specified in
> the syscall. Maybe it would be interesting to replace the 'setns' by, or
> add, a 'cloneat' syscall with the file descriptor passed as parameter.
> The copy_process function shall not use the nsproxy of the caller but
> the one provided in the fd argument.
>
> The newly created process becomes the child of the process where we
> retrieve the namespace with nsfd, and this one has to 'waitpid' it (the
> caller of 'cloneat' cannot wait for it). It's a bit similar to the
> CLONE_PARENT flag, except the creation order is inverted (the father
> creates for the child).
>
> So when entering the container, we specify the pid 1 of the container,
> which is usually a child reaper.
>
> Does it make sense?

For what it's worth, I think that this suggestion (cloneat) is so far
the cleanest way to allow a process to enter an existing namespace.

Oren.

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
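
To make the parentage rules of the proposed 'cloneat' easier to follow, here is a toy user-space model in Python. It is only a sketch of the semantics described in Daniel's mail, not kernel code: the names PidNamespace, Task, cloneat and waitpid are hypothetical, a single pid_ns field stands in for the whole nsproxy, and each task is given a pid in one namespace only (the real kernel also gives a task a pid in every ancestor namespace).

```python
import itertools


class PidNamespace:
    """Toy pid namespace: hands out pids that are unique within itself."""

    def __init__(self, name):
        self.name = name
        self._next_pid = itertools.count(1)

    def alloc_pid(self):
        return next(self._next_pid)


class Task:
    """Toy task: a pid, a pid namespace (standing in for nsproxy), a parent."""

    def __init__(self, pid_ns, parent=None):
        self.pid_ns = pid_ns
        self.pid = pid_ns.alloc_pid()
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def cloneat(caller, target):
    """Hypothetical cloneat: the new task takes its namespace from `target`,
    not from `caller`, and becomes a child of `target`."""
    return Task(target.pid_ns, parent=target)


def waitpid(waiter, child):
    """Only the parent may reap a child; anyone else gets ECHILD."""
    if child.parent is not waiter:
        raise ChildProcessError("ECHILD: not a child of the caller")
    waiter.children.remove(child)
    return child.pid


# Entering a container from the host, as in the thread:
host_ns = PidNamespace("host")
container_ns = PidNamespace("container")
host_shell = Task(host_ns)            # the caller on the host
container_init = Task(container_ns)   # pid 1 of the container, the child reaper
entered = cloneat(host_shell, container_init)
```

In this model the caller's own pid and namespace never change, which is the property Eric asks for; the cost is that the caller cannot wait for the new task, since only container_init is its parent.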