On Jul 29, 2014 10:57 PM, "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> wrote:
>
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>
> > On Tue, Jul 29, 2014 at 9:08 PM, Eric W. Biederman
> > <ebiederm@xxxxxxxxxxxx> wrote:
> >> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> >>
> >>> On Mon, Jul 28, 2014 at 2:18 PM, Eric W. Biederman
> >>> <ebiederm@xxxxxxxxxxxx> wrote:
> >>>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> >>>>
> >>>>> [cc: Eric Biederman]
> >>>>>
> >>>>
> >>>>> Can we do one better and add a flag to prevent any non-self pid
> >>>>> lookups? This might actually be easy on top of the pid namespace work
> >>>>> (e.g. we could change the way that find_task_by_vpid works).
> >>>>>
> >>>>> It's far from just being signals. There's access_process_vm, ptrace,
> >>>>> all the signal functions, clock_gettime (see CPUCLOCK_PID -- yes, this
> >>>>> is ridiculous), and probably some others that I've forgotten about or
> >>>>> never noticed in the first place.
> >>>>
> >>>> So here is the practical question.
> >>>>
> >>>> Are these processes that only can send signals to their thread group
> >>>> allowed to call fork()?
> >>>>
> >>>>
> >>>> If fork is allowed and all pid lookups are restricted to their own
> >>>> thread group then wait, waitpid, and all of the rest of the wait family
> >>>> will never return the pids of their children, and zombies will
> >>>> accumulate. Aka the semantics are fundamentally broken.
> >>>
> >>> Good point.
> >>>
> >>> I can imagine at least three ways that fork() could continue working, though:
> >>>
> >>> 1. Allow lookups of immediate children, too. (I don't love this one.)
> >>> 2. Allow non-self pids to be translated in but not out. This way
> >>>    P_ALL will continue working.
> >>> 3. Have the kernel treat any PID-restricted process as though it were NOCLDWAIT.
> >>>
> >>> I think I like #3. Thoughts?
> >>>
> >>>>
> >>>> If fork is not allowed pid namespaces already solve this problem.
> >>>
> >>> PID namespaces are fairly heavyweight. Julien pointed out that using
> >>> PID namespaces requires a bunch of dummy PID 1 processes.
> >>
> >> Only if you can't tolerate init exiting. The reasoning with respect to
> >> signals and signals being ignored was wrong. And if you only have one
> >> process you care about and no children to worry about neither the
> >> difference in signal handling nor the world dies when init exits applies.
> >
> > Can you elaborate? It seems entirely plausible to me that there are
> > programs that won't work right as PID 1 without considerable
> > adaptation.
>
> The only funny things about pid 1 of a pid namespace are:
> - children can't send signals to pid 1 unless a signal handler has
>   been established.
> - All children die when the parent dies.
> - Grand children become zombies of the parent when the children die.
> - The pid is 1.
>
> That is almost everything is the same and it takes almost no adaptation
> (really) to run as the initial pid in a pid namespace.
>
> Not being able to receive signals (which is the argument I read against
> them) is bogus. You just have to set your signal handler to something
> besides SIG_DFL.
>
> So I have my question: What is the use case people are trying to solve
> by filtering signals and pid lookups? If children are not part of the
> goal a pid namespace will work just fine.
>
> >> Therefore given what I have read described pid namespaces are a trivial
> >> solution to this problem space.
> >
> > pid namespaces also won't work in the context of Capsicum unless you
> > want every single Capsicum process to be its own pid namespace.
>
> For a tightly bound process I don't see why each process could not be
> its own pid namespace.

Two main reasons:

You can't put yourself in a pid namespace, so you need to fork into
your sandbox, and you can't prevent yourself from seeing your children
(although, as noted, my approach has issues here, too, but I think this
is more easily solved outside the context of namespaces).

>
> > Also,
> > pid namespaces don't offer any way to protect children from parents.
>
> And my presumption was that there were not any children because the
> semantics suggested so far do not properly support children.
>

I'd like to try to fix that.

Another approach: let waiting for zombies that are immediate children
be an exception.

--Andy

> Eric
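
For reference, the NOCLDWAIT-style semantics mentioned earlier in the thread (option 3) can already be observed from userspace. The following is only a minimal sketch of that existing behaviour, not of the proposed kernel change. It assumes Linux, and it ignores SIGCHLD as a stand-in for SA_NOCLDWAIT because Python's signal module does not expose sigaction flags. The function name is made up for illustration.

```python
import os
import signal


def wait_result_with_sigchld_ignored():
    """Fork one child while SIGCHLD is ignored and report what waitpid() sees.

    Returns "ECHILD" if the kernel discarded the child's exit status
    (no zombie was left to reap), or "reaped" if waitpid() returned normally.
    """
    # Assumption: ignoring SIGCHLD behaves like SA_NOCLDWAIT here, so
    # terminated children are not turned into zombies.
    previous = signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping Python cleanup
        try:
            # Blocks until the child has terminated, then fails with ECHILD
            # because there is no zombie left to collect.
            os.waitpid(pid, 0)
        except ChildProcessError:
            return "ECHILD"
        return "reaped"
    finally:
        # Restore the old disposition (None means it was not set from Python).
        signal.signal(signal.SIGCHLD,
                      signal.SIG_DFL if previous is None else previous)


if __name__ == "__main__":
    print(wait_result_with_sigchld_ignored())
```

On Linux the call is expected to report "ECHILD": the child never becomes a zombie, so nothing accumulates, but the parent also never learns the child's pid or exit status through the wait family.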