On 21/06/10 5:58 -0700, Eric W. Biederman wrote: > Louis Rilling <Louis.Rilling@xxxxxxxxxxx> writes: > > > On 18/06/10 18:27 +0200, Oleg Nesterov wrote: > >> On 06/18, Louis Rilling wrote: > >> > > >> > On 17/06/10 23:36 +0200, Oleg Nesterov wrote: > >> > > On 06/17, Eric W. Biederman wrote: > >> > > > > >> > > > The task->children isn't changed until __unhash_process() which runs > >> > > > after flush_proc_task(). > >> > > > >> > > Yes. But this is only the current implementation detail. > >> > > It would be nice to cleanup the code so that EXIT_DEAD tasks are > >> > > never sit in ->children list. > >> > > > >> > > > So we should be able to come up with > >> > > > a variant of do_wait() that zap_pid_ns_processes can use that does > >> > > > what we need. > >> > > > >> > > See above... > >> > > > >> > > Even if we modify do_wait() or add the new variant, how the caller > >> > > can wait for EXIT_DEAD tasks? I don't think we want to modify > >> > > release_task() to do __wake_up_parent() or something similar. > >> > > >> > Indeed, I was thinking about calling __wake_up_parent() from release_task() > >> > once parent->children becomes empty. > >> > > >> > Not sure about the performance impact though. Maybe some WAIT_NO_CHILDREN flag > >> > in parent->signal could limit it. But if EXIT_DEAD children are removed from > >> > ->children before release_task(), I'm afraid that this becomes impossible. > >> > >> Thinking more, even the current do_wait() from zap_pid_ns_processes() > >> is not really good. Suppose that some none-init thread is ptraced, then > >> zap_pid_ns_processes() will hange until the tracer does do_wait() or > >> exits. > > > > Is this really a bad thing? If somebody ptraces a task in a pid namespace, that > > sounds reasonable to have this namespace (and it's init task) pinned. > > Louis. Have you seen this problem hit without my setns patch? Yes. I hit it with Kerrighed patches. I also have an ugly reproducer on 2.6.35-rc3 (see attachments). Ugly because I introduced artifical delays in release_task(). I couldn't trigger the bug without it, probably because the scheduler is too kind :) I'm using memory poisoining (SLAB and DEBUG_SLAB) to make it easy to observe the bug. Example: # ./proc_flush_task-bug-reproducer 1 > > I'm pretty certain that this hits because there are processes do_wait > does not wait for, in particular processes in a disjoint process tree. Indeed do_wait() misses EXIT_DEAD children. > > So at this point I am really favoring killing the do_wait and making > this all asynchronous. Any idea about how to do it? Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes
#define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/syscall.h> #include <signal.h> #include <linux/sched.h> int pipefd[2]; int init(void *arg) { int nr, i, err; sighandler_t sigret; char c; close(pipefd[0]); err = setsid(); if (err < 0) { perror("setsid"); abort(); } sigret = signal(SIGCHLD, SIG_IGN); if (sigret == SIG_ERR) { fprintf(stderr, "signal\n"); abort(); } nr = atoi(arg); for (i = 0; i < nr; i++) { err = fork(); if (err < 0) { perror("fork"); abort(); } else if (err == 0) { printf("%d before\n", getpid()); fflush(stdout); pause(); printf("%d after\n", getpid()); fflush(stdout); return 0; } } err = write(pipefd[1], &c, 1); if (err != 1) { perror("write"); abort(); } pause(); return 0; } int main(int argc, char *argv[]) { long stack_size = sysconf(_SC_PAGESIZE); void *stack = alloca(stack_size) + stack_size; pid_t pid; char c; int ret; ret = pipe(pipefd); if (ret) { perror("pipe"); abort(); } ret = clone(init, stack, CLONE_NEWPID | SIGCHLD, argv[1]); if (ret < 0) { perror("clone"); abort(); } pid = ret; printf("%d\n", pid); fflush(stdout); close(pipefd[1]); ret = read(pipefd[0], &c, 1); if (ret != 1) { if (ret) { perror("read"); abort(); } else { sleep(5); } } printf("killing %d\n", pid); fflush(stdout); ret = kill(-pid, SIGKILL); if (ret) { perror("kill"); abort(); } return 0; }
commit 7b7cae6ae5c543b8e9cc84fc041d9bce36e7b674 Author: Louis Rilling <louis.rilling@xxxxxxxxxxx> Date: Wed Jun 16 16:20:02 2010 +0200 proc_flush_task() debug diff --git a/kernel/exit.c b/kernel/exit.c index ceffc67..be8cdb0 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -169,6 +169,14 @@ repeat: atomic_dec(&__task_cred(p)->user->processes); rcu_read_unlock(); + if (task_pid(p)->level > 0) { + if (!thread_group_leader(p) || !is_container_init(p)) { + __set_current_state(TASK_UNINTERRUPTIBLE); + schedule_timeout(10 * HZ); + } + printk("release_task: %d/%d\n", p->pid, task_pid(p)->numbers[1].nr); + } + proc_flush_task(p); write_lock_irq(&tasklist_lock);
Attachment:
signature.asc
Description: Digital signature
_______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers