Daniel Lezcano [dlezcano@xxxxxxxxxx] wrote: > Sukadev Bhattiprolu wrote: >> Still digging through some traces, but below I have some questions that >> I am still trying to answer. >> >>> I am not sure what you mean by 'struct pids' but what I observed is: >> >> Ok, I see that too. If pids leak, then pid-namespace will leak too. >> Do you see any leaks in proc_inode_cache ? > > Yes, right. It leaks too. Ok, some progress... Can you please verify these observations: - If the container exits normally, the leak does not seem to happen. (i.e reduce your sleep 3600 to say sleep 3 and remove the lxc-stop). - Revert the following commit and check if the leak happens: commit 7766755a2f249e7e0dabc5255a0a3d151ff79821 Author: Andrea Arcangeli <andrea@xxxxxxx> Date: Mon Feb 4 22:29:21 2008 -0800 (this commit added the check for PF_EXITING in proc_flush_task_mnt loosely explained below). Incomplete analysis :-) If the container-init is terminated (by the lxc-stop), the container zaps other processes in the container and waits for them. The leak happens in this case. Following sequence of events occur: - container-init calls do_exit and sets PF_EXITING (in exit_signals()) - container init calls zaps_pid_ns_processes() (exit_notify / forget_orignal_parent() / find_new_reaper()) - In zap_pid_ns_processes() container-init sends SIGKILL to descendants and calls sys_wait(). - The sys_wait() is expected to call release_task() which calls proc_flush_task_mnt(). - proc_flush_task_mnt() looks up the dentry for the pid (2 in our example) and finds the dentry. But since container-init is itself exiting (i.e PF_EXITING is set) it does NOT call the shrink_dcache_parent(), but, interestingly calls d_drop() and dput(). Now the d_drop() unhashes the dentry for the pid 2. - proc_flush_task_mnt() then tries to find the dentry for the tgid of the process. In our case, the tgid == pid == 2 and we just unhashed the dentry for "2". So, we don't find the dentry for the leader either (and hence don't make the second shrink_dcache_parent() call in proc_flush_task_mnt() either). Without a call to shrink_dcache_parent(), the proc inode for the process that was terminated by container init is not deleted (i.e we don't call proc_delete_inode() or the put_pid() inside it) causing us to leak proc_inodes, struct pid and hence struct pid_namespace. There should be a better fix, but first please confirm if reverting the above commit fixes the leak for you also. Sukadev _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linux-foundation.org/mailman/listinfo/containers