Patch 5/7 is new in this set and fixes a bug. The remaining patches are
just a forward-port from the previous version, and I believe they address
all the comments I have received.

Oleg, please sign-off/ack if you agree.

---

Container-init must behave like global-init to processes within the
container and hence it must be immune to unhandled fatal signals from
within the container (i.e. SIG_DFL signals that terminate the process).

But the same container-init must behave like a normal process to
processes in ancestor namespaces and so if it receives the same fatal
signal from a process in ancestor namespace, the signal must be processed.

Implementing these semantics requires that send_signal() determine pid
namespace of the sender but since signals can originate from workqueues/
interrupt-handlers, determining pid namespace of sender may not always
be possible or safe.

This patchset implements the design/simplified semantics suggested by
Oleg Nesterov. The simplified semantics for container-init are:

	- container-init must never be terminated by a signal from a
	  descendant process.

	- container-init must never be immune to SIGKILL from an ancestor
	  namespace (so a process in parent namespace must always be able
	  to terminate a descendant container).

	- container-init may be immune to unhandled fatal signals (like
	  SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP
	  are the only reliable signals to a container-init from ancestor
	  namespace.
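
As a rough illustration of the simplified semantics listed above, here is
a small userspace model in C. It is not kernel code and is not taken from
the patches: cinit_signal_takes_effect(), enum cinit_disposition and
cinit_model_selfcheck() are hypothetical names made up for this sketch.
It also assumes that an unhandled fatal signal other than SIGKILL is
always dropped, whereas the rules only say container-init "may be" immune
when the sender is in an ancestor namespace.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Userspace model only; none of these names exist in the kernel.
 * 'sig' is assumed to be SIGSTOP or a signal whose default action
 * terminates the process.
 */
enum cinit_disposition {
	CINIT_SIG_DFL,		/* no handler installed */
	CINIT_SIG_HANDLED,	/* container-init installed a handler */
};

/* Returns true if the signal takes effect on container-init. */
bool cinit_signal_takes_effect(int sig, bool from_ancestor_ns,
			       enum cinit_disposition disp)
{
	/*
	 * SIGKILL/SIGSTOP cannot be caught, so the disposition does not
	 * matter. They are reliable from an ancestor namespace and are
	 * dropped when sent from within the container.
	 */
	if (sig == SIGKILL || sig == SIGSTOP)
		return from_ancestor_ns;

	/*
	 * Any other fatal signal takes effect only if it is handled.
	 * Assumption of this sketch: the unhandled case is always
	 * dropped, even when the sender is in an ancestor namespace.
	 */
	return disp == CINIT_SIG_HANDLED;
}

/* Spot checks of the rules; aborts if the model disagrees with them. */
void cinit_model_selfcheck(void)
{
	/* A process inside the container cannot kill container-init. */
	assert(!cinit_signal_takes_effect(SIGKILL, false, CINIT_SIG_DFL));
	/* A process in an ancestor namespace always can. */
	assert(cinit_signal_takes_effect(SIGKILL, true, CINIT_SIG_DFL));
	/* Unhandled SIGUSR1 is not reliable, even from an ancestor. */
	assert(!cinit_signal_takes_effect(SIGUSR1, true, CINIT_SIG_DFL));
	/* A handled signal is delivered regardless of the sender. */
	assert(cinit_signal_takes_effect(SIGUSR1, false, CINIT_SIG_HANDLED));
}
```

Calling cinit_model_selfcheck() from a small main() exercises the rules.
The first assertion also covers a container-init sending SIGKILL to
itself, since the sender is then in the same namespace.
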

Patches in this set:

	[PATCH 1/7] Remove 'handler' parameter to tracehook functions
	[PATCH 2/7] Protect init from unwanted signals more
	[PATCH 3/7] Add from_ancestor_ns parameter to send_signal()
	[PATCH 4/7] Protect cinit from unblocked SIG_DFL signals
	[PATCH 5/7] zap_pid_ns_process() should use force_sig()
	[PATCH 6/7] Protect cinit from blocked fatal signals
	[PATCH 7/7] SI_USER: Masquerade si_pid when crossing pid ns boundary

Changelog[v8]:
	- Bugfix (new patch, 5/7): Nested container-init not terminated when
	  parent container-init exits and calls zap_pid_ns_processes().
	- Dropped old patch 7/7 which showed SIG_DFL signals to init as
	  "ignored" in /proc (we were undecided on whether it's good or bad).

Changelog[v7]:
	- siginfo_from_user() and siginfo_from_ancestor_ns() are fairly
	  simple and used only in send_signal(). Remove them and move the
	  logic into send_signal() (Patch 4/7)
	- Update /proc/pid/status to include SIG_DFL signals to init in the
	  "ignored" set (and remove the TODO in Patch 0/7) (Patch 7/7)

Changelog[v6]:
	- Patches 3,4: Have kill_pid_info_as_uid() pass in 'from_ancestor_ns'
	  parameter to __send_signal() and remove SI_ASYNCIO check in
	  siginfo_from_user().
	- Patches 4,6: Update changelog and simplify code

Changelog[v5]:
	- Patch 2/6: Remove SIG_IGN check in sig_task_ignored() and let
	  sig_handler_ignored() check SIG_IGN.
	- Patch 3/6: Put siginfo_from_ancestor_ns() back under CONFIG_PID_NS
	  and remove warning in rt_sigqueueinfo().
	- (Patch 5/6) Simplify check in get_signal_to_deliver()
	- (Patch 6/6) Simplify masquerading pid
	- LTP-20081219-intermediate showed no new errors on 2.6.28-rc5-mm2.

Changelog[v4]:
	- [Bugfix] Patch 3/7. Check ns == NULL in siginfo_from_ancestor_ns().
	  Although http://lkml.org/lkml/2008/12/16/502 makes it less likely
	  that ns == NULL, looks like an explicit check won't hurt?
	- Remove SIGNAL_UNKILLABLE_FROM_NS flag and simplify logic as
	  suggested by Oleg Nesterov.
	- Dropped patch that set SIGNAL_UNKILLABLE_FROM_NS and set
	  SIGNAL_UNKILLABLE in patch 5/7 to be bisect-safe.
	- Add a warning in rt_sigqueueinfo() if SI_ASYNCIO is used (patch 3/7)
	- Added two patches (6/7 and 7/7) to masquerade si_pid for SI_USER
	  and SI_TKILL

Changelog[v3]:
	Changes based on discussions of previous version:
		http://lkml.org/lkml/2008/11/25/458

	Major changes:

	- Define SIGNAL_UNKILLABLE_FROM_NS and use in container-inits to
	  skip fatal signals from same namespace but process SIGKILL/SIGSTOP
	  from ancestor namespace.
	- Use SI_FROMUSER() and si_code != SI_ASYNCIO to determine if it is
	  safe to dereference pid-namespace of caller. Highly experimental :-)
	- Masquerading si_pid when crossing namespace boundary: relevant
	  patches merged in -mm and dropped from this set.

	Minor changes:

	- Remove 'handler' parameter to tracehook functions
	- Update sig_ignored() to drop SIG_DFL signals to global init early
	  (tried to address Roland's and Oleg's comments)
	- Use 'same_ns' flag to drop SIGKILL/SIGSTOP to cinit from same
	  namespace

Limitations/side-effects of current design

	- Container-init is immune to suicide - kill(getpid(), SIGKILL) is
	  ignored. Use exit() :-)

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers