On Tue, Sep 10, 2019 at 9:57 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Wed, Sep 4, 2019 at 5:53 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> >
> > A task with CAP_SYS_ADMIN can mark itself PR_SET_TASK_CRITICAL,
> > meaning that if the task ever exits, the kernel panics. This facility
> > is intended for use by low-level core system processes that cannot
> > gracefully restart without a reboot. This prctl allows these processes
> > to ensure that the system restarts when they die regardless of whether
> > the rest of userspace is operational.
>
> The kind of panic produced by init crashing is awful -- logs don't get
> written, etc.

True today --- but that's a separate problem, and one that can be
solved in a few ways, e.g., pre-registering log buffers to be
incorporated into any kexec kernel memory dumps. If a system aiming
for reliability can't diagnose panics, that's a problem with or
without my patch.

> I'm wondering if you would be better off with a new
> watchdog-like device that, when closed, kills the system in a
> configurable way (e.g. after a certain amount of time, while still
> logging something and having a decent chance of getting the logs
> written out.) This could plausibly even be an extension to the
> existing /dev/watchdog API.

There are lots of approaches that work today: a few people have
suggested just having init watch processes, perhaps with pidfds. What
I worry about is increasing the length (both in terms of time and
complexity) of the critical path between something going wrong in a
critical process and the system getting back into a known-good state.
A panic at the earliest moment we know that a marked-critical process
has become doomed seems like the most reliable approach, especially
since alternatives can get backed up behind things like file
descriptor closing and various forms of scheduling delay.