On Wed, 9 Aug 2023, Tiezhu Yang wrote:

> > So you want to keep a task alive that has caused a kernel oops in the
> > process context in this case, right? What purpose would it be for and
> > what condition causes `notify_die' to return NOTIFY_STOP? IOW why is
> > there no need to call `make_task_dead' in this case?
>
> I did some research, hope it is useful.
>
> There is a related description in Documentation/input/notifier.rst:
>
>     For each kind of event but the last, the callback may return
>     NOTIFY_STOP in order to "eat" the event: the notify loop is
>     stopped and the keyboard event is dropped.

I saw that, but this is irrelevant. Dropping a keyboard event won't make
the system unstable (though it can make a console user unstable, out of
irritation).

> In commit 748f2edb5271 ("x86 NMI: better support for debuggers"), it said:
>
>     If the notify is handled with a NOTIFY_STOP return, the
>     system is given a new lease on life.
>
> In commit 004429956b48 ("handle recursive calls to bust_spinlocks()"),
> it said:
>
>     However, at least on i386 die() has been capable of returning
>     (and on other architectures this should really be that way, too)
>     when notify_die() returns NOTIFY_STOP.
>
> In commit 22f5991c85de ("x86-64: honor notify_die() returning NOTIFY_STOP"),
> it said:
>
>     This requires making die() return a value, making its callers honor
>     this (and be prepared that it may return)
>
> In commit 620de2f5dc69 ("[IA64] honor notify_die() returning NOTIFY_STOP"),
> it said:
>
>     This requires making die() and die_if_kernel() return a value,
>     and their callers to honor this (and be prepared that it returns).

Thanks, that indeed helps, though indirectly. I think the most relevant,
though still terse explanation comes from commit 20c0d2d44029 ("[PATCH]
i386: pass proper trap numbers to die chain handlers"), which I believe is
the earliest of similar changes.
The patch was originally submitted here:
<https://lore.kernel.org/r/43DDF02E.76F0.0078.0@xxxxxxxxxx/> and hardly any
discussion emerged, but I think the key statement is:

"[...] honor the return value from the handler chain invocation in die()
as, through a debugger, the fault may have been fixed."

Now it makes sense to me: even if ignoring the event will make the system
unstable, by allowing access through a debugger it has been compromised
already anyway.

So I think your change will be good if you update the change description
to include the justification quoted above rather than just: "the others do
it too, so it must be good" (though you can of course mention that your
change also makes our port consistent with other ones). I suggest linking
to the original i386 submission too for future reference.

Also I note that you combine three independent changes into one, so
please split it into individual patches as per our requirements.

  Maciej