On Tue, Nov 08, 2022 at 11:38:22AM -0800, Kees Cook wrote:
> On Tue, Nov 08, 2022 at 09:24:40AM -0800, Kees Cook wrote:
> > On Mon, Nov 07, 2022 at 10:48:20PM +0100, Jann Horn wrote:
> > > On Mon, Nov 7, 2022 at 10:15 PM Solar Designer <solar@xxxxxxxxxxxx> wrote:
> > > > On Mon, Nov 07, 2022 at 09:13:17PM +0100, Jann Horn wrote:
> > > > > +oops_limit
> > > > > +==========
> > > > > +
> > > > > +Number of kernel oopses after which the kernel should panic when
> > > > > +``panic_on_oops`` is not set.
> > > >
> > > > Rather than introduce this separate oops_limit, how about making
> > > > panic_on_oops (and maybe all panic_on_*) take the limit value(s) instead
> > > > of being Boolean? I think this would preserve the current behavior at
> > > > panic_on_oops = 0 and panic_on_oops = 1, but would introduce your
> > > > desired behavior at panic_on_oops = 10000. We can make 10000 the new
> > > > default. If a distro overrides panic_on_oops, it probably sets it to 1
> > > > like RHEL does.
> > > >
> > > > Are there distros explicitly setting panic_on_oops to 0? If so, that
> > > > could be a reason to introduce the separate oops_limit.
> > > >
> > > > I'm not advocating one way or the other - I just felt this should be
> > > > explicitly mentioned and decided on.
> > >
> > > I think at least internally in the kernel, it probably works better to
> > > keep those two concepts separate? For example, sparc has a function
> > > die_nmi() that uses panic_on_oops to determine whether the system
> > > should panic when a watchdog detects a lockup.
> >
> > Internally, yes, the kernel should keep "panic_on_oops" to mean "panic
> > _NOW_ on oops?" but I would agree with Solar -- this is a counter as far
> > as userspace is concerned. "Panic on Oops" after 1 oops, 2, oopses, etc.
> > I would like to see this for panic_on_warn too, actually.
>
> Hm, in looking at this more closely, I think it does make sense as you
> already have it. The count is for the panic_on_oops=0 case, so even in
> userspace, trying to remap that doesn't make a bunch of sense. So, yes,
> let's keep this as-is.

I don't follow your logic there - maybe you got confused?

Yes, as proposed the count is for panic_on_oops=0, but that's just weird
- first kind of request no panic with panic_on_oops=0, then override
that with oops_limit=10000. I think it is more natural to request
panic_on_oops=10000 in one step. Also, I think it is more natural to
preserve panic_on_oops=0's meaning of no panic on Oops. To me, about
the only reason to introduce the override is if we want to literally
override a distro's explicit default of panic_on_oops=0.

Alexander
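
For reference, the two userspace interfaces compared in this thread can be modelled roughly as below. This is only an illustrative userspace sketch, not kernel code: the function names are hypothetical, and the ">=" comparison against the limit is an assumption, since the thread does not spell out the exact check. It also assumes oops_limit is a positive number.

```python
def panics_separate_knobs(oops_count, panic_on_oops, oops_limit):
    """Model of the proposal as described in the thread: a separate
    oops_limit that only matters when panic_on_oops is not set.
    (Hypothetical helper; the >= comparison is an assumption.)"""
    if panic_on_oops:
        # Existing behavior: panic on the very first oops.
        return oops_count >= 1
    # panic_on_oops=0: panic once the oops count reaches the limit.
    return oops_count >= oops_limit


def panics_single_counter(oops_count, panic_on_oops):
    """Model of the alternative: panic_on_oops itself carries the limit.
    0 keeps meaning "never panic on oops", 1 keeps meaning "panic on the
    first oops", and N means "panic once N oopses have occurred".
    (Hypothetical helper; the >= comparison is an assumption.)"""
    if panic_on_oops == 0:
        return False
    return oops_count >= panic_on_oops
```

Under this model, panic_on_oops=0 with oops_limit=10000 in the first scheme and panic_on_oops=10000 in the second both panic at the 10000th oops. The schemes differ at panic_on_oops=0, which never panics in the second scheme but is overridden by oops_limit in the first.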