Hi Valdis, thanks for the thorough response.

On Fri, 18 Oct 2019 at 18:53, Valdis Klētnieks (<valdis.kletnieks@xxxxxx>) wrote:

> Well..here's the thing. Unless you have "panic_on_oops" set, hitting a null
> pointer will usually *NOT* panic the whole system. In fact, that #0000 in the
> panic message is a counter of how many times the kernel has OOPs'ed already.
> Way back in the dark mists of time, I had a system that managed to get it up to
> #1500 or so overnight.

Yes, and this is why my horribly hackish fix is to manually tamper with
panic_on_oops in a die_notifier. I was hoping to find a way to avoid that.

> The most graceful generic thing the kernel can do at that point is kill the execution
> thread that hit the error. This can quickly go sideways if that thread held a lock
> or similar critical resource. And no, even though the kernel knows all the locks
> the thread had, it *does not* know which ones, if any, are safe to unlock.

I'd rather have the kernel just return control to me, at the beginning of the
catch block, and give me a chance to fix things (or at least log some debugging
info). I imagine that's what Windows' __except block is for. The kernel may not
know which locks are safe to break, but I do. Whether a kernel left in an
unstable state is less desirable than a panic is debatable on a case-by-case
basis and, IMHO, outside the scope of this discussion.

> And if you actually *think* about it - a 'try/catch' is semantically *identical* to
> coding a parameter test before the event or checking a return code after.

I humbly disagree. Return codes aren't possible in all cases, which is why
things like native_read_msr_safe exist: they implement a form of exception
handling through _ASM_EXTABLE.

> Also - say you have a try/catch around a statement. For some exceptions, such
> as an end-of-file or a dropped network connection, it's reasonably easy to know
> how to clean up and continue.
> But what if the statement hits a null pointer
> error. What do you do to clean things up? You have a bad pointer, and you
> have *no way to actually fix it and continue normally*.

But then I can choose to let my process die, plus log some useful info and
maybe even do some minor cleanups, without raising a panic.

My particular module just reads some hardware registers and returns the info to
userspace, so it's not something essential for the system. As a user, I would
hate it if a non-essential module crashes the whole system like that. Perhaps
the real problem is that panic_on_oops affects all of the kernel, rather than a
given module.

In any case, I think I already have my answer. Thanks for the response &
discussion.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies