On Sun, 25 Jul 2021 13:07:21 -0500, Ian Pilcher said:

> Is there any sort of convention around what to return in the case of an
> error in the logic of the code itself, something that will make it as
> obvious as possible that the problem is a bug.

In general, there's no good way to signal such issues back to userspace,
because there's no reserved '-EHIT_A_BUG' value. This has been true for
decades, ever since Unix was still on the 18-bit PDP-7 in 1969. And it's
basically useless to return such a value to userspace, because there's
nothing useful that userspace can *do* in such a case.

Note that pretty much all the defined error codes refer back to things
that userspace could at least potentially do something useful about: it
may retry the operation after a delay, or tell the user that an optional
facility isn't available in the currently running kernel, or try an
alternate method of doing an operation (for example, trying again with
IPv4 if an IPv6 connection fails, or using a different method of file
locking).

But if a userspace process hits an actual kernel bug, what is it supposed
to do to recover? Do you add "check for -EHIT_A_BUG" to every single place
you do a syscall?

After all, 98% of userspace code is, *at best*, going to simply do an
'if (!error)' test. And userspace code only does a more detailed check of
*which* errno it got handed if it can do something different/useful for a
specific code (such as code that goes into a retry loop if it gets -EEXIST
when trying to create a lock file that shouldn't exist, and should be
removed by another process when it's done).

In particular, userspace has no ability to log any useful debugging
information.
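To make the lock-file case concrete, here is a minimal userspace sketch of code that looks at *which* errno it got only because it can do something useful with one specific value. The function name acquire_lock(), the retry count, and the delay are all made up for illustration. Note that userspace sees the kernel's -EEXIST as a -1 return with errno set to EEXIST.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path` exclusively, retrying only on EEXIST.
 * Returns an open fd on success, or -1 with errno set on failure.
 */
int acquire_lock(const char *path, int max_tries, unsigned int delay_ms)
{
	assert(path != NULL && max_tries > 0);

	for (int i = 0; i < max_tries; i++) {
		int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

		if (fd >= 0)
			return fd;	/* we own the lock file now */

		/* Any other errno: nothing specific to do, so just fail. */
		if (errno != EEXIST)
			return -1;

		/* EEXIST: another process holds the lock; wait and retry. */
		if (i + 1 < max_tries) {
			struct timespec ts = {
				.tv_sec = delay_ms / 1000,
				.tv_nsec = (long)(delay_ms % 1000) * 1000000L,
			};
			nanosleep(&ts, NULL);
		}
	}

	errno = EEXIST;		/* still locked after all attempts */
	return -1;
}
```

Every errno other than EEXIST falls through to a plain "it failed" path, which is exactly what would happen to a hypothetical -EHIT_A_BUG as well.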

There's the additional issue that the actual problem may not even be in
that syscall's code - it could be some previous syscall from the current
process that mis-set something in a structure, or some other kernel thread
doing a write-after-free and corrupting memory, or code that assumed that
it wouldn't be rescheduled to another CPU, or a myriad of other ways to
fail.

The *proper* thing to do, instead of deciding to return -EHIT_A_BUG, is to
call WARN() or BUG(), so that dmesg has something that's at least
potentially useful. Then use that information to fix the issue.
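
As a rough sketch of what that looks like in practice, the tiny module below checks an invariant with WARN(), which logs the message and a stack trace to dmesg and evaluates to the condition it was given, so the caller can still bail out with an ordinary errno. Everything named demo_* is invented for illustration, and returning -EINVAL is just one common choice, not a convention the post prescribes. It only builds against a kernel tree, not as a userspace program.

```c
#include <linux/bug.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>

/* Hypothetical driver state, invented for this illustration. */
struct demo_state {
	int users;	/* invariant: must be > 0 before demo_put() */
};

static struct demo_state demo;

static int demo_put(struct demo_state *st)
{
	/*
	 * A non-positive count here can only come from a kernel bug.
	 * WARN() records the message and a backtrace in dmesg, then
	 * returns the condition so we can fail with a normal errno.
	 */
	if (WARN(st->users <= 0, "demo: users underflow (%d)\n", st->users))
		return -EINVAL;

	st->users--;
	return 0;
}

static int __init demo_init(void)
{
	demo.users = 1;
	return demo_put(&demo);	/* invariant holds, so no warning fires */
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Sketch: WARN() on a broken invariant");
```

BUG() takes the harsher route of killing the current context, so it fits only when continuing would not be safe. The kernel's own documentation generally steers new code toward the WARN() family.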

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies