On Do, 30.05.24 14:18, Jiri Olsa (olsajiri@xxxxxxxxx) wrote: > > It seems that the bpf code in the kernel sometimes leaks > > kernel-internal error codes, i.e. those from include/linux/errno.h > > into userspace (as opposed to those from > > include/uapi/asm-generic/errno.h which are public userspace facing > > API). > > > > According to the comments from that internal header file: "These > > should never be seen by user programs." > > > > Specifically, this is about ENOTSUPP, which userspace simply cannot > > handle, there's no error 524 defined in glibc or anywhere else. > > > > We ran into this in systemd recently: > > > > https://github.com/systemd/systemd/issues/32170#issuecomment-2076928761 > > > > (a google search reveals others were hit by this too) > > > > We commited a work-around for this for now: > > > > https://github.com/systemd/systemd/pull/33067 > > > > But it really sucks to work around this in userspace, this is a kernel > > internal definition after all, conflicting with userspace (where > > ENOTSUPP is just an alias for EOPNOTSUPP), hence not really fixable. > > > > ENOSUPP is kinda useless anyway, since EOPNOTSUPP is pretty much > > equally expressive, and something userspace can actually handle. > > > > Various kernel subsystems have been fixed over the years in similar > > situations. For example: > > > > https://patchwork.kernel.org/project/linux-wireless/patch/20231211085121.3841b71c867d.Idf2ad01d9dfe8d6d6c352bf02deb06e49701ad1d@changeid/ > > > > or > > > > https://patchwork.kernel.org/project/linux-media/patch/af5b2e8ac6695383111328267a689bcf1c0ecdb1.1702369869.git.sean@xxxxxxxx/ > > > > or > > > > https://patchwork.ozlabs.org/project/linux-mtd/patch/20231129064311.272422-1-acelan.kao@xxxxxxxxxxxxx/ > > > > I think BPF should really fix that, too. > > hm, I don't think we can change that, user space already depends > on those values and we'd break it with new value Are you sure about that? To be able to handle this situation that userspace program whose existance you are indicating would have had to go the extra mile to literally handle error code 524 that is not known to userspace otherwise and handle it. If somebody goes the extra mile to do that, what makes you think that they didn't just handle it as equivalent to regular EOPNOSTUPP? In systemd at least that's what we are doing. Also: if various other subsystems (I linked examples from wireless, media, mtd above) just fixed this, why not bpf, why is it special in this regard? AFAICS the man pages of most syscalls just list errnos that *can* be returned, but usually doesn't list the precise conditions or makes guarantees about it. This specific error is not listed at all on any man page for the bpf() syscall, hence are you really sure this is actually as set in stone as you think it is? I mean, bpf() is still a bit of bleeding edge tech, and people playing around with this probably tend to have quite new toolchains even, not old, hard to fix code? > it's unfortunate, but I don't think we can do much about that, > other than enforcing EOPNOTSUPP for new code I took the liberty to CC Linus on this: Linus, what's the policy if some subsystem by mistake is leaking internal kernel error codes (such as ENOTSUP) to userspace? Leave it be as is (i.e. error number not defined in include/uapi/, not documented, but still returned), or fix it to the closest matching public error code (which is probably EOPNOTSUPP in this case) accepting a – mild, I would say – compat break? [BTW, wouldn't it make sense to add a BUG_ON or so on syscalls that return an error number > 133 or so? This kind of issue is quite a recurring theme, see the patches above, and such a BUG_ON would probably catch 95% of all cases like this.] Lennart