On Mon, Jul 17, 2023 at 7:10 PM Alejandro Colomar <alx@xxxxxxxxxx> wrote: > Hi Matthew, > > I don't understand what's the purpose of this. The kernel sets a bit > just to report to the caller that it set a bit? No other purpose? > It feels very weird. Of course, the caller already has that info, > doesn't it? The main reason I posted this patch was because I was confused by the flag's presence in the msg_flags when I was looking at some strace logs, so I figured that it would be a good idea to document it. As for the original purpose of the behavior, it's not really clear, and it may well have been an implementation artifact that got enshrined in the user space ABI. (Even io_uring is careful to replicate this behavior!) This behavior began when the MSG_CMSG_CLOEXEC flag was first added in Linux 2.6.23, with Ulrich Drepper's commit 4a19542e5f69 ("O_CLOEXEC for SCM_RIGHTS"). Per the commit message, the flag was designed to be "passe[d]... just like the existing MSG_CMSG_COMPAT flag". Since it was added to the msg_flags at the start of sys_recvmsg(), the scm_detach_fds[_compat]() functions in net/core/scm.c and net/compat.c could read the flag off of msg->msg_flags without having to thread the recvmsg() flags through. This was indeed similar to the behavior of MSG_CMSG_COMPAT. That flag was added in Linux 2.5.65, with commit 3225fc8a85f4 ("[NET]: Simplify scm handling and sendmsg/recvmsg invocation, consolidate net compat syscalls."), in which put_cmsg() and scm_detach_fds() in net/core/scm.c read it off of msg->msg_flags. (It wouldn't actually be set in msg_flags until Linux 2.5.67, with commit 7e8d06bc1d90, "[COMPAT]: Fix MSG_CMSG_COMPAT flag passing, kill cmsg_compat_recvmsg_fixup." Both of these commits are from history/history.git.) However, the MSG_CMSG_COMPAT flag has been scrubbed from the output msg_flags since Linux 2.6.14, with commit 37f7f421cce1 ("[NET]: Do not leak MSG_CMSG_COMPAT into userspace."). That's what I find so unclear: MSG_CMSG_CLOEXEC was added after the kernel started scrubbing MSG_CMSG_COMPAT from the output, but the new flag was never written to be similarly scrubbed. Later, in Linux 3.10, with commits 1be374a0518a ("net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg") and a7526eb5d06b ("net: Unbreak compat_sys_{send,recv}msg"), MSG_CMSG_COMPAT was banned from being passed to the *msg() syscalls' flags from user space, with the rationale that they were "not intended to be part of the API". Then, in Linux 4.0, we reached the current status quo with commit d720d8cec563 ("net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg"), where MSG_CMSG_COMPAT is allowed (and a no-op) in compat syscalls, but banned from non-compat syscalls. So I agree that it's very weird that this flag gets returned to user space, even while the internal flag that it's modeled after doesn't. I suppose I could spin up a nice story, where the user-space function calling recvmsg() is totally separate from the function processing the returned struct msghdr, and the latter function would really like to know whether the fds in that message are close-on-exec without having to call fcntl(F_GETFD). But that's all just a total guess. If you want to know for sure, perhaps cc'ing Drepper may be worthwhile? A cursory look hasn't shown me any existing user-space code that depends on this behavior. Though one library appears to be aware of this behavior, actively filtering MSG_CMSG_CLOEXEC out of the result flags: <https://github.com/dutchanddutch/node-socket-calls/blob/ca759a0da87cb112875d158f4a81b45b31f4a871/src/socket_calls.cc#L417> Also, only somewhat relatedly, some libraries incorrectly attempt to request MSG_CMSG_CLOEXEC by passing it into the msg_flags field instead of the flags argument: <https://git.samba.org/samba.git/?p=samba.git;a=blob;f=lib/messaging/messages_dgm.c;hb=refs/tags/samba-4.17.9#l1272> <https://github.com/genodelabs/genode/blob/23.05/repos/base-linux/src/lib/base/ipc.cc#L132> <https://github.com/proxmox/pve-lxc-syscalld/blob/a14430f3e75c2b695332ad712164e599464177fc/src/io/seq_packet.rs#L123> Thank you, Matthew House