Re: [PATCH v2] recv.2: Document MSG_CMSG_CLOEXEC as returned in msg_flags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 17, 2023 at 7:10 PM Alejandro Colomar <alx@xxxxxxxxxx> wrote:
> Hi Matthew,
>
> I don't understand what's the purpose of this.  The kernel sets a bit
> just to report to the caller that it set a bit?  No other purpose?
> It feels very weird.  Of course, the caller already has that info,
> doesn't it?

The main reason I posted this patch was because I was confused by the
flag's presence in the msg_flags when I was looking at some strace logs, so
I figured that it would be a good idea to document it. As for the original
purpose of the behavior, it's not really clear, and it may well have been
an implementation artifact that got enshrined in the user space ABI. (Even
io_uring is careful to replicate this behavior!)

This behavior began when the MSG_CMSG_CLOEXEC flag was first added in Linux
2.6.23, with Ulrich Drepper's commit 4a19542e5f69 ("O_CLOEXEC for
SCM_RIGHTS"). Per the commit message, the flag was designed to be
"passe[d]... just like the existing MSG_CMSG_COMPAT flag". Since it was
added to the msg_flags at the start of sys_recvmsg(), the
scm_detach_fds[_compat]() functions in net/core/scm.c and net/compat.c
could read the flag off of msg->msg_flags without having to thread the
recvmsg() flags through.

This was indeed similar to the behavior of MSG_CMSG_COMPAT. That flag was
added in Linux 2.5.65, with commit 3225fc8a85f4 ("[NET]: Simplify scm
handling and sendmsg/recvmsg invocation, consolidate net compat
syscalls."), in which put_cmsg() and scm_detach_fds() in net/core/scm.c
read it off of msg->msg_flags. (It wouldn't actually be set in msg_flags
until Linux 2.5.67, with commit 7e8d06bc1d90, "[COMPAT]: Fix
MSG_CMSG_COMPAT flag passing, kill cmsg_compat_recvmsg_fixup." Both of
these commits are from history/history.git.)

However, the MSG_CMSG_COMPAT flag has been scrubbed from the output
msg_flags since Linux 2.6.14, with commit 37f7f421cce1 ("[NET]: Do not leak
MSG_CMSG_COMPAT into userspace."). That's what I find so unclear:
MSG_CMSG_CLOEXEC was added after the kernel started scrubbing
MSG_CMSG_COMPAT from the output, but the new flag was never written to be
similarly scrubbed.

Later, in Linux 3.10, with commits 1be374a0518a ("net: Block
MSG_CMSG_COMPAT in send(m)msg and recv(m)msg") and a7526eb5d06b ("net:
Unbreak compat_sys_{send,recv}msg"), MSG_CMSG_COMPAT was banned from being
passed to the *msg() syscalls' flags from user space, with the rationale
that they were "not intended to be part of the API". Then, in Linux 4.0, we
reached the current status quo with commit d720d8cec563 ("net: compat:
Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg"), where
MSG_CMSG_COMPAT is allowed (and a no-op) in compat syscalls, but banned
from non-compat syscalls.

So I agree that it's very weird that this flag gets returned to user space,
even while the internal flag that it's modeled after doesn't. I suppose I
could spin up a nice story, where the user-space function calling recvmsg()
is totally separate from the function processing the returned struct
msghdr, and the latter function would really like to know whether the fds
in that message are close-on-exec without having to call fcntl(F_GETFD).
But that's all just a total guess. If you want to know for sure, perhaps
cc'ing Drepper may be worthwhile?

A cursory look hasn't shown me any existing user-space code that depends on
this behavior. Though one library appears to be aware of this behavior,
actively filtering MSG_CMSG_CLOEXEC out of the result flags:
<https://github.com/dutchanddutch/node-socket-calls/blob/ca759a0da87cb112875d158f4a81b45b31f4a871/src/socket_calls.cc#L417>

Also, only somewhat relatedly, some libraries incorrectly attempt to
request MSG_CMSG_CLOEXEC by passing it into the msg_flags field instead of
the flags argument:
<https://git.samba.org/samba.git/?p=samba.git;a=blob;f=lib/messaging/messages_dgm.c;hb=refs/tags/samba-4.17.9#l1272>
<https://github.com/genodelabs/genode/blob/23.05/repos/base-linux/src/lib/base/ipc.cc#L132>
<https://github.com/proxmox/pve-lxc-syscalld/blob/a14430f3e75c2b695332ad712164e599464177fc/src/io/seq_packet.rs#L123>

Thank you,
Matthew House



[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux