Hi! On Tue, Nov 15, 2016 at 07:11:54PM +0100, David Henningsson wrote: > Interesting bug :-) > > Just so I get this right - this could just as well be fixed with instead > changing msg_controllen to this: > > mh.msg_controllen = CMSG_SPACE(sizeof(int) * nfd); > Yup, your understanding is correct :-) > That said, your version works equally well and is slightly cleaner. I don't > recall we have any problems with VLAs. > Yup. Also if one day the code will be modified to attach more than one ancillary object, then above "mh.msgcontrollen" equation can risk data of the first object (cmsg_data[]) flowing over meta data (cmsghdr) of the second one. > Perhaps adding a comment to your finding and a registered kernel > bug/patch/etc would be helpful. > Yeah; I'll send a kernel patch soon. Nonetheless, the kernel is not entirely at fault here. It just has a more permissive side for invalid cmsg params on the native part, and a less permissive one due to a pointer arithmetic issue on the 32-bit emulated one - creating incompatibilities like in our case. (If it were me, I would make both ends strict in parsing cmsg() paramaters, but that has the potential of breaking many existing apps - so it would be better to just make 32-bit emulation as permissive.) To test that hypthesis, set MAX_ANCIL_DATA_FDS to 5 instead of 2. The same issue will be instantaneously reproduced for pure 64-bit apps. Kernel's internal CMSG_NXTHDR() will find the empty space "big enough" to be valid CMSG, and will consider it a _second_ ancil object. sendmsg() code will find that second ancil object has a cmsg_len = 0, directly also returning -EINVAL. > Anyhow, good work, so: > thanks a lot ;-) > Acked-by: David Henningsson <diwic at ubuntu.com> > > > On 2016-11-15 17:48, Ahmed S. Darwish wrote: > > Users reported audio breakage for 32-bit pulse clients connected > > to a 64-bit server over memfds. Investigating the issue further, > > the problem is twofold: > > > > 1. iochannel's file-descriptor passing code is liberal in what it > > issues: produced ancillary data object's "data" section exceeds > > length field. How such an extra space is handled is a grey area > > in the POSIX.1g spec, the IETF RFC #2292 "Advanced Sockets API > > for IPv6" memo, and the cmsg(3) manpage. > > > > 2. A 64-bit kernel handling of such extra space differs by whether > > the app is 64-bit or 32-bit. For 64-bit apps, the kernel > > smartly ducks the issue. For 32-bit apps, an -EINVAL is > > directly returned; that's due to a kernel CMSG header traversal > > bug in the networking stack "32-bit sockets emulation layer". > > > > Compare Linux Kernel's socket.h cmsg_nxthdr() code and the > > 32-bit emulation layer version of it at net/compat.c > > cmsg_compat_nxthdr() for further info. Notice how the former > > graciously ignores incomplete CMSGs while the latter _directly_ > > complains about them -- as of kernel version 4.9-rc5. > > > > (A kernel patch is to be submitted) > > > > Details: > > > > iochannel typically uses sendmsg() for passing FDs & credentials. > > From RFC 2292, sendmsg() control data is just a heterogeneous > > array of embedded ancillary objects that can differ in length. > > Linguistically, a "control message" is an ancillary data object. > > > > For example, below is a sendmsg() "msg_control" containing two > > ancillary objects: > > > > |<---------------------- msg_controllen---------------------->| > > | | > > |<--- ancillary data object -->|<----- ancillary data object->| > > |<------- CMSG_SPACE() ------->|<------- CMSG_SPACE() ------->| > > | | | > > |<-------- cmsg_len ------->| |<-------- cmsg_len ------->| | > > |<------- CMSG_LEN() ------>| |<------- CMSG_LEN() ------>| | > > | | | | | > > +-----+-----+-----+--+------+--+-----+-----+-----+--+------+--+ > > |cmsg_|cmsg_|cmsg_|XX|cmsg_ |XX|cmsg_|cmsg_|cmsg_|XX|cmsg_ |XX| > > |len |level|type |XX|data[]|XX|len |level|type |XX|data[]|XX| > > +-----+-----+-----+--+------+--+-----+-----+-----+--+----+-+--+ > > ^^^^^^^ Ancil Object #1 ^^^^^^^ Ancil Object #2 > > (control message) (control message) > > ^ > > | > > +--- sendmsg() "msg_control" points here > > > > Problem is, while passing FDs, iochannel's code try to avoid > > variable-length arrays by creating a single cmsg object that can > > fit as much FDs as possible: > > > > union { > > struct cmsghdr hdr; > > uint8_t data[CMSG_SPACE(sizeof(int) * MAX_ANCIL_DATA_FDS)]; > > } cmsg; ^^^^^^^^^^^^^^^^^^ > > > > Most of the time though the number of FDs to be passed is less > > than the maximum above, thus "cmsg_len" is set to the _actual_ FD > > array size: > > > > cmsg.hdr.cmsg_len = CMSG_LEN(sizeof(int) * nfd); > > ^^^ > > This inconsistency tricks the kernel into thinking that we have 2 > > ancillay data objects instead of one! First cmsg is valid as > > intended, but the second is instantly _corrupt_ since it has a > > cmsg_len size of 0 -- thus failing kernel's CMSG_OK() tests. > > > > For 32-bit apps on a 32-bit kernel, and 64-bit apps over a 64-bit > > one, the kernel's own CMSG header traversal macros just ignore the > > second "incomplete" cmsg. For 32-bit apps over a 64-bit kernel > > though, the kernel 32-bit socket emulation macros does not forgive > > such incompleteness and directly complains of invalid args (due to > > a subtle bug). > > > > Avoid this ugly problem, which can also bite us in a pure 64-bit > > environment if MAX_ANCIL_DATA_FDS got extended to 5 FDs, by > > setting "cmsg_data[]" array size to "cmsg_len". > > > > BugLink: https://bugs.freedesktop.org/show_bug.cgi?id=97769 > > > > Signed-off-by: Ahmed S. Darwish <darwish.07 at gmail.com> > > --- > > > > Hopefully variable-length arrays won't be problematic in esoteric > > build environments? > > > > src/pulsecore/iochannel.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/src/pulsecore/iochannel.c b/src/pulsecore/iochannel.c > > index e62750b..8ace297 100644 > > --- a/src/pulsecore/iochannel.c > > +++ b/src/pulsecore/iochannel.c > > @@ -346,6 +346,8 @@ ssize_t pa_iochannel_write_with_creds(pa_iochannel*io, const void*data, size_t l > > return r; > > } > > > > +/* For more details on FD passing, check the cmsg(3) manpage > > + * and IETF RFC #2292: "Advanced Sockets API for IPv6" */ > > ssize_t pa_iochannel_write_with_fds(pa_iochannel*io, const void*data, size_t l, int nfd, const int *fds) { > > ssize_t r; > > int *msgdata; > > @@ -353,7 +355,7 @@ ssize_t pa_iochannel_write_with_fds(pa_iochannel*io, const void*data, size_t l, > > struct iovec iov; > > union { > > struct cmsghdr hdr; > > - uint8_t data[CMSG_SPACE(sizeof(int) * MAX_ANCIL_DATA_FDS)]; > > + uint8_t data[CMSG_SPACE(sizeof(int) * nfd)]; > > } cmsg; > > > > pa_assert(io); > > -- Darwish http://darwish.chasingpointers.com