Hi Stefan,

On 1/20/23 11:06, Stefan Puiu wrote:
Hi Alex,

On Thu, Jan 19, 2023 at 4:14 PM Alejandro Colomar <alx.manpages@xxxxxxxxx> wrote:

Hi!

I just received a report about struct sockaddr_storage in the man pages. It reminded me of a concern I've always had about it: it doesn't seem to be a usable type.

It has some alignment promises that make it "just work" most of the time, but it's still a UB mine, according to ISO C.

According to strict aliasing rules, if you declare a variable of type 'struct sockaddr_storage', that's what you get, and trying to access it later as some other sockaddr_* is simply not legal. The compiler may assume those accesses can't happen, and optimize as it pleases.

Can you detail the "is not legal" part?
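I mean that it's Undefined Behavior: the stored value of an object may only be accessed through an lvalue of a compatible type (or a character type, or a union containing such a type; see ISO C11 6.5p7), and struct sockaddr_in is not compatible with struct sockaddr_storage.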
How about APIs like connect() etc. that use pointers to struct sockaddr, where the underlying type is different? Why would that be legal while using sockaddr_storage isn't?
That's also bad. However, it can be fixed by fixing `sockaddr_storage` and telling everyone to use it instead of whatever other `sockaddr_*` type. You need a union for the underlying storage, so that the library functions can access it both as `sockaddr` and as `sockaddr_*`.
The problem isn't really in the implementation of connect(2), but in the type. The implementation of connect(2) would be fine if we just fixed the type. See this example:
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]))

struct my_sockaddr_storage {
	union {
		sa_family_t          ss_family;
		struct sockaddr      sa;
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
	};
};

const char *inet_sockaddr2str(const struct sockaddr *sa);

void
foo(void)
{
	struct my_sockaddr_storage  mss;
	struct sockaddr_storage     ss;

	// initialize mss and ss

	inet_sockaddr2str(&mss.sa);                  // correct
	inet_sockaddr2str((struct sockaddr *) &ss);  // UB
}

/* This function is correct, as long as the accessed object has the
 * type we're using.  That's only possible through a `union`, since
 * we're accessing it with 2 different types: `sockaddr` for the
 * `sa_family` and then the appropriate subtype for the address
 * itself.
 */
const char *
inet_sockaddr2str(const struct sockaddr *sa)
{
	const struct sockaddr_in   *sin;
	const struct sockaddr_in6  *sin6;

	static char  buf[INET6_ADDRSTRLEN];  // large enough for both families

	switch (sa->sa_family) {
	case AF_INET:
		sin = (const struct sockaddr_in *) sa;
		inet_ntop(AF_INET, &sin->sin_addr, buf, NITEMS(buf));
		return buf;
	case AF_INET6:
		sin6 = (const struct sockaddr_in6 *) sa;
		inet_ntop(AF_INET6, &sin6->sin6_addr, buf, NITEMS(buf));
		return buf;
	default:
		errno = EAFNOSUPPORT;
		return NULL;
	}
}

BTW, you need a union _even if_ you only care about a single address family. That is, if you only care about Unix sockets, you can't declare your variable of type sockaddr_un, because the libc functions and syscalls still need to access it as a sockaddr to see which family it has.
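A minimal sketch of the syscall side, assuming a Linux/POSIX system: getsockname(2) writes through `&addr.sa`, and the caller inspects `sa_family` through the same union object before touching any family-specific member. The names `union any_sockaddr` and `local_family()` are hypothetical, used only for illustration.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical union, modeled on my_sockaddr_storage above. */
union any_sockaddr {
	struct sockaddr      sa;
	struct sockaddr_in   sin;
	struct sockaddr_in6  sin6;
	struct sockaddr_un   sun;
};

/* Return the address family of fd's local address, or -1 on error.
 * The kernel fills the object through &addr.sa, and we read the
 * family back through the same union, so no object is ever accessed
 * through an unrelated struct type. */
int
local_family(int fd)
{
	union any_sockaddr  addr;
	socklen_t           len = sizeof(addr);

	if (getsockname(fd, &addr.sa, &len) == -1)
		return -1;
	return addr.sa.sa_family;
}
```

Even for a pair of unnamed Unix sockets from socketpair(2), the result is reported as a plain `sockaddr` first; only after checking for AF_UNIX would the code go on to read `addr.sun`.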
Will code break in practice?
Well, it depends on how compilers evolve. Here's an interesting experiment: <https://software.codidact.com/posts/287748/287750#answer-287750>

I wouldn't rely on Undefined Behavior not causing nasal demons. When you get them, you can only kill them with garlic.
That means that one needs to declare a union with all the possible sockaddr_* types that are of interest, so that the compiler later allows access as any of them (of course, the user still needs to access the correct one, but that goes without saying). In that union, one could add a member of type sockaddr_storage to get a more consistent structure size (for example, if some members are conditional on preprocessor stuff), but I don't see much value in that. Especially given this comment that Igor Sysoev wrote in NGINX Unit's source code:

 * struct sockaddr_storage is:
 *    128 bytes on Linux, FreeBSD, MacOSX, NetBSD;
 *    256 bytes on Solaris, OpenBSD, and HP-UX;
 *   1288 bytes on AIX.
 *
 * struct sockaddr_storage is too large on some platforms
 * or less than real maximum struct sockaddr_un length.

Which makes it even more useless as a type.

I'm not sure using struct sockaddr_storage for storing sockaddr_un's (UNIX domain socket addresses, right?) is that common. I've used it in the past to store either a sockaddr_in or a sockaddr_in6, and I think that would be a more common scenario. The comment above probably makes sense for nginx, but different projects have different needs. As for the size, I guess it might matter if you want to port your code to AIX, Solaris, OpenBSD, etc. I don't think all software is meant to be portable, though (or portable to those platforms). Maybe a warning is in order that, for portable code, developers should check its size on the other targeted platforms.
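A sketch of the variant mentioned above, where the union also carries a sockaddr_storage member only to pin a minimum size and alignment, plus a helper for the size check Stefan suggests. `union padded_sockaddr` and `report_sockaddr_sizes()` are hypothetical names. Note that the checks only cover the *declared* size of struct sockaddr_un; they do nothing about the case in the NGINX comment where the real maximum exceeds it.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical union.  The ss member is never used for access; it only
 * sets a lower bound on size and alignment.  Addresses are still read
 * and written through sa, sin, sin6, or sun. */
union padded_sockaddr {
	struct sockaddr_storage  ss;
	struct sockaddr          sa;
	struct sockaddr_in       sin;
	struct sockaddr_in6      sin6;
	struct sockaddr_un       sun;
};

/* A union is at least as large as its largest member, so both checks
 * hold whichever of sockaddr_storage and sockaddr_un is larger. */
_Static_assert(sizeof(union padded_sockaddr) >= sizeof(struct sockaddr_storage),
               "union must hold a sockaddr_storage");
_Static_assert(sizeof(union padded_sockaddr) >= sizeof(struct sockaddr_un),
               "union must hold a declared sockaddr_un");

/* Print the sizes worth comparing across the targeted platforms. */
void
report_sockaddr_sizes(FILE *out)
{
	fprintf(out, "sockaddr_storage: %zu\n", sizeof(struct sockaddr_storage));
	fprintf(out, "sockaddr_un:      %zu\n", sizeof(struct sockaddr_un));
	fprintf(out, "sockaddr_in6:     %zu\n", sizeof(struct sockaddr_in6));
	fprintf(out, "padded_sockaddr:  %zu\n", sizeof(union padded_sockaddr));
}
```

The numbers printed depend on the platform's headers, which is exactly why they are worth checking on each target rather than assuming.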
The size thing is just an added problem. The deeper problem is that you need to use a union that contains all the types you care about _plus_ plain sockaddr, because the structure will be accessed at least as a sockaddr, plus as one of the specialized structures. So even if you only use sockaddr_un, you need at least the following:
union my_unix_sockaddr {
	struct sockaddr     sa;
	struct sockaddr_un  sun;
};

Not doing that will necessarily result in invoking Undefined Behavior at some point.
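A minimal sketch of filling such a union for a pathname Unix socket, assuming Linux/POSIX headers. The helper `my_unix_sockaddr_set()` is hypothetical; it writes through the `sun` member and returns the address length a caller would pass, together with `&u.sa`, to bind(2) or connect(2).

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

union my_unix_sockaddr {
	struct sockaddr     sa;
	struct sockaddr_un  sun;
};

/* Hypothetical helper: fill *u for the pathname socket at path and
 * return the address length (family + path + terminating NUL), or 0
 * if path does not fit in sun_path. */
socklen_t
my_unix_sockaddr_set(union my_unix_sockaddr *u, const char *path)
{
	size_t  len = strlen(path);

	if (len >= sizeof(u->sun.sun_path))
		return 0;

	memset(u, 0, sizeof(*u));
	u->sun.sun_family = AF_UNIX;
	memcpy(u->sun.sun_path, path, len + 1);
	return offsetof(struct sockaddr_un, sun_path) + len + 1;
}
```

A caller would then do `bind(fd, &u.sa, len)`: libc and the kernel see a struct sockaddr, while the code that filled it wrote a struct sockaddr_un, both through members of the same union object.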
Just my 2 cents, as always,
Stefan
The good thing is that fixing sockaddr_storage and telling everybody to use it always fixes the problem, so I'm preparing a patch for glibc.
Cheers, Alex
Should we warn about uses of this type? Should we recommend against using it in the manual page, since there are no legitimate uses of it?

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
-- <http://www.alejandro-colomar.es/>