Hi Stefan,
On 1/23/23 08:40, Stefan Puiu wrote:
According to strict aliasing rules, if you declare a variable of type 'struct sockaddr_storage', that's what you get, and trying to access it later as some other sockaddr_* is simply not legal. The compiler may assume those accesses can't happen, and optimize as it pleases.
Can you detail the "is not legal" part?
I mean that it's Undefined Behavior contraband.
OK, next question. Is this theoretical or practical UB?
The functions using this type are rather large, so they are not functions that should be inlined, and their bodies are not visible to the compiler at the call site. Many of the optimizations that this UB enables are therefore unlikely to happen. Translation Unit (TU) boundaries are what keep most UB invocations from being really dangerous.
Also, glibc seems to be using a GCC attribute (transparent_union) to make the code avoid UB even if it were inlined, so if you use glibc, you're fine. If you're using some smaller libc with a less capable compiler, or maybe C++, you are less lucky, but TU boundaries will probably still save your day.
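To make the attribute concrete, here is a minimal sketch of how a transparent union lets a function accept any of several pointer types without a cast at the call site. The type `const_sockaddr_arg` and the helper `addr_family()` are hypothetical names for illustration; this is modeled on the technique, not copied from glibc. It assumes GCC or Clang compiling C (not C++), and that every `sockaddr_*` keeps its family field at the same offset as `struct sockaddr`. The helper reads the family with memcpy() so that it doesn't itself access the object through a different struct type.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical argument type: any of these pointers may be passed
 * directly; the compiler picks the matching member without a cast. */
typedef union {
	const struct sockaddr         *sa;
	const struct sockaddr_in      *in;
	const struct sockaddr_in6     *in6;
	const struct sockaddr_un      *un;
	const struct sockaddr_storage *ss;
} const_sockaddr_arg __attribute__((__transparent_union__));

/* Hypothetical helper: return the address family of any socket address.
 * Assumption: the family field sits at the same offset in every
 * sockaddr_* type, so copying its bytes is enough. */
static sa_family_t
addr_family(const_sockaddr_arg addr)
{
	sa_family_t  family;

	assert(addr.sa != NULL);
	memcpy(&family,
	       (const char *) addr.sa + offsetof(struct sockaddr, sa_family),
	       sizeof(family));
	return family;
}
```

With this, a caller can write `addr_family(&sin)` for a `struct sockaddr_in sin` or `addr_family(&ss)` for a `struct sockaddr_storage ss`, with no `(struct sockaddr *)` cast in either case.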
People check documentation about how to write code today, I think.
I'm fine documenting how to do it today. But before changing the documentation, I'd like to take some time to reflect on what we can do to fix the standard so that we don't have this semi-broken state forever. When we have a clear idea of what we can do to fix the implementation and hopefully the standard long-term, possibly keeping source code the same, we can make a better recommendation for programmers.
Today, you can do 2 things:
- You don't care about UB, and wish C had always been K&R C, and GCC just makes it work. Then use `sockaddr_storage`. It will just work. When it stops working, you can blame the compiler and libc for optimizing way too much.
- You care a lot about UB. Then write your own union, as all the `sockaddr` interface should have been designed from the ground up. That's what unions are for.
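A minimal sketch of what such a hand-written union could look like. The names `union my_sockaddr` and `my_sockaddr_port()` are hypothetical and not part of any libc; the set of members is only an assumption about which address families a program cares about.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical application-defined union covering the address
 * families the program handles. */
union my_sockaddr {
	struct sockaddr          sa;
	struct sockaddr_in       sin;
	struct sockaddr_in6      sin6;
	struct sockaddr_un       sun;
	struct sockaddr_storage  ss;
};

/* Hypothetical helper: return the port in host byte order,
 * or -1 if the address family has no port. */
static int
my_sockaddr_port(const union my_sockaddr *u)
{
	assert(u != NULL);

	/* All members are read through the union, so the family can be
	 * inspected via .sa regardless of which member was stored. */
	switch (u->sa.sa_family) {
	case AF_INET:
		return ntohs(u->sin.sin_port);
	case AF_INET6:
		return ntohs(u->sin6.sin6_port);
	default:
		return -1;
	}
}
```

A caller would declare `union my_sockaddr u;` and pass `&u.sa` together with a length initialized to `sizeof(u)` to calls such as getsockname(2) or accept(2), and then read the result through the member that matches `u.sa.sa_family`.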
Which should we recommend? That's my problem. I don't want to be documenting the latter, because it's non-standard, and code written that way is still likely to invoke UB in a different way: it's a difficult part of the language, and when you roll your own, you're likely to have accidents.
So, ideally, I'd like to document the former, but for that, I'd like to make sure that it will work forever, since otherwise we'd be blamed when somebody's code is compiled on a platform with some combination of libc, compiler, and phase of the moon that makes the UB become non-theoretical.
I think we can fix the definition of `sockaddr_storage` to have defined behavior, with the changes I'm discussing with Bastien, so I guess we'll document the former.
Will code break in practice?
Well, it depends on how much compilers advance. Here's an interesting experiment: <https://software.codidact.com/posts/287748/287750#answer-287750>
That code plays with 2 pointers to the same area, one to double and one to int, so I don't think it's that similar to the sockaddr situation. At least for struct sockaddr, the sa_family field is the same for all struct sockaddr_* variants. Also, in practical terms, I don't think any compiler optimization that breaks socket APIs (and, if I recall correctly, there are instances of this pattern in the kernel as well) is going to be an easy sell. It's possible, but realistically speaking, I don't think it's going to happen.
Inspecting the common initial sequence of several structures is only allowed if the structures are members of a union (which is why, to avoid UB, you need a union; and still, you need to make sure you don't invoke UB in a different way).
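As a small illustration of that rule outside the socket API, here is a sketch with two hypothetical structures that share a leading `int kind` member and are wrapped in a union. All names here are made up for the example. Because the structures are members of the union and the union type is visible where the access happens, reading `kind` through either member is covered by the paragraph linked below.

```c
#include <assert.h>
#include <stddef.h>

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Both structures start with 'int kind': that is their
 * common initial sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width; double height; };

union shape {
	struct circle  circle;
	struct rect    rect;
};

/* Reads the shared first member through the union, whichever
 * member was last stored. */
static int
shape_kind(const union shape *s)
{
	assert(s != NULL);
	return s->circle.kind;
}

static double
shape_area(const union shape *s)
{
	switch (shape_kind(s)) {
	case KIND_CIRCLE:
		return 3.141592653589793 * s->circle.radius * s->circle.radius;
	case KIND_RECT:
		return s->rect.width * s->rect.height;
	default:
		return 0.0;
	}
}
```

Casting a plain `struct rect *` to `struct circle *` and reading `kind` through it, with no union involved, is the pattern that this rule does not cover.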
<https://port70.net/%7Ensz/c/c11/n1570.html#6.5.2.3p6>
I wouldn't rely on Undefined Behavior not causing nasal demons. When you get them, you can only kill them with garlic.
OK, but not all theoretical issues have practical implications. Is there code that can show UB in practical terms with struct sockaddr_storage today? Like Eric mentioned in another thread, does UBSan complain about code using struct sockaddr_storage?
It's unlikely. But I can't promise it will be safe under some random combination of compiler and library, and it also depends on what you do in your code, which will affect compiler optimizations.
Cheers,
Alex
-- 
<http://www.alejandro-colomar.es/>