Hi Amol, On Mon, Feb 12, 2024 at 04:15:24PM +0530, Amol Surati wrote: > Hi Alex, > > On Wed, 7 Feb 2024 at 18:59, Alejandro Colomar <alx@xxxxxxxxxx> wrote: > > > > Hi Amol, > > > > On Wed, Feb 07, 2024 at 09:43:55AM +0530, Amol Surati wrote: > [ ... ] > > The fact that the compiler does the equivalent of memcpy(3) doesn't > > imply a removal of type safety. > > > > The copiler is aware of the union type at the assignment, and so can > > emit diagnostics if anything is wrong. > > True. That's the diagnostics that was emitted when running 'v=u'. > > > > > When assigning to a union member, only that member needs to be copied, > > and I'm not sure what happens to other members that have a larger size. > > As far as I can read in the standard, it's Undefined Behavior. Maybe > > someone can correct me. > > The std says: > "When a value is stored in a member of an object of union type, the bytes > of the object representation that do not correspond to that member but > do correspond to other members take unspecified values." Hmm, interesting. > > When assigning to a union (not its members), the standard doesn't seem > > to specify at all the semantics, so I can't really tell. Depending on > > the semantics of assigning to a member, I can tell. At first glance, I > > don't see anything we should worry. > > [ ... ] > > Not really. If well used, they do actually improve type safety. Think > > of my string unions vs the struct proposed by Martin. My union has > > const correctness, which is impossible to achieve via structs. A > > I am not sure I fully understand how using structs forbids achieving > const-correctness (vs the unions as declared), > but I am also not aware > of all the ways in which nginx uses strings. > > [ ... ] > > > Indeed. But I'm not sure I'd like a string type that I must treat as a > > FILE*. Also, how do you initialize such strings? With an EMPTY_STRING > > macro? Do you always get a heap pointer, so you can't declare a string > > in the stack as this? > > > > struct string s; > > > > What's the content of that string? Is a NULL pointer a valid state of > > the string? In NGINX we found this week several cases where we were > > doing pointer arithmetics on strings that were holding NULL, precisely > > due to this (or it was after some calloc(3), more likely, I don't > > remember). It was NULL + 0, which is the least of undefined behaviors > > I'm concerned about, but hey, there's another NULL to worry about. > > > > So, you'll need to do > > > > struct string s = EMPTY_STRING; > > > > to avoid ever having a NULL, which is yet another thing to worry about. > > > > Having suffered these strings, I wouldn't recommend them as something > > better than a char*. In NGINX, we have them due to historic reasons, > > and rewriting the entire code base would be unreasonable, and while they > > may have been a good optimization 2 decades ago, I bet that it's a > > useless micro-optimization today, and we could just use char* everywhere > > to have less problems. > > If a variable is going to be overwritten without being read first, then there's > no need to initialize it. Within my implementation, the state resulting from > zero-init, i.e. 'struct string s = {0};', is deliberately chosen to be > well-defined > for the implementation. For e.g. appending a char to a zero-init'd string is > well-defined. Good. Reading a zero-init'd string would return something like "", I guess. Probably needs some special-case in the code, but ok. > > > > You can only make them const, if you use two distinct types: a read-only > > > > version, let's call it rstring, and a read-write version, let's call it > > > > string. > > > > > > But do we need both views (r/w and r/o) at the same time? > > > > Yes: I want to be able to take a r/w string and pass it to a function > > that accepts a r/o string. Since distinct types are incompatible, and > > this is the main problem with things like sockaddr, you can't just cast; > > that's likely to invoke Undefined Behavior. > > The std says: > "A pointer to an object type may be converted to a pointer to a different > object type. If the resulting pointer is not correctly aligned for the > referenced type, the behavior is undefined. Otherwise, when converted > back again, the result shall compare equal to the original pointer." > > So typecasting between object types is well-defined, given that the Yes, casting pointers is fine. > resulting pointer is indeed correctly aligned for the target type. The But having the resulting pointer be correctly aligned is enough for allowing a pointer conversion. But the only thing that the standard allows to do with that pointer is to convert it back. It is not allowed to dereference it, even if the alignment is correct. <http://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7> > various sock_addr types all impose a common default alignment, > because they all begin with a member of the same type. > > The std. also says: > "An object shall have its stored value accessed only by an lvalue > expression that has one of the following types: > . . . > — a character type." > > For e.g., the _sys_bind system-call under Linux kernel receives the > same 'const struct sockaddr *', along with the size of the data, from > glibc. The kernel then copies the data using memcpy (or equivalent > routines). > > Does the pointer conversion forbid the kernel from memcpy'ing the > object using that converted pointer? (It is likely that the converted > pointer, if it is not of compatible type, may not be dereferenced using > that type; but here the kernel isn't dereferencing the pointer using > the 'const struct sockaddr *' type; it is reading the obj-representation > using memcpy.) > > If no, then this behaviour is well-defined. Nah, memcpy'ing is fair game. > If yes, then this behaviour is undefined, although, in practice, both > gcc and clang (the compilers that can compile Linux kernels) must be > allowing the expected behaviour here, since the compiled kernel > binaries do indeed work wherever they are deployed. This only means > that, *if* the behaviour turns out to be theoretically undefined, *then* > the std. can be augmented to specifically allow reading of > representations of objects pointed to by such suitably converted pointers, > if it does not already allow such reads. However, when the kernel writes data that the user should read, it does so via memcpy on a sockaddr_storage. How is the user supposed to read that data? Say we define a sockaddr_storage, pass it to getpeername(2), where the kernel writes to it, and then we want to read it. Which type should we use? Imagine that it's a Unix socket, so it should be as sockaddr_un. But it was never written as that type, so we can't. <https://austingroupbugs.net/view.php?id=1641> > [ ... ] > > > > A function that is passed 'struct string *' can still be allowed to > > > inplace-modify data[i] by invoking appropriate string APIs that temporarily > > > casts away the constness of data[i]. > > > > That would be a violation of const correctness. If necessary, yeah, go > > ahead. But personally, I'd prefer avoiding that. For something that is > > proposed as an inclusion to ISO C, I'd rather not have it. > > I keep two bits of info within the string: > (1) whether the 'struct string' is heap-allocated, or not. > (2) whether the data (const char *) it points to is heap-allocated, or not. > > The std. says: > "If an attempt is made to modify an object defined with a const-qualified > type through use of an lvalue with non-const-qualified type, the behavior > is undefined." > > Given that a malloc'd object doesn't have its own declaration or a > declared type to begin with, it also doesn't have a definition, let alone > a const-qualified definition. Modifying malloc'd objects should not cause > const-correctness to be violated, even when casting away const on a > pointer that points to such an object. > > Modifiable strings can be created from statically allocated data (for e.g., > pointing inside the .rodata section), and the data can be malloc'd and > changed/appended-to on a copy-on-write basis. Hmmm, while casting away const is valid, it's not safe. I'm not sure I'd call that const-correct. It's not UB, that I agree. But you're bypassing type safety when you do a cast. Have a lovely day! Alex -- <https://www.alejandro-colomar.es/> Looking for a remote C programming job at the moment.
Attachment:
signature.asc
Description: PGP signature