Re: [PATCH v2] socket: Implement sockaddr_storage with an anonymous union

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Zack,

On 1/21/23 03:38, Alejandro Colomar wrote:
Hi Zack,

On 1/20/23 20:25, Alejandro Colomar wrote:
[CC += GCC]  // pun not intended :P

Hi Zack,

On 1/20/23 19:04, Zack Weinberg wrote:
On Fri, Jan 20, 2023, at 8:40 AM, Alejandro Colomar wrote:
The historical design of `sockaddr_storage` makes it impossible to use
without breaking strict aliasing rules.  Not only this type is unusable,
but even the use of other `sockaddr_*` structures directly (when the
programmer only cares about a single address family) is also incorrect,
since at some point the structure will be accessed as a `sockaddr`, and
that breaks strict aliasing rules too.

So, the only way for a programmer to not invoke Undefined Behavior is to
declare a union that includes `sockaddr` and any `sockaddr_*` structures
that are of interest, which allows later accessing as either the correct
structure or plain `sockaddr` for the sa_family.

...

     struct new_sockaddr_storage  nss;

     // ... (initialize oss and nss)

     inet_sockaddr2str(&nss.sa);  // correct (and has no casts)

I think we need to move slowly here and be _sure_ that any proposed change
does in fact reduce the amount of UB.

Sure, I'm just sending the patch to polish the idea around some concrete code. While I checked that it compiles, I didn't add any tests about it or anything, to see that it's usable (and Joseph's email showed me that it's far from being finished).  I expect that this'll take some time.


 This construct, in particular, might
not actually be correct in practice: see https://godbolt.org/z/rn51cracn for
a case where, if I'm reading it right, the compiler assumes that a write
through a `struct fancy *` cannot alter the values accessible through a
`struct simple *` even though both pointers point into the same union.
(Test case provided by <https://stackoverflow.com/users/363751/supercat>;


[...]

I was wrong in my guess; the correct output is 3/3; I think I had read it the other way around.  So yes, I believe it's doing what you just wrote there, but can't understand why.

I reduced @supercat's example to a smaller reproducer program (I couldn't minimize it any more than this; any further simplification removes the incorrect behavior):

#include <stdio.h>

struct a { int y[1];};
struct b { int y[1];};
union u  { struct a a; struct b b; };


int read_a(struct a *a)
{
     return a->y[0];
}

void write_b(struct b *b, int j)
{
     b->y[j] = 2;
}

int use_union(union u *u, int j)
{
     if (u->a.y[0] == 0)
         write_b(&u->b, j);
         //write_b((struct b *)u, j);   // this has the same issue
     return read_a(&u->a);
     return read_a((struct a *)u);      // this has the same issue
}

int (*volatile vtest)(union u *u, int j) = use_union;

int main(void)
{
     int       r1, r2;
     union u   u;
     struct b  b = {0};

     u.b = b;
     r1 = vtest(&u, 0);
     r2 = u.a.y[0];

     printf("%d/%d\n", r1, r2);
}


Ahh, indeed it seems to be UB. It's in the same 6.5.2.3/6: there's a requirement that the information about the union is kept in the function in which it's accessed.

The standard presents an example, which is a bit ambiguous:

The following is not a valid fragment (because the union type is not visible within function f):

          struct t1 { int m; };
          struct t2 { int m; };
          int f(struct t1 *p1, struct t2 *p2)
          {
                if (p1->m < 0)
                        p2->m = -p2->m;
                return p1->m;
          }
          int g()
          {
                union {
                        struct t1 s1;
                        struct t2 s2;
                } u;
                /* ... */
                return f(&u.s1, &u.s2);
          }

I don't know what's the intention if the union type was global but the variable `u` was still not seen by f(). But it seems GCC's interpretation is UB, according to the test we just saw.

The solution that I can see for that is making sockaddr also be a union. That way, the union is kept across all calls (since they all use sockaddr).

struct sockaddr {
	union {
		struct {
			sa_family_t  sa_family;
			char         sa_data[14];  // why 14?
		}
		struct sockaddr_in   sin;
		struct sockaddr_in6  sin6;
		struct sockaddr_un   sun;
	};
};


struct sockaddr_storage {
	union {
		sa_family_t          ss_family;
		struct sockaddr      sa;
	};
};


With this, sockaddr_storage becomes useless, but still usable. New code, could just use sockaddr and use the internal union members as necessary. Union info is kept across all function boundaries.

Cheers,

Alex

--
<http://www.alejandro-colomar.es/>

Attachment: OpenPGP_signature
Description: OpenPGP digital signature


[Index of Archives]     [Kernel Documentation]     [Netdev]     [Linux Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux