David Miller <davem@xxxxxxxxxxxxx> wrote:
> From: Mat Martineau <mathew.j.martineau@xxxxxxxxxxxxxxx>
> Date: Thu, 9 Jan 2020 07:59:15 -0800
>
> > Match the 16-bit width of skbuff->protocol. Fills an 8-bit hole so
> > sizeof(struct sock) does not change.
> >
> > Also take care of BPF field access for sk_type/sk_protocol. Both of them
> > are now outside the bitfield, so we can use load instructions without
> > further shifting/masking.
> >
> > v5 -> v6:
> > - update eBPF accessors, too (Intel's kbuild test robot)
> > v2 -> v3:
> > - keep 'sk_type' 2 bytes aligned (Eric)
> > v1 -> v2:
> > - preserve sk_pacing_shift as bit field (Eric)
> >
> > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Cc: bpf@xxxxxxxxxxxxxxx
> > Co-developed-by: Paolo Abeni <pabeni@xxxxxxxxxx>
> > Signed-off-by: Paolo Abeni <pabeni@xxxxxxxxxx>
> > Co-developed-by: Matthieu Baerts <matthieu.baerts@xxxxxxxxxxxx>
> > Signed-off-by: Matthieu Baerts <matthieu.baerts@xxxxxxxxxxxx>
> > Signed-off-by: Mat Martineau <mathew.j.martineau@xxxxxxxxxxxxxxx>
>
> This is worrisome for me.
>
> We have lots of places that now are going to be assigning sk->sk_protocol
> into a u8 somewhere else. A lot of them are ok because limits are enforced
> in various places, but for example:
>
> net/ipv6/udp.c: fl6.flowi6_proto = sk->sk_protocol;
> net/l2tp/l2tp_ip6.c: fl6.flowi6_proto = sk->sk_protocol;
>
> net/ipv6/inet6_connection_sock.c: fl6->flowi6_proto = sk->sk_protocol;
>
> net/ipv6/af_inet6.c: fl6.flowi6_proto = sk->sk_protocol;
> net/ipv6/datagram.c: fl6->flowi6_proto = sk->sk_protocol;
>
> This is one just one small example situation, where flowi6_proto is a u8.

There are parts in the stack (e.g. in setsockopt code paths) that test
sk->sk_protocol vs. IPPROTO_TCP, then call tcp specific code under the
sane assumption that sk is a tcp_sock struct.

With an 8-bit sk_protocol, mptcp_sock structs (which is what the kernel
gets via the file descriptor number) would be treated as tcp sockets,
because "IPPROTO_MPTCP & 0xff" yields IPPROTO_TCP.

Changing IPPROTO_MPTCP to a value <= 255 could lead to conflicts with
real inet protocols in the future, so we can't redefine it to an 8-bit
value.

If we keep sk_protocol as an 8-bit field, we will need to make sure that
all places testing sk_protocol == IPPROTO_TCP gain an additional sanity
check to tell tcp and mptcp sockets apart. Moreover, any further changes
to kernel code would need the same extra test, so this is a non-starter
to me.

Alternatively, we could change the first member of struct mptcp_sock from
inet_connection_sock to a full tcp_sock struct. That's roughly a 1k
increase of the mptcp_sock struct, to ~3744 bytes, but then we would not
have to worry about mptcp sockets ending up in tcp code paths.

If you think such a size increase is ok, I could give that solution a
shot and see what other problems with an 8-bit sk_protocol might remain.

Mat reported that /sys/kernel/debug/tracing/trace lists mptcp sockets as
IPPROTO_TCP in the '8 bit sk_protocol' case, but if that's the only issue
this might have a smaller/acceptable "avoidance fix".