On Tue, Nov 26, 2019 at 04:13:13PM -0800, Maciej Żenczykowski wrote: > From: Maciej Żenczykowski <maze@xxxxxxxxxx> > > and associated inet_is_local_unbindable_port() helper function: > use it to make explicitly binding to an unbindable port return > -EPERM 'Operation not permitted'. > > Autobind doesn't honour this new sysctl since: > (a) you can simply set both if that's the behaviour you desire > (b) there could be a use for preventing explicit while allowing auto > (c) it's faster in the relatively critical path of doing port selection > during connect() to only check one bitmap instead of both > > Various ports may have special use cases which are not suitable for > use by general userspace applications. Currently, ports specified in > ip_local_reserved_ports sysctl will not be returned only in case of > automatic port assignment, but nothing prevents you from explicitly > binding to them - even from an entirely unprivileged process. > > In certain cases it is desirable to prevent the host from assigning the > ports even in case of explicit binds, even from superuser processes. > > Example use cases might be: > - a port being stolen by the nic for remote serial console, remote > power management or some other sort of debugging functionality > (crash collection, gdb, direct access to some other microcontroller > on the nic or motherboard, remote management of the nic itself). > - a transparent proxy where packets are being redirected: in case > a socket matches this connection, packets from this application > would be incorrectly sent to one of the endpoints. > > Initially I wanted to solve this problem via the simple one line: > > static inline bool inet_port_requires_bind_service(struct net *net, unsigned short port) { > - return port < net->ipv4.sysctl_ip_prot_sock; > + return port < net->ipv4.sysctl_ip_prot_sock || inet_is_local_reserved_port(net, port); > } > > However, this doesn't work for two reasons: > (a) it changes userspace visible behaviour of the existing local > reserved ports sysctl, and there appears to be enough documentation > on the internet talking about setting it to make this a bad idea > (b) it doesn't prevent privileged apps from using these ports, > CAP_BIND_SERVICE is relatively likely to be available to, for example, > a recursive DNS server so it can listed on port 53, which also needs > to do src port randomization for outgoing queries due to security > reasons (and it thus does manual port binding). > > If we *know* that certain ports are simply unusable, then it's better > nothing even gets the opportunity to try to use them. This way we at > least get a quick failure, instead of some sort of timeout (or possibly > even corruption of the data stream of the non-kernel based use case). > > Test: > vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports > > vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))' > vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))' > vm:~# echo 3967 > /proc/sys/net/ipv4/ip_local_unbindable_ports > vm:~# cat /proc/sys/net/ipv4/ip_local_unbindable_ports > 3967 > vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, 0); s.bind(("::", 3967))' > socket.error: (1, 'Operation not permitted') > vm:~# python -c 'import socket; s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0); s.bind(("::", 3967))' > socket.error: (1, 'Operation not permitted') > > Cc: Sean Tranchetti <stranche@xxxxxxxxxxxxxx> > Cc: Subash Abhinov Kasiviswanathan <subashab@xxxxxxxxxxxxxx> > Cc: Eric Dumazet <edumazet@xxxxxxxxxx> > Cc: Linux SCTP <linux-sctp@xxxxxxxxxxxxxxx> > Signed-off-by: Maciej Żenczykowski <maze@xxxxxxxxxx> > --- > Documentation/networking/ip-sysctl.txt | 13 +++++++++++++ > include/net/ip.h | 12 ++++++++++++ > include/net/netns/ipv4.h | 1 + > net/ipv4/af_inet.c | 4 ++++ > net/ipv4/sysctl_net_ipv4.c | 18 ++++++++++++++++-- > net/ipv6/af_inet6.c | 2 ++ > net/sctp/socket.c | 5 +++++ > 7 files changed, 53 insertions(+), 2 deletions(-) > > diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt > index fd26788e8c96..7129646a18bd 100644 > --- a/Documentation/networking/ip-sysctl.txt > +++ b/Documentation/networking/ip-sysctl.txt > @@ -940,6 +940,19 @@ ip_local_reserved_ports - list of comma separated ranges > > Default: Empty > > +ip_local_unbindable_ports - list of comma separated ranges > + Specify the ports which are not directly bind()able. > + > + Usually you would use this to block the use of ports which > + are invalid due to something outside of the control of the > + kernel. For example a port stolen by the nic for serial > + console, remote power management or debugging. > + > + There's a relatively high chance you will also want to list > + these ports in 'ip_local_reserved_ports' to prevent autobinding. > + > + Default: Empty > + > ip_unprivileged_port_start - INTEGER > This is a per-namespace sysctl. It defines the first > unprivileged port in the network namespace. Privileged ports > diff --git a/include/net/ip.h b/include/net/ip.h > index 02d68e346f67..14b99bf59ffc 100644 > --- a/include/net/ip.h > +++ b/include/net/ip.h > @@ -346,6 +346,13 @@ static inline bool inet_is_local_reserved_port(struct net *net, unsigned short p > return test_bit(port, net->ipv4.sysctl_local_reserved_ports); > } > > +static inline bool inet_is_local_unbindable_port(struct net *net, unsigned short port) > +{ > + if (!net->ipv4.sysctl_local_unbindable_ports) > + return false; > + return test_bit(port, net->ipv4.sysctl_local_unbindable_ports); > +} > + > static inline bool sysctl_dev_name_is_allowed(const char *name) > { > return strcmp(name, "default") != 0 && strcmp(name, "all") != 0; > @@ -362,6 +369,11 @@ static inline bool inet_is_local_reserved_port(struct net *net, unsigned short p > return false; > } > > +static inline bool inet_is_local_unbindable_port(struct net *net, unsigned short port) > +{ > + return false; > +} > + > static inline bool inet_port_requires_bind_service(struct net *net, unsigned short port) > { > return port < PROT_SOCK; > diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h > index c0c0791b1912..6a235651925d 100644 > --- a/include/net/netns/ipv4.h > +++ b/include/net/netns/ipv4.h > @@ -197,6 +197,7 @@ struct netns_ipv4 { > > #ifdef CONFIG_SYSCTL > unsigned long *sysctl_local_reserved_ports; > + unsigned long *sysctl_local_unbindable_ports; > int sysctl_ip_prot_sock; > #endif > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 2fe295432c24..b26046431612 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -494,6 +494,10 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, > goto out; > > snum = ntohs(addr->sin_port); > + err = -EPERM; > + if (snum && inet_is_local_unbindable_port(net, snum)) > + goto out; > + > err = -EACCES; > if (snum && inet_port_requires_bind_service(net, snum) && > !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) > diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c > index fcb2cd167f64..fd363b57a653 100644 > --- a/net/ipv4/sysctl_net_ipv4.c > +++ b/net/ipv4/sysctl_net_ipv4.c > @@ -745,6 +745,13 @@ static struct ctl_table ipv4_net_table[] = { > .mode = 0644, > .proc_handler = proc_do_large_bitmap, > }, > + { > + .procname = "ip_local_unbindable_ports", > + .data = &init_net.ipv4.sysctl_local_unbindable_ports, > + .maxlen = 65536, > + .mode = 0644, > + .proc_handler = proc_do_large_bitmap, > + }, > { > .procname = "ip_no_pmtu_disc", > .data = &init_net.ipv4.sysctl_ip_no_pmtu_disc, > @@ -1353,11 +1360,17 @@ static __net_init int ipv4_sysctl_init_net(struct net *net) > > net->ipv4.sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL); > if (!net->ipv4.sysctl_local_reserved_ports) > - goto err_ports; > + goto err_reserved_ports; > + > + net->ipv4.sysctl_local_unbindable_ports = kzalloc(65536 / 8, GFP_KERNEL); > + if (!net->ipv4.sysctl_local_unbindable_ports) > + goto err_unbindable_ports; > > return 0; > > -err_ports: > +err_unbindable_ports: > + kfree(net->ipv4.sysctl_local_reserved_ports); > +err_reserved_ports: > unregister_net_sysctl_table(net->ipv4.ipv4_hdr); > err_reg: > if (!net_eq(net, &init_net)) > @@ -1370,6 +1383,7 @@ static __net_exit void ipv4_sysctl_exit_net(struct net *net) > { > struct ctl_table *table; > > + kfree(net->ipv4.sysctl_local_unbindable_ports); > kfree(net->ipv4.sysctl_local_reserved_ports); > table = net->ipv4.ipv4_hdr->ctl_table_arg; > unregister_net_sysctl_table(net->ipv4.ipv4_hdr); > diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c > index 60e2ff91a5b3..3c83e3200543 100644 > --- a/net/ipv6/af_inet6.c > +++ b/net/ipv6/af_inet6.c > @@ -292,6 +292,8 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, > return -EINVAL; > > snum = ntohs(addr->sin6_port); > + if (snum && inet_is_local_unbindable_port(net, snum)) > + return -EPERM; > if (snum && inet_port_requires_bind_service(net, snum) && > !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) > return -EACCES; > diff --git a/net/sctp/socket.c b/net/sctp/socket.c > index 0b485952a71c..d1c93542419d 100644 > --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@ -384,6 +384,9 @@ static int sctp_do_bind(struct sock *sk, union sctp_addr *addr, int len) > } > } > > + if (snum && inet_is_local_unbindable_port(net, snum)) > + return -EPERM; > + > if (snum && inet_port_requires_bind_service(net, snum) && > !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) > return -EACCES; > @@ -1061,6 +1064,8 @@ static int sctp_connect_new_asoc(struct sctp_endpoint *ep, > if (sctp_autobind(sk)) > return -EAGAIN; > } else { > + if (inet_is_local_unbindable_port(net, ep->base.bind_addr.port)) > + return -EPERM; > if (inet_port_requires_bind_service(net, ep->base.bind_addr.port) && > !ns_capable(net->user_ns, CAP_NET_BIND_SERVICE)) > return -EACCES; > -- > 2.24.0.432.g9d3f5f5b63-goog > > Just out of curiosity, why are the portreserve and portrelease utilities not a solution to this use case? Neil