On Mon, 22 Apr 2024, Lex Siegel wrote: > > Better still would be for kernel_connect() to return a more normal error > > code - not EPERM. If that cannot be achieved, then I think it would be > > best for the sunrpc code to map EPERM to something else at the place > > where kernel_connect() is called - catch it early. > > The question is whether a permission error, EPERM, should cause a retry or > return. Currently xs_tcp_setup_socket() is retrying. For the retry to clear, > the connect call will have to not return a permission error to halt the retry > attempts. > > This is a default behavior because EPERM is not an explicit case of the switch > statement. Because bpf appropriately uses EPERM to show that the kernel_connect > was not permitted, it highlights the return handling for this case is missing. > It is unlikely that retry was ever the intended result. > > Upstream, the bpf that caused this is at: > https://github.com/cilium/cilium/blob/v1.15/bpf/bpf_sock.c#L336 > > This cilium bpf code has two return statuses, EPERM and ENXIO, that fall > through to the default case of retrying. Here, cilium expects both of these > statuses to indicate the connect failed. A retry is not the intended result. > > Handling this case without a retry aligns this code with the udp behavior. This > precedence for passing EPERM back up the stack was set in 3dedbb5ca10ef. > > I will amend my patch to include an explicit case for ENXIO as well, as this is > also in cilium's bpf and will cause the same bug to occur. > I think it should be up to cilium to report an errno that the kernel understands, not up to the kernel to understand whatever errno cilium chooses to return. I don't think EPERM or ENXIO are appropriate errors for network problems. EHOSTUNREACH or ECONNREFUSED would make much more sense. NeilBrown > > On Mon, Apr 22, 2024 at 8:22 AM NeilBrown <neilb@xxxxxxx> wrote: > > > > On Sat, 20 Apr 2024, Lex Siegel wrote: > > > When using a bpf on kernel_connect(), the call can return -EPERM. > > > This causes xs_tcp_setup_socket() to loop forever, filling up the > > > syslog and causing the kernel to freeze up. > > > > > > Signed-off-by: Lex Siegel <usiegl00@xxxxxxxxx> > > > --- > > > net/sunrpc/xprtsock.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > index bb9b747d58a1..47b254806a08 100644 > > > --- a/net/sunrpc/xprtsock.c > > > +++ b/net/sunrpc/xprtsock.c > > > @@ -2446,6 +2446,8 @@ static void xs_tcp_setup_socket(struct work_struct *work) > > > /* Happens, for instance, if the user specified a link > > > * local IPv6 address without a scope-id. > > > */ > > > + case -EPERM: > > > + /* Happens, for instance, if a bpf is preventing the connect */ > > > > This will propagate -EPERM up into other layers which might not be ready > > to handle it. > > It might be safer to map EPERM to an error we would be more likely to > > expect from the network system - such as ECONNREFUSED or ENETDOWN. > > > > Better still would be for kernel_connect() to return a more normal error > > code - not EPERM. If that cannot be achieved, then I think it would be > > best for the sunrpc code to map EPERM to something else at the place > > where kernel_connect() is called - catch it early. > > > > NeilBrown > > > > > > > case -ECONNREFUSED: > > > case -ECONNRESET: > > > case -ENETDOWN: > > > -- > > > 2.39.3 > > > > > > > > >