Re: [REGRESSION] vsocket timeout with kata containers agent 3.10.1 and kernel 6.6.70

On Tue, 21 Jan 2025 at 10:26, Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> Hi Simon,
>
> On Tue, 21 Jan 2025 at 05:53, Simon Kaegi <simon.kaegi@xxxxxxxxx> wrote:
> >
> > #regzbot introduced v6.6.69..v6.6.70
> > #regzbot introduced: ad91a2dacbf8c26a446658cdd55e8324dfeff1e7
> >
> > We hit this regression when updating our guest vm kernel from 6.6.69
> > to 6.6.70 -- bisecting, this problem was introduced in
> > ad91a2dacbf8c26a446658cdd55e8324dfeff1e7 -- net: restrict SO_REUSEPORT
> > to inet sockets
> >
> > We're getting a timeout when trying to connect to the vsocket in the
> > guest VM when launching a kata containers 3.10.1 agent which
> > unsurprisingly ... uses a vsocket to communicate back to the host.
> >
> > We updated this commit and added an additional sk_is_vsock check and
> > recompiled and this works correctly for us.
> > - if (valbool && !sk_is_inet(sk))
> > + if (valbool && !(sk_is_inet(sk) || sk_is_vsock(sk)))
> >
> > My understanding is limited here so I've added Stefano as he is likely
> > to better understand what makes sense here.
>
> Thanks for adding me, do you have a reproducer here?
>
> AFAIK in AF_VSOCK we never supported SO_REUSEPORT, so it seems strange to me.
>
> I understand that the patch you refer to actually changes the behavior
> of setsockopt(..., SO_REUSEPORT, ...) on an AF_VSOCK socket, where it
> used to return successfully before that change, but now returns an
> error, but subsequent binds should have still failed even without this
> patch.
>
> Do you actually use the SO_REUSEPORT feature on AF_VSOCK?
>
> If so, I need to better understand if the core socket does anything,
> but as I recall AF_VSOCK allocates ports internally, so I don't think
> multiple binds on the same port have ever been supported.

I just tried on an older kernel without the patch applied, and I can
confirm that SO_REUSEPORT was not supported even though the
setsockopt() succeeded.

I ran the following snippet in two shells. In the first one everything
works, but in the second one the bind() fails like this:

$ uname -r
6.10.11-200.fc40.x86_64
$ python3
>>> import socket
>>> import os
>>> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
>>> s.bind((socket.VMADDR_CID_ANY, 4242))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 98] Address already in use
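
For convenience, the two-shell experiment can also be run from a single
process. This is only a minimal sketch: the helper name
probe_vsock_reuseport() is made up for illustration, port 4242 is just
the one used above, and it assumes the running kernel and Python build
expose AF_VSOCK. It reports the outcome of each setsockopt()/bind()
pair as "ok" or an errno name instead of raising.

```python
import errno
import socket


def probe_vsock_reuseport(port=4242):
    """Open two AF_VSOCK sockets and try SO_REUSEPORT + bind() on each.

    Returns a list of (setsockopt_result, bind_result) tuples, one per
    socket, where each result is "ok" or an errno name such as
    "EADDRINUSE". Hypothetical helper, not part of any library.
    """

    def attempt(fn):
        # Run fn() and translate an OSError into its symbolic errno name.
        try:
            fn()
            return "ok"
        except OSError as e:
            return errno.errorcode.get(e.errno, str(e.errno))

    socks = []
    results = []
    try:
        for _ in range(2):
            s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
            socks.append(s)
            opt = attempt(
                lambda: s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            )
            bnd = attempt(lambda: s.bind((socket.VMADDR_CID_ANY, port)))
            results.append((opt, bnd))
    finally:
        # Release the port again so the probe can be repeated.
        for s in socks:
            s.close()
    return results


if __name__ == "__main__":
    for i, (opt, bnd) in enumerate(probe_vsock_reuseport(), start=1):
        print(f"socket {i}: setsockopt={opt} bind={bnd}")
```

Going by the transcript above, I would expect the second bind() to
report EADDRINUSE here, whether or not the setsockopt() succeeded.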


With the patch applied, the setsockopt() fails immediately, but the
bind() behavior is the same (it fails only in the second shell):

$ uname -r
6.12.9-200.fc41.x86_64
$ python3
>>> import socket
>>> import os
>>> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
Traceback (most recent call last):
  File "<python-input-3>", line 1, in <module>
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 95] Operation not supported

So, IMHO the patch is correct: AF_VSOCK never really supported
SO_REUSEPORT, so it is better to fail early.

BTW I'm not sure what is happening on your side.
Could it be a problem in your code that uses SO_REUSEPORT
indiscriminately on AF_VSOCK, even though you then never bind on the
same port again?
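
If that is the case, one possible userspace-side approach is to treat a
rejected SO_REUSEPORT as non-fatal. The sketch below is only an
illustration in Python: I don't know what the kata agent code actually
does, the helper name set_reuseport_if_supported() is hypothetical, and
treating ENOPROTOOPT like EOPNOTSUPP is my assumption.

```python
import errno
import socket


def set_reuseport_if_supported(sock):
    """Try to enable SO_REUSEPORT on sock.

    Returns True if the option was set and False if the kernel rejects
    it for this socket family. Any other error is re-raised.
    Hypothetical helper, for illustration only.
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    except OSError as e:
        # EOPNOTSUPP is what the patched kernel returns in the transcript
        # above; ENOPROTOOPT is included as an assumption.
        if e.errno in (errno.EOPNOTSUPP, errno.ENOPROTOOPT):
            return False
        raise
    return True
```

A caller would then create the socket, call the helper, and go on to
bind() and listen() regardless of the return value, since the option
never had any effect on AF_VSOCK anyway.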

Thanks,
Stefano




