Re: [REGRESSION] vsocket timeout with kata containers agent 3.10.1 and kernel 6.6.70

Thanks Stefano,

The feedback about vsock expectations was exactly what I was hoping
you could provide.

In the Kata agent we're not directly setting SO_REUSEPORT as a socket
option, so I think what you suggest, where SO_REUSEPORT is being set
indiscriminately, is happening a layer down, perhaps in the tokio or nix
crates we use. Unfortunately I do not have an easy way to reproduce
the problem without setting up Kata Containers, and what's more, you
then need to rebuild a recent Kata-flavoured minimal kernel to see the
issue.

I spent the day updating our build to use the latest Kata Containers
release and dependencies to see if that would correct the issue.
Unfortunately it did not, so I will work tomorrow on getting stack
traces etc. to figure things out more directly. For the others on the
thread: based on what Stefano said, although returning an error for
vsock sockets is a change in behaviour, I suspect this is a problem we
can fix in a crate, corrected to be more aware of vsock capabilities.
I'll know better what's possible and will update tomorrow.
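
As a rough illustration of what "more aware of vsock capabilities"
could look like, here is a minimal sketch in Python (matching Stefano's
snippet, not the Rust the agent actually uses). The helper name
try_set_reuseport is hypothetical and is not taken from tokio or nix;
the assumption is simply that a lower layer requests SO_REUSEPORT on
every socket and treats a refusal as fatal.

```python
import errno
import socket


def try_set_reuseport(sock: socket.socket) -> bool:
    """Request SO_REUSEPORT, tolerating socket families that reject it.

    Returns True if the kernel accepted the option and False if it
    refused with EOPNOTSUPP or ENOPROTOOPT (or the platform does not
    define the option at all). Any other error is re-raised.
    """
    opt = getattr(socket, "SO_REUSEPORT", None)
    if opt is None:
        # Platform has no SO_REUSEPORT constant; nothing to request.
        return False
    try:
        sock.setsockopt(socket.SOL_SOCKET, opt, 1)
    except OSError as exc:
        # Kernels carrying "net: restrict SO_REUSEPORT to inet sockets"
        # reject the option on non-inet sockets instead of silently
        # accepting it.
        if exc.errno in (errno.EOPNOTSUPP, errno.ENOPROTOOPT):
            return False
        raise
    return True
```

A caller would invoke this before bind() and carry on regardless of the
result, since the option never had any effect on AF_VSOCK anyway.
Whether an unhandled setsockopt() error is really what leads to our
timeout is still unconfirmed.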

Thanks
-Simon

On Tue, Jan 21, 2025 at 4:54 AM Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
>
> On Tue, 21 Jan 2025 at 10:26, Stefano Garzarella <sgarzare@xxxxxxxxxx> wrote:
> >
> > Hi Simon,
> >
> > On Tue, 21 Jan 2025 at 05:53, Simon Kaegi <simon.kaegi@xxxxxxxxx> wrote:
> > >
> > > #regzbot introduced v6.6.69..v6.6.70
> > > #regzbot introduced: ad91a2dacbf8c26a446658cdd55e8324dfeff1e7
> > >
> > > We hit this regression when updating our guest vm kernel from 6.6.69
> > > to 6.6.70 -- bisecting, this problem was introduced in
> > > ad91a2dacbf8c26a446658cdd55e8324dfeff1e7 -- net: restrict SO_REUSEPORT
> > > to inet sockets
> > >
> > > We're getting a timeout when trying to connect to the vsocket in the
> > > guest VM when launching a kata containers 3.10.1 agent which
> > > unsurprisingly ... uses a vsocket to communicate back to the host.
> > >
> > > We updated this commit and added an additional sk_is_vsock check and
> > > recompiled and this works correctly for us.
> > > - if (valbool && !sk_is_inet(sk))
> > > + if (valbool && !(sk_is_inet(sk) || sk_is_vsock(sk)))
> > >
> > > My understanding is limited here so I've added Stefano as he is likely
> > > to better understand what makes sense here.
> >
> > Thanks for adding me, do you have a reproducer here?
> >
> > AFAIK in AF_VSOCK we never supported SO_REUSEPORT, so it seems strange to me.
> >
> > I understand that the patch you refer to actually changes the behavior
> > of setsockopt(..., SO_REUSEPORT, ...) on an AF_VSOCK socket, where it
> > used to return successfully before that change, but now returns an
> > error, but subsequent binds should have still failed even without this
> > patch.
> >
> > Do you actually use the SO_REUSEPORT feature on AF_VSOCK?
> >
> > If so, I need to better understand if the core socket does anything,
> > but as I recall AF_VSOCK allocates ports internally, so I don't think
> > multiple binds on the same port have ever been supported.
>
> I just tried on an old kernel without the patch applied, and I confirm
> that SO_REUSEPORT was not supported even though the setsockopt() was
> successful.
>
> I ran the following snippet in 2 shells; in the first one everything is
> fine, but in the second the bind() fails in this way:
>
> $ uname -r
> 6.10.11-200.fc40.x86_64
> $ python3
> >>> import socket
> >>> import os
> >>> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
> >>> s.bind((socket.VMADDR_CID_ANY, 4242))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OSError: [Errno 98] Address already in use
>
>
> With the patch applied, the setsockopt() fails immediately, but the
> bind() behavior is the same (fails only on the second):
>
> $ uname -r
> 6.12.9-200.fc41.x86_64
> $ python3
> >>> import socket
> >>> import os
> >>> s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
> >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
> Traceback (most recent call last):
>   File "<python-input-3>", line 1, in <module>
>     s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
>     ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> OSError: [Errno 95] Operation not supported
>
> So, IMHO the patch is correct since AF_VSOCK never really supported
> SO_REUSEPORT, so better to fail early.
>
> BTW I'm not sure what is happening on your side.
> Could it be a problem in your code that uses SO_REUSEPORT
> indiscriminately on AF_VSOCK, even though you then never bind on the
> same port again?
>
> Thanks,
> Stefano
>




