Re: allowing msg_name and msg_control


On Sat, Nov 7, 2020 at 4:24 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 07/11/2020 14:22, Victor Stewart wrote:
> > RE Jen's proposed patch here
> > https://lore.kernel.org/io-uring/45d7558a-d0c8-4d3f-c63a-33fd2fb073a5@xxxxxxxxx/
>
> Hmm, I haven't seen this thread, thanks for bringing it up
>
> >
> > and RE what Stefan just mentioned in the "[PATCH 5.11] io_uring: don't
> > take fs for recvmsg/sendmsg" thread a few minutes ago... "Can't we
> > better remove these checks and allow msg_control? For me it's a
> > limitation that I would like to be removed."... which I coincidentally
> > just read when coming on here to advocate the same.
> >
> > I also require this for a few vital performance use cases:
> >
> > 1) GSO (UDP_SEGMENT to sendmsg)
> > 2) GRO (UDP_GRO from recvmsg)
>
> Don't know these you listed, may read about them later, but wouldn't [1]
> be enough? I was told it's queued up.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/net/socket.c?id=583bbf0624dfd8fc45f1049be1d4980be59451ff
>

Hadn't seen [1], but yes, that would be enough as long as the same is
also implemented for __sys_sendmsg_sock(). Queued up for... 5.11?

UDP_SEGMENT lets you sendmsg() a single UDP payload of up to ~64K
(max IP packet size - IPv4/IPv6 header size - UDP header size, in
order to obey the existing network stack expectations/limitations).
That payload is actually a sequence of DPLPMTUD-sized packets (because
the MTU is restricted by, and varies with, the path to each client).
The packet size is provided by the UDP_SEGMENT value, with the last
packet allowed to be smaller.

So you can send ~40 UDP messages but only pay the cost of network
stack traversal once. The segmentation then occurs in the NIC (or in
the kernel when the NIC has no UDP GSO support, but almost all do).
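
For anyone following along, here is a minimal sketch of what the send
side looks like from userspace, and why it needs msg_control. The
helper names (gso_prepare, gso_segment_count) are made up for
illustration, and the fallback #defines assume the values from
linux/udp.h in case the libc headers are too old to carry them. As far
as I know the segment size is passed as a 16-bit value in a SOL_UDP /
UDP_SEGMENT control message.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers (assumed values from linux/udp.h). */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/*
 * Hypothetical helper: fill `msg` so that one sendmsg() carries `len`
 * bytes which the stack (or NIC) splits into datagrams of `seg_size`
 * bytes each, the last one possibly shorter.
 * `ctrl` must be suitably aligned and at least
 * CMSG_SPACE(sizeof(uint16_t)) bytes long.
 */
static void gso_prepare(struct msghdr *msg, struct iovec *iov,
                        void *buf, size_t len,
                        void *ctrl, size_t ctrl_len, uint16_t seg_size)
{
    memset(msg, 0, sizeof(*msg));
    memset(ctrl, 0, ctrl_len);

    iov->iov_base = buf;
    iov->iov_len = len;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;

    /* This is the part that requires msg_control to be allowed. */
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(seg_size));

    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = SOL_UDP;
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(seg_size));
    memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
}

/* Number of datagrams the payload becomes: ceil(len / seg_size). */
static size_t gso_segment_count(size_t len, uint16_t seg_size)
{
    return (len + seg_size - 1) / seg_size;
}
```

After gso_prepare() the msghdr can be handed to sendmsg() on a UDP
socket (or to the io_uring sendmsg op once msg_control is permitted).
With a 65507 byte payload and 1472 byte segments, for example,
gso_segment_count() gives 45 datagrams for a single stack traversal.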

There's also a pacing patch in the works for UDP GSO sends:
https://lwn.net/Articles/822726/

UDP_GRO is the exact inverse: when you recvmsg() you receive one
giant payload, with the individual packet size reported via the
UDP_GRO value, and then segment it yourself.
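
A rough sketch of the receive side, again only for illustration: the
helper names (gro_segment_size, gro_split) are hypothetical, the
fallback #defines assume the linux/udp.h values, and I am assuming the
kernel reports the segment size as an int in a SOL_UDP / UDP_GRO
control message. The socket has to opt in first with
setsockopt(fd, SOL_UDP, UDP_GRO, ...).

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/udp.h>

/* Fallbacks for older libc headers (assumed values from linux/udp.h). */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_GRO
#define UDP_GRO 104
#endif

/*
 * Hypothetical helper: scan the control messages of a msghdr filled in
 * by recvmsg() and return the segment size from the UDP_GRO cmsg, or 0
 * if none is present (i.e. the payload is a single datagram).
 * Assumption: the value is delivered as an int.
 */
static int gro_segment_size(struct msghdr *msg)
{
    for (struct cmsghdr *cm = CMSG_FIRSTHDR(msg); cm != NULL;
         cm = CMSG_NXTHDR(msg, cm)) {
        if (cm->cmsg_level == SOL_UDP && cm->cmsg_type == UDP_GRO &&
            cm->cmsg_len >= CMSG_LEN(sizeof(int))) {
            int seg;
            memcpy(&seg, CMSG_DATA(cm), sizeof(seg));
            return seg;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: describe the coalesced payload as individual
 * datagrams, without copying. Writes at most `max_out` entries into
 * `out` and returns how many were written. A seg_size <= 0 means
 * "not coalesced", so the whole buffer is one datagram.
 */
static size_t gro_split(char *buf, size_t len, int seg_size,
                        struct iovec *out, size_t max_out)
{
    size_t seg = seg_size > 0 ? (size_t)seg_size : len;
    size_t n = 0;

    for (size_t off = 0; off < len && n < max_out; off += seg, n++) {
        out[n].iov_base = buf + off;
        out[n].iov_len = (len - off < seg) ? len - off : seg;
    }
    return n;
}
```

So the flow is recvmsg(), gro_segment_size() on the returned msghdr,
then gro_split() over the received bytes, with each resulting iovec
handled as if it had arrived as its own datagram.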

These mimic the same optimizations available without configuration for
TCP streams.

Willem discusses all of this in the paper below (and there's a talk on YouTube).
http://vger.kernel.org/lpc_net2018_talks/willemdebruijn-lpc2018-udpgso-paper-DRAFT-1.pdf

Oh, and sorry, the title of this thread should have been sans msg_name.

> >
> > GSO and GRO are super important for QUIC servers... essentially
> > bringing a 3-4x performance improvement that brings them in line with
> > TCP efficiency.
> >
> > Would also allow the usage of...
> >
> > 3) MSG_ZEROCOPY (to receive the sock_extended_err from recvmsg)
> >
> > it's only a single digit % performance gain for large sends (but a
> > minor crutch until we get registered buffer sendmsg / recvmsg, which I
> > plan on implementing).

And I just began work on fixed (registered buffer) versions of sendmsg /
recvmsg, so I'll distribute that patch for initial review, probably this
week. It should be fairly trivial given that the work already exists for
read/write.

> >
> > So if there's an agreed upon plan on action I can take charge of all
> > the work and get this done ASAP.
> >
> > #Victor
> >
>
> --
> Pavel Begunkov


