On Fri, Jul 29, 2022 at 4:39 PM Marek Majkowski <marek@xxxxxxxxxxxxxx> wrote:
>
> Among many route options we support the initrwnd/RTAX_INITRWND path
> attribute:
>
>  $ ip route change local 127.0.0.0/8 dev lo initrwnd 1024
>
> This sets the initial receive window size (in packets). However, it's
> not very useful in practice. For smaller buffers (<128KiB) it can be
> used to bring the initial receive window down, but it's hard to
> imagine when this is useful. The same effect can be achieved with
> the TCP_WINDOW_CLAMP / RTAX_WINDOW option.
>
> For larger buffers (>128KiB) the initial receive window is usually
> limited by rcv_ssthresh, which starts at 64KiB. The initrwnd option
> can't bring the window above it, which limits its usefulness.
>
> This patch changes that. Now, by setting the RTAX_INITRWND path
> attribute we bring up the initial rcv_ssthresh in line with the
> initrwnd value. This allows increasing the initial advertised receive
> window above 64KiB instantly, after the first TCP RTT.
>
> With this change, the administrator can configure a route (or skops
> eBPF program) where the receive window is opened much faster than
> usual. This is useful on big BDP connections - large latency, high
> throughput - where it takes a long time to fully open the receive
> window, due to the usual rcv_ssthresh cap.
>
> However, this feature should be used with caution. It only makes sense
> to employ it in limited circumstances:
>
>  * When using high-bandwidth TCP transfers over big-latency links.
>  * When the truesize of the flow/NIC is sensible and predictable.
>  * When the application is ready to send a lot of data immediately
>    after the flow is established.
>  * When the sender has configured a larger than usual `initcwnd`.
>  * When optimizing for every possible RTT.
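
To put rough numbers on the big-BDP case above, here is a
back-of-the-envelope sketch in Python. It is not what the kernel
does: it assumes the advertised window can at most double per RTT,
which is a simplification of the real rcv_ssthresh growth, and it
converts initrwnd to bytes as packets times an assumed 1460-byte MSS.
The function names and the 1 Gbit/s, 100 ms link are made up for
illustration.

```python
ASSUMED_MSS = 1460  # bytes per packet; an assumption, not read from the kernel


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (bits/s * ms / 8000)."""
    return bandwidth_bps * rtt_ms // 8000


def rtts_to_open(initial_window: int, target_window: int) -> int:
    """Round trips until the window covers the target, assuming the
    window at most doubles once per RTT (simplified model)."""
    rtts = 0
    window = initial_window
    while window < target_window:
        window *= 2
        rtts += 1
    return rtts


if __name__ == "__main__":
    target = bdp_bytes(1_000_000_000, 100)  # hypothetical 1 Gbit/s, 100 ms link
    print("BDP bytes:", target)
    print("RTTs from 64KiB:", rtts_to_open(64 * 1024, target))
    print("RTTs from initrwnd 1024:", rtts_to_open(1024 * ASSUMED_MSS, target))
```

Under these assumptions the example link has a BDP of 12.5 MB.
Starting from the 64KiB cap takes 8 round trips to cover it, while
starting from 1024 packets takes 4.
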
>
> This patch is related to previous work by Ivan Babrou:
>
>  https://lore.kernel.org/bpf/CAA93jw5+LjKLcCaNr5wJGPrXhbjvLhts8hqpKPFx7JeWG4g0AA@xxxxxxxxxxxxxx/T/
>
> Please note that due to TCP wscale semantics, the TCP sender will need
> to receive the first ACK to be informed of the large opened receive
> window. That is: the large window is advertised only in the first ACK
> from the peer. When the TCP client has a large window, it is
> advertised in the third packet (ACK) of the handshake. When the TCP
> server has a large window, it is advertised only in the first ACK
> after some data has been received.
>
> Syncookie support will be provided in a subsequent patchset, since it
> requires more changes.
>
> Marek Majkowski (2):
>   RTAX_INITRWND should be able to set the rcv_ssthresh above 64KiB
>   Tests for RTAX_INITRWND
>
>  include/linux/tcp.h                           |   1 +
>  net/ipv4/tcp_minisocks.c                      |   9 +-
>  net/ipv4/tcp_output.c                         |   7 +-
>  .../selftests/bpf/prog_tests/tcp_initrwnd.c   | 420 ++++++++++++++++++
>  .../selftests/bpf/progs/test_tcp_initrwnd.c   |  30 ++
>  5 files changed, 463 insertions(+), 4 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_initrwnd.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_initrwnd.c

Changelog:
 - moved proposed rcv_ssthresh from `struct inet_request_sock` into
   `struct tcp_request_sock` as per Eric's suggestion
 - extended tests to be more explicit about syncookies
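
For readers less familiar with the wscale note in the cover letter,
here is a small Python model of the 16-bit window field. It follows
the window scale rule that the window in SYN and SYN-ACK segments is
never scaled. The function names are hypothetical and the code is
only an illustration, not taken from the patch.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the TCP header window field is 16 bits wide


def window_field(window_bytes: int, wscale: int, syn: bool) -> int:
    """Value placed in the header for a given receive window.
    SYN and SYN-ACK segments carry an unscaled window."""
    shift = 0 if syn else wscale
    return min(window_bytes >> shift, MAX_WINDOW_FIELD)


def window_seen_by_peer(field: int, wscale: int, syn: bool) -> int:
    """Window in bytes that the peer derives from the header field."""
    shift = 0 if syn else wscale
    return field << shift


if __name__ == "__main__":
    rwnd = 1 << 20  # a 1 MiB receive window, chosen for illustration
    for syn in (True, False):
        field = window_field(rwnd, 7, syn)
        print("SYN" if syn else "ACK", field, window_seen_by_peer(field, 7, syn))
```

With a 1 MiB window and wscale 7, the peer sees at most 65535 bytes
from a SYN or SYN-ACK, and the full 1048576 bytes only from a later
ACK. This is why the large window becomes visible only in the first
ACK.
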