Re: Why does backend send buffer size hardcoded at 8KB?

Andres Freund <andres@xxxxxxxxxxx> · Sat, 27 Jul 2019 14:08:50 -0700

Hi,

On 2019-07-27 11:09:06 -0400, Tom Lane wrote:
> Artemiy Ryabinkov <getlag@xxxxx> writes:
> > Does it make sense to make this parameter configurable?
>
> Not without some proof that it makes a performance difference on
> common setups (which you've not provided).

I think us unnecessarily fragmenting into some smaller packets every time
we send a full 8kB buffer, unless there's already network congestion, is
kind of evidence enough? The combination of a relatively small send
buffer + TCP_NODELAY isn't great.
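Here is a rough back-of-the-envelope model of that fragmentation. It makes two assumptions. First, the MSS is 1448 bytes (a 1500-byte MTU over IPv4 with TCP timestamps). Second, on an uncongested connection with TCP_NODELAY set, each flush is segmented on its own. Under those assumptions, every 8192-byte flush becomes five full segments plus one 952-byte segment. `count_segments` and `seg_count` are made-up names for illustration, not PostgreSQL or kernel code.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not general kernel behaviour):
 * with TCP_NODELAY on an idle connection, each flush of up to `flush`
 * bytes is segmented independently, so a flush whose size is not a
 * multiple of the MSS ends in one short segment.
 */
typedef struct {
    size_t segments;        /* total TCP segments sent */
    size_t short_segments;  /* segments smaller than the MSS */
} seg_count;

static seg_count count_segments(size_t total, size_t flush, size_t mss)
{
    seg_count c = {0, 0};

    while (total > 0) {
        size_t n = total < flush ? total : flush;  /* bytes in this flush */

        c.segments += (n + mss - 1) / mss;         /* ceil(n / mss) */
        if (n % mss != 0)
            c.short_segments++;                    /* trailing partial segment */
        total -= n;
    }
    return c;
}
```

In this model, pushing 1 MiB through 8 KiB flushes yields 768 segments, 128 of them short. Sending the same 1 MiB in one go yields 725 segments with a single short one. Real stacks can coalesce data that is still queued (e.g. under the congestion case mentioned above), so this only illustrates the idle-connection case.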

I'm not quite sure what the smaller buffer is supposed to achieve, at
least these days. In blocking mode (emulated in PG code, using latches,
so we can accept interrupts) we'll always just loop back to another
send() in internal_flush(). In non-blocking mode, we'll fall out of the
loop as soon as the kernel won't accept any more data. Isn't the outcome of
using such a small send buffer that we end up performing a) more
syscalls, which hurts a lot more than it did two years ago, since all the
CPU vulnerability mitigations have made syscalls a *lot* more expensive,
and b) unnecessary fragmentation?
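The following C sketch shows why the buffer size drives the syscall count. It pushes data through an output buffer of `bufsize` bytes and loops on send() until each full buffer is drained, which mirrors the blocking-mode behaviour described above. It is a hypothetical illustration, not PostgreSQL's internal_flush(). `send_through_buffer` is a made-up name, and the copy into the buffer is omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical sketch, not PostgreSQL's internal_flush(): push `len`
 * bytes through an output buffer of `bufsize` bytes, flushing every time
 * the buffer fills, and return how many send() calls that took.
 * Returns -1 on error; a non-blocking socket that would block (EAGAIN)
 * also falls out here, whereas a real implementation would keep the
 * unsent remainder queued.
 */
static long send_through_buffer(int sock, const char *data, size_t len,
                                size_t bufsize)
{
    long calls = 0;
    size_t off = 0;

    while (off < len) {
        /* One full buffer's worth of data, or whatever is left. */
        size_t chunk = len - off < bufsize ? len - off : bufsize;
        size_t sent = 0;

        /* Flush loop: keep calling send() until this chunk is drained. */
        while (sent < chunk) {
            ssize_t r;

            calls++;
            r = send(sock, data + off + sent, chunk - sent, 0);
            if (r < 0) {
                if (errno == EINTR)
                    continue;   /* interrupted: retry the send() */
                return -1;      /* hard error, or EAGAIN when non-blocking */
            }
            sent += (size_t) r;
        }
        off += chunk;
    }
    return calls;
}
```

Consider a local blocking socket pair with room for the whole payload. Sending 32 KiB there takes four send() calls through an 8 KiB buffer and one call through a 32 KiB buffer. Every extra call pays the now more expensive syscall entry and exit.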

The situation for receiving data is a bit different. For one, we don't
cause unnecessary fragmentation by using a buffer of a relatively
limited size. But more importantly, copying data into the buffer takes
time, and we could already be responding to queries that appear earlier
in the data. In contrast to the send case, we don't loop around recv()
until all the data has been received.

I suspect we could still do with a bigger buffer, just to reduce the
number of syscalls in bulk loading cases, however.
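The receive side can be sketched the same way. The function below makes a single recv() into whatever space is free and returns immediately, so data that has already arrived can be processed without waiting for the buffer to fill. The capacity `cap` is the knob a bigger buffer would turn: when a bulk load keeps the socket full, a larger `cap` means fewer recv() calls. This is a hypothetical illustration, not PostgreSQL's actual receive code. `recv_buf` and `fill_once` are made-up names.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical input buffer; names and layout are illustrative only. */
typedef struct {
    char  *data;
    size_t cap;   /* capacity of `data` */
    size_t len;   /* bytes currently buffered and not yet consumed */
} recv_buf;

/*
 * Make a single recv() into the free space of the buffer. Unlike the
 * send side, there is deliberately no loop until the buffer is full:
 * whatever has arrived can be parsed and acted on right away.
 * Returns bytes received, 0 on EOF, -1 on error (ENOBUFS if full).
 */
static ssize_t fill_once(int sock, recv_buf *b)
{
    ssize_t r;

    if (b->len == b->cap) {
        errno = ENOBUFS;   /* caller has to consume buffered data first */
        return -1;
    }
    do {
        r = recv(sock, b->data + b->len, b->cap - b->len, 0);
    } while (r < 0 && errno == EINTR);   /* retry if interrupted */
    if (r > 0)
        b->len += (size_t) r;
    return r;
}
```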

Greetings,

Andres Freund