Hey all,
I've run in to a strange issue and I'm wondering if anyone has any
insight on this. So I have a modified version of OpenSSH called hpnssh
that uses larger receive buffer sizes to improve throughput. When I
ported that patch set to 8.9 I found out after an initial burst of data
the throughput would drop to zero after an initial burst and just sort
of stay there. This happens regardless of the RTT; same issues at
sub-millisecond RTTs and 150ms.
I've been able to replicate this behaviour in OpenSSH by increasing the
size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to
higher values. You kind of see it in the debug log at 96*CSPD and you
can start to see it impact throughput at 128*CSDP. With my tests I have
it at 512*CSPD and the pause will last upwards of 5 seconds. When it's
at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will
last 18 seconds or more.
I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get
a better view of whats happening. The following are the results of that
in both 8.9p1 and 8.8p1
Version 8.9p1
Start of buffer growth
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
…
debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
— end of buffer growth
So there is a buffer that kicks in after an short initial period of good
throughput that has then has to grow larger that the size of
CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can
start to pick up again. During this growth period CPU usage is pegged at
100% even though throughput is minimal.
Version 8.8p1
Start buffer adjustment
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
End buffer adjustment
In version 8.8 I also see a buffer growing during the course of the
transfer but it looks like it doesn’t need to grow to the same size as
in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on
the same test bed, same set up, and the tests are being run right after
each other.
The behaviour in 8.9 is very consistent. After a certain number of
adjusted rlens for, what I am assuming is the packet buffer, it goes
into this growth phase for another buffer. In my test that been about
700 of these adjusts which is unrelated to the size of
CHAN_SES_WINDOW_DEFAULT
So I’m wondering what changed between 8.8 and 8.9 that might account for
this and if this is expected/desired behaviour. I'm still working
through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.
Thanks,
Chris