Re: Difference in buffer behaviour between 8.89 and 8.9?

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

 



Of course, 90 minutes later I find that my issues starts with commit b30d32159d

"upstream: add a ssh_packet_process_read() function that reads from

    a fd directly into the transport input buffer.

    Use this in the client and server mainloops to avoid unnecessary
    copying. It also lets us use a more greedy read size without penalty.

    Yields a 2-3% performance gain on cipher-speed.sh (in a fairly
    unscientific test tbf)"

I still don't know why it's causing this behaviour so if anyone has a clue please share?

Thanks
Chris



On 5/18/22 2:02 PM, rapier wrote:
Hey all,

I've run in to a strange issue and I'm wondering if anyone has any insight on this. So I have a modified version of OpenSSH called hpnssh that uses larger receive buffer sizes to improve throughput. When I ported that patch set to 8.9 I found out after an initial burst of data the throughput would drop to zero after an initial burst and just sort of stay there. This happens regardless of the RTT; same issues at sub-millisecond RTTs and 150ms.

I've been able to replicate this behaviour in OpenSSH by increasing the size of CHAN_SES_DEFAULT_WINDOW from 64*CHAN_SES_PACKET_DEFAULT to higher values. You kind of see it in the debug log at 96*CSPD and you can start to see it impact throughput at 128*CSDP. With my tests I have it at 512*CSPD and the pause will last upwards of 5 seconds. When it's at 1024*CSPD (corresponding to a 32MB receive buffer) the pause will last 18 seconds or more.

I dropped in a debug statement in to sshbuf.c in sshbuf_allocate to get a better view of whats happening. The following are the results of that in both 8.9p1 and 8.8p1


Version 8.9p1

Start of buffer growth

debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 98304, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 131072, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 163840, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 196608, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 229376, len: 32768 for 0x55d548211550
…
debug1: adjusted rlen: 22003712, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22036480, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22069248, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22085632, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22102016, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22134784, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22167552, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22200320, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22216704, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22233088, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22265856, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22298624, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 22331392, len: 32768 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 22347776, len: 16384 for 0x55d548211550
debug1: adjusted rlen: 16640, len: 16404 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
debug1: adjusted rlen: 33024, len: 32788 for 0x55d54820bd60
— end of buffer growth

So there is a buffer that kicks in after an short initial period of good throughput that has then has to grow larger that the size of CHAN_SES_WINDOW_DEFAULT (128*CSPD in this case) before throughput can start to pick up again. During this growth period CPU usage is pegged at 100% even though throughput is minimal.


Version 8.8p1

Start buffer adjustment

debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 49152, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 81920, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 114688, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 147456, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 180224, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 212992, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 245760, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 278528, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 311296, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 344064, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 376832, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 409600, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 442368, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 475136, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 507904, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 540672, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 573440, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 606208, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 638976, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 671744, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 704512, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 737280, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 770048, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 802816, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 835584, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 868352, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 901120, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 933888, len: 32768 for 0x5570cf28a3e0
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410
debug1: adjusted rlen: 33024, len: 32788 for 0x5570cf28d410

End buffer adjustment

In version 8.8 I also see a buffer growing during the course of the transfer but it looks like it doesn’t need to grow to the same size as in 8.9p1 (in this case WINDOW_DEFAULT was set to 4096*CSPD). This is on the same test bed, same set up, and the tests are being run right after each other.

The behaviour in 8.9 is very consistent. After a certain number of adjusted rlens for, what I am assuming is the packet buffer, it goes into this growth phase for another buffer. In my test that been about 700 of these adjusts which is unrelated to the size of CHAN_SES_WINDOW_DEFAULT

So I’m wondering what changed between 8.8 and 8.9 that might account for this and if this is expected/desired behaviour. I'm still working through the diff from 8.8 to 8.9 but nothing has leaped out at me yet.

Thanks,
Chris
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev




[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux