Re: Parallel transfers with sftp (call for testing / advice)

On 10/04/2020 at 01:55, Darren Tucker wrote:
> On Thu, 9 Apr 2020 at 01:34, Cyril Servant <cyril.servant@xxxxxxxxx> wrote:
>> [...]
>> Each of our front
>> nodes has an outgoing bandwidth limit (let's say 1Gb/s each, generally more
>> limited by the CPU than by the network bandwidth),
>
> You might also want to experiment with the Ciphers and MACs since
> these can make a significant difference in CPU utilization and, if
> that's the bottleneck, your throughput.  Which one is best will vary
> depending on your hardware, but it's likely to be either AES GCM if
> the hardware has AES instructions or chacha20-poly1305 if not.
>
> In the first example below the bottleneck is the source's relatively
> elderly 2.66GHz Intel CPU.  In the second it's the gigabit network
> between them.
>
> $ scp -c aes256-ctr -o macs=hmac-sha2-512
> ubuntu-18.10-desktop-amd64.iso.bz2 nuc:/tmp/
> ubuntu-18.10-desktop-amd64.iso.bz2            100% 1899MB  63.5MB/s   00:29
>
> $ scp -c chacha20-poly1305@openssh.com
> ubuntu-18.10-desktop-amd64.iso.bz2 nuc:/tmp/
> ubuntu-18.10-desktop-amd64.iso.bz2            100% 1899MB 112.1MB/s   00:16

Hi,

As Cyril said, we are aware of the CPU-bound nature of the ciphers and MACs available in OpenSSH, and after several benchmarking sessions we have already selected the most efficient ones for our transfers.
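For reference, a minimal sketch of the kind of comparison we mean (host and file names are placeholders, and the cipher list is just an example); the scp progress meter reports the achieved throughput for each cipher, as in your examples above:

# Rough cipher comparison sketch: copy the same test file with each
# cipher and compare the throughput reported by scp's progress meter.
$ for c in aes128-gcm@openssh.com aes256-gcm@openssh.com \
           chacha20-poly1305@openssh.com aes256-ctr; do
    echo "== $c =="
    scp -c "$c" testfile.bin remotehost:/tmp/
  done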

Current processors have limited per-core capacity. Core frequencies have stayed roughly flat for many years now and only core counts keep increasing, leaving it to developers to exploit parallelism in order to increase compute throughput. The outlook does not seem any brighter in that area.

In the meantime, network bandwidth has kept increasing at a steady pace. As a result, a CPU that was once fast enough to fill the network pipe can now only deliver a fraction of what the network can carry. 10GbE network cards are common nowadays on datacenter servers, and no OpenSSH cipher/MAC combination can saturate that bandwidth with a single transfer.
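As a rough, purely illustrative order of magnitude (the per-stream figure is an assumption, not a measurement): 10 Gb/s is about 1.25 GB/s, so if a single encrypted stream tops out at around 300-400 MB/s on one core, it takes three to four parallel streams just to fill the link.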

Introducing parallelism is thus necessary to leverage what the network hardware can offer.

The change Cyril proposes in sftp is a very pragmatic way to handle parallelism at the file transfer level. It leverages the existing SFTP protocol and its ability to read/write file content at specified offsets, which makes it possible to speed up sftp transfers significantly by parallelizing the SSH channels used for large transfers. The improvement only touches the sftp client, which is a very small change relative to the OpenSSH codebase. The modification is not too complicated to review and validate (I did it) and does not change the default behavior of the CLI.
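To illustrate the idea (this is not the proposed patch, just a sketch of offset-based parallelism using standard tools; file name, host and chunk size are placeholders): a large file is split into fixed-size ranges, each range is pushed over its own ssh connection and written at the matching offset on the remote side. The patch does the equivalent inside the sftp client itself.

# Sketch only, not the actual patch: push 1 GiB ranges of a large file
# in parallel, each over its own ssh connection, and let dd write each
# range at the right offset on the remote side.
FILE=big.img
SIZE=$(stat -c %s "$FILE")                 # GNU stat: file size in bytes
BLOCKS=$(( (SIZE + 1048575) / 1048576 ))   # file size in 1 MiB blocks
for ((i = 0; i < BLOCKS; i += 1024)); do   # one 1 GiB range per iteration
  dd if="$FILE" bs=1M skip=$i count=1024 2>/dev/null |
    ssh remotehost "dd of=/tmp/$FILE bs=1M seek=$i conv=notrunc 2>/dev/null" &
done
wait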

There are tools that offer parallel transfers of large files, but we really want to use OpenSSH for this purpose because it is the only application we can fully trust (by the way, thank you for making that possible). I do not think we are the only ones who feel this way, and I am pretty sure that such a change in the main OpenSSH code base would really help users make more efficient use of their hardware in many situations.

Best regards,

Matthieu


_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev



