Re: Parallel transfers with sftp (call for testing / advice)

On 06/05/2020 at 03:16, Nico Kadel-Garcia wrote:
On Tue, May 5, 2020 at 4:31 AM Peter Stuge <peter@xxxxxxxx> wrote:
Matthieu Hautreux wrote:
The change proposed by Cyril in sftp is a very pragmatic approach to
deal with parallelism at the file transfer level. It leverages the
already existing sftp protocol and its capability to write/read file
content at specified offsets. This makes it possible to speed up sftp
transfers significantly by parallelizing the SSH channels used for large
transfers. This improvement is performed only by modifying the sftp
client, which is a very small modification compared to the openssh
codebase. The modification is not too complicated to review and validate
(I did it) and does not change the default behavior of the cli.
I think you make a compelling argument. I admit that I haven't
reviewed the patch, even though that is what matters the most.

I guess that no one really minds ways to make SFTP scale, but ever since
the patch was proposed I have been thinking that the parallel channel
approach is likely to introduce a whole load of not very clean error
conditions regarding reassembly, which need to be handled sensibly both
within the sftp client and on the interface to outside/calling processes.
Can you or Cyril say something about this?
I find it an unnecessary feature given the possibilities of
out-of-band parallelism with multiple scp sessions transmitting
different manifests of files, of sftp to do the same thing, and of
tools like rsync to do it more efficiently by avoiding replication of
previously transmitted data and re-connection to complete partial
transmissions. It sounds like a bad case of "here, let me do this at a
different level of the stack" that is not normally necessary and has
already been done more completely and efficiently by other tools.

I think you misunderstood the main point, which is that we want to overcome the bandwidth limitation of a single SSH connection when transferring _very_large_ files.

A single SSH connection has a bandwidth limitation that is either the network bandwidth or the throughput of the cipher/MAC on the less powerful core of the two connected endpoints.

If you traditionally use 1GE network cards, you will probably not see that if you have a good processor and the right cipher/MAC, as the network will be the bottleneck.

If you are using 10GE (or faster) network cards, you will see the CPU limitation, and will hit your bandwidth roofline at something very far from your network capacity.
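
As a back-of-the-envelope illustration of the bottleneck described above (a rough model, not a measurement): the 300 MB/s per-core cipher/MAC figure below is a hypothetical placeholder, the function name is made up for this sketch, and the linear scaling with the number of connections is an optimistic assumption that each connection gets its own core.

```python
def effective_mb_per_s(network_mb_s, per_connection_crypto_mb_s, connections=1):
    """Rough upper bound on transfer throughput, in MB/s.

    Assumption: each extra connection runs on its own core, so the
    cipher/MAC throughput scales linearly with the connection count.
    """
    return min(network_mb_s, connections * per_connection_crypto_mb_s)


CRYPTO = 300  # hypothetical cipher/MAC throughput of one core, in MB/s

print(effective_mb_per_s(125, CRYPTO))      # 1GE (~125 MB/s): network-bound
print(effective_mb_per_s(1250, CRYPTO))     # 10GE, one connection: CPU-bound
print(effective_mb_per_s(1250, CRYPTO, 4))  # 10GE, four connections
```

Under these assumptions the single connection is limited by the network on 1GE and by the CPU on 10GE, which is the case the patch targets.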

You are right, a lot of things already exist to properly send a very large number of small files over multiple SSH connections, and we are already using this kind of approach for some use cases.

However, I am not aware of anything that makes it possible to send _very_large_ files using multiple SSH connections. The proposed patch does that.

Give it a try: send or receive a single 5GB file over a 10GE network and you will better see the point. If you have a solution with current ssh/scp/sftp/rsync that gets the most out of the network (>1GB/s), then surely the patches are useless. But I am pretty sure that you will see a bandwidth of a few hundred MB/s at most, depending on the cores involved on both sides.


And another thought - if the proposed patch and/or method indeed will not
go anywhere, would it still be helpful for you if the sftp client would
only expose the file offset functionality? That way, the complexity of
reassembly and the associated error handling doesn't enter into OpenSSH.
Re-assembly, error handling, and delivery verification were done by
rsync ages ago. It really seems like re-inventing the wheel.

In the proposed patch, no re-assembly is necessary outside of the sftp client, as the sftp protocol was sufficiently well designed to allow read/write from/to particular remote offsets in files.
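
To illustrate why offset-based writes remove the need for a separate re-assembly step, here is a minimal sketch. It is not the code from the patch: local files and os.pread/os.pwrite stand in for remote sftp file handles, and the names copy_chunk and parallel_copy are made up for this example. Each worker copies one byte range and writes it at the same offset in the destination, so the file is complete once all workers have finished.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_chunk(src_path, dst_path, offset, length):
    """Copy `length` bytes starting at `offset` to the same offset in dst."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        pos, remaining = offset, length
        while remaining > 0:
            data = os.pread(src, min(remaining, 1 << 16), pos)
            if not data:
                break  # unexpected end of file
            written = 0
            while written < len(data):
                # pwrite may write fewer bytes than asked; loop until done.
                written += os.pwrite(dst, data[written:], pos + written)
            pos += len(data)
            remaining -= len(data)
    finally:
        os.close(src)
        os.close(dst)
    return pos - offset  # bytes actually copied


def parallel_copy(src_path, dst_path, workers=4):
    """Copy one file using several workers, one byte range per worker."""
    size = os.path.getsize(src_path)
    # Create the destination at its final size so every worker can open it.
    with open(dst_path, "wb") as f:
        f.truncate(size)
    chunk = max(1, -(-size // workers))  # ceiling division
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_chunk, src_path, dst_path, off, n)
                   for off, n in ranges]
        return sum(f.result() for f in futures)
```

A caller would still want to compare the returned byte count against the source size, since a failed worker leaves a hole in the destination rather than a truncated file.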

I do not see the patch as reinventing the wheel, maybe more as widening it to run on wider roads.


Regards,

Matthieu

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev

