Re: Parallel transfers with sftp (call for testing / advice)

Matthieu Hautreux <matthieu.hautreux@xxxxxx> · Sat, 9 May 2020 01:25:04 +0200

On 06/05/2020 at 06:21, David Newall wrote:
> Did anything happen after
> https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers-fast/? I
> suspect it did, because we do now allow multiple outstanding packets,
> as well as specifying the buffer size.
>
> Daniel explained the process that SFTP uses quite clearly, such that
> I'm not sure why re-assembly is an issue.  He explained that each
> transfer already specifies the offset within the file.  It seems
> reasonable that multiple writers would just each write to the same
> file at their various different offsets.  It relies on the target
> supporting sparse files, but supercomputers only ever run Linux ;-)
> which does do the right thing.
You are right: reassembly is not an issue as long as the target supports
sparse files, which is our case with Linux :)
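To illustrate why offset-addressed writes need no reassembly step, here is a minimal local sketch in Python. It is not the patch's code: it assumes a POSIX system where `os.pwrite` is available, a thread pool stands in for the parallel SSH channels, and the chunk size `CHUNK` and the helper names are hypothetical. Chunks are written in deliberately reversed order, each at its own offset, and the result is still byte-identical to the source.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # hypothetical chunk size, not the patch's value


def write_chunk(fd: int, data: bytes, offset: int) -> None:
    # pwrite() writes at an absolute offset and never moves a shared file
    # position, so concurrent writers cannot disturb one another.
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def parallel_copy(src: bytes, dst_path: str, workers: int = 4) -> None:
    chunks = [(src[off:off + CHUNK], off) for off in range(0, len(src), CHUNK)]
    fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Set the final size first; ranges not yet written read back as
        # zeros (and, on file systems with sparse-file support, take no space).
        os.ftruncate(fd, len(src))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Submit chunks in reverse order: the offset alone decides where
            # each one lands, so arrival order does not matter.
            futures = [pool.submit(write_chunk, fd, data, off)
                       for data, off in reversed(chunks)]
            for future in futures:
                future.result()  # re-raise any write error
    finally:
        os.close(fd)


def roundtrip(payload: bytes, workers: int = 4) -> bytes:
    # Copy into a temporary file and read it back, to check the result.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "copy.bin")
        parallel_copy(payload, path, workers)
        with open(path, "rb") as f:
            return f.read()
```

Because every write carries its own offset, the order in which chunks arrive over the various connections is irrelevant, which is the property of SFTP write requests that the quoted message points out.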

> The original patch which we are discussing seemed more concerned about
> being able to connect to multiple IP addresses, rather than multiple
> connections between the same pair of machines.  The issue, as I
> understand, is that the supercomputer has slow NICs, so adding
> multiple NICs allows greater network bandwidth.  This, I think, is the
> problem to be solved; not re-assembly, just sending to what appear to
> be multiple different hosts (i.e. IP addresses.)

No, the primary goal of the patch is to enable parallel transfers between
two endpoints with one NIC per endpoint, the NIC being 10GE or faster.

Here is an example targeting a single destination host/IP, with roughly
the same results:

# With the patched sftp and 1+10 parallel SSH connections:

[me@france openssh-portable]$ ./sftp -n 10 germany0
Connected main channel to germany0 (1.2.3.96).
Connected channel 1 to germany0 (1.2.3.96).
Connected channel 2 to germany0 (1.2.3.96).
Connected channel 3 to germany0 (1.2.3.96).
Connected channel 4 to germany0 (1.2.3.96).
Connected channel 5 to germany0 (1.2.3.96).
Connected channel 6 to germany0 (1.2.3.96).
Connected channel 7 to germany0 (1.2.3.96).
Connected channel 8 to germany0 (1.2.3.96).
Connected channel 9 to germany0 (1.2.3.96).
Connected channel 10 to germany0 (1.2.3.96).
sftp>  get 5g 5g.bis
Fetching /files/5g to 5g.bis
/files/5g 100% 5120MB 706.7MB/s   00:07
sftp> put 5g.bis

Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB 664.0MB/s   00:07
sftp>

# With the legacy sftp:

[me@france openssh-portable]$ sftp germany0

sftp> get 5g 5g.bis
Fetching /files/5g to 5g.bis
/p/scratch/chpsadm/files/5g 100% 5120MB  82.8MB/s   01:01
sftp> put 5g.bis
Uploading 5g.bis to /files/5g.bis
5g.bis 100% 5120MB  67.0MB/s   01:16
sftp>

# With scp:

[me@france openssh-portable]$ scp 5g germany0:/files/5g.bis
5g 100% 5120MB  83.1MB/s   01:01

# With rsync:

[me@france openssh-portable]$ rsync -v 5g germany0:/files/5g.bis

5g

sent 5,370,019,908 bytes  received 35 bytes  85,920,319.09 bytes/sec
total size is 5,368,709,120  speedup is 1.00
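The gain can be read directly off these transcripts. The short Python check below assumes that the "MB" shown by the sftp/scp progress meter is 2^20 bytes; rsync's exact byte count (5,368,709,120 = 5120 × 2^20) supports that. It converts the rsync rate to the same unit and computes the parallel/legacy speedup for each direction. Only figures printed above are used; the variable names are illustrative.

```python
MIB = 2**20  # one MiB in bytes

# Figures copied from the transcripts above.
FILE_MB = 5120                     # progress-meter size ("5120MB")
RSYNC_TOTAL_BYTES = 5_368_709_120  # rsync "total size"
RSYNC_RATE = 85_920_319.09         # rsync bytes/sec
PARALLEL = {"get": 706.7, "put": 664.0}  # patched sftp -n 10, MB/s
LEGACY = {"get": 82.8, "put": 67.0}      # stock sftp, MB/s


def speedups(fast: dict, slow: dict) -> dict:
    # Ratio of parallel to single-connection throughput per direction.
    return {k: fast[k] / slow[k] for k in fast}


# rsync's byte count equals 5120 * 2**20, so "MB" here is 2**20 bytes.
assert RSYNC_TOTAL_BYTES == FILE_MB * MIB

rsync_mib_s = RSYNC_RATE / MIB
print(f"rsync: {rsync_mib_s:.1f} MB/s")  # rsync: 81.9 MB/s
for direction, ratio in speedups(PARALLEL, LEGACY).items():
    print(f"{direction}: {ratio:.2f}x")  # get: 8.54x, then put: 9.91x
```

In other words, the single-connection tools (legacy sftp, scp, rsync over SSH) all level off at roughly 67 to 83 MB/s on this link, while the 10 extra channels raise sftp to about 8.5x (get) and 9.9x (put) that rate.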

> I was curious to know why a supercomputer would have issues receiving
> at some high-bandwidth via a single NIC, while the sending machine has
> no such performance issue; but that's an aside.

Supercomputers commonly offer multiple "login nodes" and a generic DNS
entry that connects you to one of them at random: the DNS entry is
associated with multiple IP addresses and the client (DNS resolver)
selects one of them.

Other DNS entries may exist to address a particular login node, in case
you want to reach a specific one.

When used with Cyril's patched sftp, this means that you automatically
target multiple hosts if you use the generic DNS entry (Cyril's first
performance results). If you select a specific host's DNS entry (as in
this example), then you will contact that single host only.
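For illustration only, here is how a client could turn a round-robin DNS entry into several destinations. The message does not say how the patch does this; `resolve_all`, `assign_channels` and the round-robin policy are hypothetical. The sketch uses the standard `socket.getaddrinfo` call to collect every distinct address behind a name, then spreads channels over those addresses. With a node-specific name that resolves to a single address, every channel lands on that one host, as in the germany0 example.

```python
import socket
from itertools import cycle


def resolve_all(host: str, port: int = 22) -> list[str]:
    # getaddrinfo returns one entry per address/family/protocol combination;
    # keep each distinct IP once, in the order the resolver returned them.
    addresses: list[str] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def assign_channels(addresses: list[str], n_channels: int) -> list[tuple[int, str]]:
    # Hypothetical round-robin policy: channel i (1-based) goes to the next
    # address in the list, wrapping around when the list is exhausted.
    return list(zip(range(1, n_channels + 1), cycle(addresses)))
```

For example, `assign_channels(resolve_all("login.example.org"), 10)`, where `login.example.org` is a placeholder for a generic login-node name, would give ten (channel, address) pairs cycling through the login nodes behind that name.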

On supercomputers, files are commonly stored on distributed file systems
such as NFS, Lustre or GPFS. If your transfers target one of these file
systems, you can use multiple hosts as destinations without any issues.
You just need to ensure that the blocks sent/written by sftp are properly
sized, so that no target overwrites data written by another because of
the file system client implementations and the asynchronous page cache
flushes on the involved nodes. That is what the patch does: as Cyril
explained in a previous message, the block size used for parallel
transfers was chosen with that potential issue in mind.
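The message does not state which block size the patch uses, so the Python sketch below only illustrates the constraint being described. If every chunk sent to a different host starts on a boundary of the unit that the file system clients cache and flush, and is a whole number of such units long (except the file's tail), then no two nodes ever dirty the same block, and their asynchronous flushes cannot clobber each other's data. The 4096-byte alignment (a typical page size), the 1 MiB chunk size and the helper names are assumptions; the relevant unit on Lustre or GPFS may be larger.

```python
def aligned_chunks(file_size: int, chunk_size: int, alignment: int) -> list[tuple[int, int]]:
    # Split [0, file_size) into (offset, length) pieces. Since chunk_size is
    # a multiple of alignment, every chunk starts on an alignment boundary.
    if alignment <= 0 or chunk_size <= 0 or chunk_size % alignment != 0:
        raise ValueError("chunk_size must be a positive multiple of alignment")
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def blocks_touched(offset: int, length: int, alignment: int) -> set[int]:
    # Indices of the alignment-sized blocks covered by [offset, offset + length).
    return set(range(offset // alignment, (offset + length - 1) // alignment + 1))


def shares_block(chunks: list[tuple[int, int]], alignment: int) -> bool:
    # True if any two chunks touch the same block, i.e. two writers (possibly
    # on different nodes) could both dirty and flush that block.
    seen: set[int] = set()
    for offset, length in chunks:
        touched = blocks_touched(offset, length, alignment)
        if seen & touched:
            return True
        seen |= touched
    return False
```

A chunk size that is not a multiple of the alignment, such as 5000-byte chunks with 4096-byte pages, makes neighbouring chunks share a page, which is exactly the situation to avoid.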

Regards,

Matthieu

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev