Re: Parallel transfers with sftp (call for testing / advice)

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

 



OpenSSH does:

     -R num_requests
             Specify how many requests may be outstanding at any one time.
             Increasing this may slightly improve file transfer speed but will
             increase memory usage.  The default is 64 outstanding requests.

As well as block size:

     -B buffer_size
             Specify the size of the buffer that sftp uses when transferring
             files.  Larger buffers require fewer round trips at the cost of
             higher memory consumption.  The default is 32768 bytes.
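A quick back-of-the-envelope check (not OpenSSH code, just the arithmetic implied by the two man page excerpts above): the data in flight at any moment is bounded by roughly num_requests times buffer_size.

```python
# Rough upper bound on data in flight for an sftp transfer:
# num_requests * buffer_size, using the defaults quoted above.
num_requests = 64       # sftp -R default
buffer_size = 32768     # sftp -B default, in bytes

in_flight = num_requests * buffer_size
print(in_flight)                   # 2097152 bytes
print(in_flight // (1024 * 1024))  # 2 MiB
```

So the stock client already keeps about 2 MiB outstanding per transfer; raising -R or -B trades memory for fewer round-trip stalls.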


I suspect that for this to be taken seriously, it will most likely need to drop threads in favor of a different approach: the OpenSSH team has rejected threads multiple times because they add complexity and security issues.

Ben



Ron Frederick wrote on 5/5/20 9:42 AM:
> On May 5, 2020, at 3:55 AM, Cyril Servant <cyril.servant@xxxxxxxxx> wrote:
>> Peter Stuge wrote:
>>> Matthieu Hautreux wrote:
>>>> The change proposed by Cyril in sftp is a very pragmatic approach to
>>>> deal with parallelism at the file transfer level. It leverages the
>>>> already existing sftp protocol and its capability to write/read file
>>>> content at specified offsets. This makes it possible to speed up sftp
>>>> transfers significantly by parallelizing the SSH channels used for
>>>> large transfers. This improvement is performed only by modifying the
>>>> sftp client, which is a very small modification compared to the
>>>> openssh codebase. The modification is not too complicated to review
>>>> and validate (I did it) and does not change the default behavior of
>>>> the cli.
>>> I think you make a compelling argument. I admit that I haven't
>>> reviewed the patch, even though that is what matters the most.
>> If you want to review the code, here is a direct link to the patch:
>> https://github.com/openssh/openssh-portable/compare/V_8_2_P1...cea-hpc:V_8_2_P1_PSFTP1
> I haven’t reviewed the patch in detail, but one thing that jumped out at me when looking at this was your choice to use multiple channels to implement the parallelism. The SFTP protocol already allows multiple outstanding reads or writes on a single channel, without waiting for the server’s response. I don’t know whether the current OpenSSH client takes advantage of that, but it seems to me that you’d get the same benefit from that as you would from opening multiple channels, with less setup cost (not having to open the additional channels, and not having to do separate file open calls on each channel). It might also make some other things, like the progress bar support, easier, as all of the responses would be coming back on the same channel.
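The single-channel pipelining described here (many reads outstanding at once, responses possibly arriving out of order, reassembled by offset) can be sketched without any SSH at all. The following toy model is illustrative only: the "server" is a local bytes object and the names are invented, but the bounded sliding window of in-flight requests is the same idea.

```python
import asyncio

BLOCK_SIZE = 16384    # size of each read request
MAX_REQUESTS = 128    # cap on requests in flight at once

async def read_block(data, offset):
    """Stand-in for one SFTP READ request at a fixed offset."""
    await asyncio.sleep(0)  # placeholder for network latency
    return offset, data[offset:offset + BLOCK_SIZE]

async def fetch(data):
    # The semaphore bounds how many requests are outstanding,
    # mirroring a max_requests / -R style limit.
    sem = asyncio.Semaphore(MAX_REQUESTS)

    async def bounded(offset):
        async with sem:
            return await read_block(data, offset)

    tasks = [bounded(off) for off in range(0, len(data), BLOCK_SIZE)]
    # Responses carry their offset, so they can be reassembled
    # correctly even if they complete out of order.
    parts = dict(await asyncio.gather(*tasks))
    return b''.join(parts[off] for off in sorted(parts))

data = bytes(range(256)) * 1024   # 256 KiB test payload
result = asyncio.run(fetch(data))
assert result == data
```

All "requests" here share one connection, which is the point of Ron's argument: the window of outstanding requests, not the number of channels, is what hides the round-trip latency.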

> You can find an example of this form of parallelism (multiple read/write requests outstanding on a single channel) in AsyncSSH’s SFTP implementation. By default, it will allow up to 128 parallel reads or writes of 16 KB each, and you can adjust both the block size and max outstanding requests via “block_size” and “max_requests” arguments in the calls to methods such as get(), put(), and copy() in the asyncssh.SFTPClient class (https://asyncssh.readthedocs.io/en/latest/api.html#sftpclient). Here’s an example client:

> import asyncio, asyncssh, sys
>
> async def run_client():
>     async with asyncssh.connect('localhost') as conn:
>         async with conn.start_sftp_client() as sftp:
>             await sftp.get('example.txt')
>
> try:
>     asyncio.get_event_loop().run_until_complete(run_client())
> except (OSError, asyncssh.Error) as exc:
>     sys.exit('SFTP operation failed: ' + str(exc))


> The block_size and max_requests can be added as optional arguments to the sftp.get() call in this example, but parallelism will automatically be used for any file larger than 16 KB, allowing up to 2 MB of data to be outstanding at once (128 requests of 16 KB each), without any threading and all over a single SFTP channel.

> This parallel I/O support in AsyncSSH also extends to clients making large read() or write() calls on a remote file. If you call read() or write() with a length greater than the block_size set when the file was opened, these reads or writes will automatically be broken into multiple parallel requests. This behavior can be disabled by setting block_size to 0 if you really need to control the exact requests sent on the channel, but by default callers get the benefit of the extra speed.
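The splitting described above amounts to turning one large read into a list of (offset, length) requests, each at most block_size bytes. This is a hypothetical sketch of that bookkeeping, not AsyncSSH's actual code:

```python
def split_read(offset, length, block_size=16384):
    """Split one large read into (offset, length) request tuples,
    each at most block_size bytes, exactly covering the range."""
    requests = []
    end = offset + length
    while offset < end:
        n = min(block_size, end - offset)
        requests.append((offset, n))
        offset += n
    return requests

# A 40000-byte read becomes three requests that can all be
# issued in parallel on the same channel:
print(split_read(0, 40000))
# [(0, 16384), (16384, 16384), (32768, 7232)]
```

Each resulting tuple maps directly onto one SFTP READ (or WRITE) packet, which is why the caller sees one call while the channel carries several pipelined requests.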

_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev



