Re: SCP with Resume Feature

On 4/3/21 8:10 PM, Demi Marie Obenour wrote:
> On 4/1/21 1:50 PM, rapier wrote:
>> Howdy all,
>>
>> I know development on SCP is discouraged but being that it's still in wide use
>> I thought I would do some work some of my users have been asking for and allow
>> SCP to resume from a partial transfer.
>
> Would it be possible to instead reimplement SCP in terms of SFTP, and then add
> this feature to SFTP?  My understanding is that such a re-implementation is
> something many people have wanted for quite a while.
>
> Of course, this might very well be out of scope for the project, which would
> be fine.

Honestly, after working on the SCP code I do support that idea. SCP depends on an in-band control protocol that can get out of sync and freeze the transfer. The right thing might be to turn scp into a wrapper around SFTP, mostly to preserve the user experience and existing scripts. I may look at that, depending on time and progress on other aspects of this project.

> I suggest using a better hash than MD5, which is considered broken.  Blake2b is
> both faster and much more secure.

I've been looking at several hashes for this: blake2, sha1, md5, and xxhash. MD5 was the first pass at the implementation, and I've since changed to using EVP contexts. I fully expect to go with blake2 in the end, but I need to run more performance tests. Hashing ends up being one of the more expensive operations, especially on very large files (hundreds of MB up into the GB range), so that section is still subject to change.
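For comparing candidates, a minimal sketch of chunked file hashing with a selectable algorithm. It is written in Python rather than the C used in scp, with hashlib.new() standing in for an EVP-style context. The helper name file_digest and its limit and chunk_size parameters are hypothetical, and xxhash is left out because it is not part of hashlib.

```python
import hashlib

def file_digest(path, algo="blake2b", limit=None, chunk_size=1 << 20):
    """Hash the first `limit` bytes of `path` (the whole file if limit is None)."""
    h = hashlib.new(algo)              # algorithm chosen by name at run time
    remaining = limit
    with open(path, "rb") as f:
        while remaining is None or remaining > 0:
            want = chunk_size if remaining is None else min(chunk_size, remaining)
            block = f.read(want)
            if not block:              # end of file reached before the limit
                break
            h.update(block)
            if remaining is not None:
                remaining -= len(block)
    return h.hexdigest()
```

Reading in fixed-size chunks keeps memory use flat on multi-GB files, and the limit argument covers the case where only a leading fragment of a file needs to be hashed.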

I am trying to figure out how to reduce the number of hash operations. Let me lay it out to see if anyone has ideas (aside from using rsync - which I fully support).

Source: stat file, get hash, send control sequence 'C' to target
        (Cfilemode, filesize(s), hash(s), filename(s))
Target: receive control sequence
        if target exists
                compute hash(t)
        if hash(t) == hash(s)
                skip file (send skip control sequence 'S' to source)
        if hash(t) != hash(s)
                send control sequence 'R' to source
                (Rfilemode, filesize(t), hash(t))
Source: receive control sequence from target
        if control == 'S'
                skip file
        if control == 'R'
                compute hash(r) of source file up to filesize(t)
                if hash(r) == hash(t)
                        file fragments match
                        mode = R (for resume)
                        bytes = filesize(s) - filesize(t)
                if hash(r) != hash(t)
                        fragments do not match
                        mode = C (for create)
                        bytes = filesize(s)
                send control to target (mode, bytes)
Target: receive control sequence from source
        if mode == R
                write bytes to temp file
                append temp file to target
        if mode == C
                write bytes to file
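The decision logic of that exchange can be sketched as two functions, one per side. This is an illustration in Python under stated assumptions, not the scp implementation: the function names and tuple layouts are hypothetical, a missing target is reported as an empty fragment, hash(r) is taken over the first filesize(t) bytes of the source file, and create mode resends the whole source file. The size comparison before hashing is an added shortcut that is not in the outline above.

```python
import hashlib
import os

def prefix_hash(path, length):
    """BLAKE2b over the first `length` bytes of `path`."""
    h = hashlib.blake2b()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(1 << 20, remaining))
            if not block:
                break
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()

def target_reply(target_path, source_hash):
    """Target side: answer the 'C' message with ('S',) or ('R', size, hash)."""
    if not os.path.exists(target_path):
        # Assumption: no target file is treated as a zero-length fragment.
        return ("R", 0, hashlib.blake2b().hexdigest())
    size_t = os.path.getsize(target_path)
    hash_t = prefix_hash(target_path, size_t)
    if hash_t == source_hash:
        return ("S",)                      # identical content, skip the file
    return ("R", size_t, hash_t)

def source_decide(source_path, reply):
    """Source side: turn the reply into (mode, offset, byte_count)."""
    if reply[0] == "S":
        return ("S", 0, 0)
    _, size_t, hash_t = reply
    size_s = os.path.getsize(source_path)
    # Added shortcut: a target longer than the source cannot be a prefix of it.
    if size_t <= size_s and prefix_hash(source_path, size_t) == hash_t:
        return ("R", size_t, size_s - size_t)   # resume after the fragment
    return ("C", 0, size_s)                     # fragments differ, start over
```

In this form a resumed transfer costs three hash passes: the full source hash, the target fragment hash, and the source prefix hash.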

I think rsync only computes hashes if the modification time, file size, or other file stat data is different. I thought about doing that, but since you can rename the target with scp that won't work.

Anyway, if anyone has any ideas on reducing the steps, hashes, etc., let me know. I also cannot figure out why I can't append directly to the target file. After opening the file I'd seek to the end, but the bytes would still start at the 0th byte. I'm probably missing something in atomicio. Writing the temp file and then appending works, and it's not taking up a lot of cycles, but it doesn't feel like the 'right' way to do it.
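For reference, a minimal sketch of the generic pattern for writing at an offset without a temp file. It uses Python's thin wrappers over the POSIX calls (os.open, os.lseek, os.write) and says nothing about what atomicio or scp actually do, so the cause of the behavior described above remains open. The name resume_write is hypothetical, and truncating to the agreed offset first is an assumption about the desired behavior.

```python
import os

def resume_write(path, offset, chunks):
    """Write `chunks` into `path` starting at byte `offset`, keeping earlier bytes."""
    # No O_TRUNC here: the existing fragment has to survive the open.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, offset)           # drop anything past the agreed fragment
        os.lseek(fd, offset, os.SEEK_SET)  # position explicitly, not via append mode
        for chunk in chunks:
            view = memoryview(chunk)
            while view:                    # os.write may write fewer bytes than asked
                written = os.write(fd, view)
                view = view[written:]
    finally:
        os.close(fd)
```

Things worth checking when a seek appears to be ignored in this kind of code are whether the open flags include truncation and whether the write path resets or overrides the file position.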
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@xxxxxxxxxxx
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev


