Re: FTP and file transfers


 



On Fri, Oct 6, 2017 at 11:23 PM, Keith Moore <moore@xxxxxxxxxxxxxxxxxxxx> wrote:

There are still a number of important edge cases for which FTP is superior to any other widely available protocol - wildcard transfers of multiple files, text file transfers between systems with different character encoding conventions, 3rd party mediated transfers (used regularly in the broadcast TV industry, where having system A control the movement of content from B to C is exactly what is needed).

However, FTP does look a bit antiquated by now, what with its support for file and record types that are almost (but not quite) entirely nonexistent on modern systems; a lot of implementations sadly never figured out how to make it work through NAT [*] (or a lot of NAT ALGs didn't work right); and I have a hard time recommending for widespread use any protocol that doesn't have encryption as an ordinary, widely-implemented feature.

Exactly how I feel. Yes, it was state of the art in its day. It is not state of the art now. I would like something better and, as you point out, the alternatives are not exactly great.


I blame you for the following, Keith. It occurred to me that there is a much better solution and I already have the pieces.

These days, I don't actually do much 'file transfer' so much as synchronize two directories. Often what I would like is to achieve an instantaneous cutover, so that one microsecond the contents of the directory are { A, B, C } and the next they are { A, C', D }, with no inconsistency visible between those two points.
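As an aside, the closest thing to that cutover today is to publish the directory behind a symlink and atomically repoint it. A minimal sketch in Python (the names are illustrative, and this assumes a POSIX filesystem where rename is atomic):

    import os

    # Readers always go through the "current" symlink, so at any instant
    # they see either the old directory { A, B, C } or the new one
    # { A, C', D }, never a mixture.
    def cutover(link_name, new_dir):
        tmp = link_name + ".tmp"
        if os.path.lexists(tmp):
            os.remove(tmp)            # clear any stale staging link
        os.symlink(new_dir, tmp)      # stage a pointer to the new contents
        os.rename(tmp, link_name)     # atomic swap of the pointer

    # e.g. cutover("/srv/content/current", "/srv/content/v2")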

It is the same with distribution of software updates. I have to wait 5 minutes while the machine downloads and applies an update? REALLY? How quaint.


This week I have been playing with a new container format that has the novel idea of having a length indicator at the front and the back of every data frame. This means that it is just as quick to read frames backwards from the end as it is forwards from the start. And then I added a binary tree and a Merkle tree for good measure.
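To make the backwards reading concrete, here is a minimal sketch of the framing in Python (the 4-byte little-endian length fields are an assumption; the real format may size them differently):

    import struct

    # Each frame is [length][payload][length], so a reader positioned at
    # the end of the file can step backwards via the trailing length just
    # as a forward reader uses the leading one.
    def write_frame(f, payload):
        length = struct.pack("<I", len(payload))
        f.write(length + payload + length)

    def read_frame_backwards(f, end):
        # Return (payload, frame_start) for the frame ending at offset end.
        f.seek(end - 4)
        (n,) = struct.unpack("<I", f.read(4))
        start = end - n - 8           # a 4-byte length on each side
        f.seek(start + 4)
        return f.read(n), start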

The basic idea is similar to a Zip file: you write the file out sequentially and then drop the index on the end when complete. Only with this scheme you can drop a series of incremental indexes as you go along. The upshot is that you can extract any record on arbitrary search criteria in log2(n) time without startup overhead.
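A much-simplified illustration of the incremental indexes, ignoring the binary and Merkle trees and reusing write_frame from the sketch above (the flat dictionary and batching factor are stand-ins for the real tree structure):

    import json

    INDEX_EVERY = 16                  # illustrative batching factor

    def append_record(f, offsets, key, payload):
        # Record where this frame starts, then periodically append an
        # index frame. A reader finds the newest index by reading one
        # frame backwards from the end, then seeks straight to any
        # record with no scan from the start of the file.
        offsets[key] = f.tell()
        write_frame(f, payload)
        if len(offsets) % INDEX_EVERY == 0:
            write_frame(f, json.dumps(offsets).encode())

Writing the index out as a tree rather than a flat dictionary is what gets the lookup down to log2(n).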


So one reason for doing this is to support efficient stores for mail messages and the like, even with end-to-end encryption.

But another purpose is to enable two log files to be synchronized: updates written to one are appended to the other.
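Since both files are append-only, the synchronization step itself is almost nothing; a sketch, leaving out the framing checks and Merkle-tree verification a real protocol would do:

    import os

    def sync_logs(source, target):
        # The follower only ever needs the bytes past its current length.
        have = os.path.getsize(target) if os.path.exists(target) else 0
        with open(source, "rb") as src, open(target, "ab") as dst:
            src.seek(have)
            dst.write(src.read())     # append just the new suffix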


What I didn't think of, but would be entirely logical, is a scheme in which the container file I write out does not have any data in it; the data is written out to the file system instead. I could then use the container file as the basis for synchronizing two directories on different machines.
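Something along these lines, where each frame carries only a path and a digest while the bytes stay in the file system (the manifest layout here is purely illustrative, again reusing write_frame):

    import hashlib, json, os

    def append_manifest_entry(f, root, rel_path):
        # Hash the on-disk file and append a data-less frame describing
        # it. Two machines can then compare manifest containers to
        # decide which files actually need to move.
        with open(os.path.join(root, rel_path), "rb") as data:
            digest = hashlib.sha256(data.read()).hexdigest()
        write_frame(f, json.dumps({"path": rel_path, "sha256": digest}).encode())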



