--On Monday, November 30, 2020 22:07 -0500 "Theodore Y. Ts'o"
<tytso@xxxxxxx> wrote:

> On Mon, Nov 30, 2020 at 06:26:00PM -0500, John C Klensin wrote:
>> --On Monday, November 30, 2020 23:38 +0100 Carsten Bormann
>> <cabo@xxxxxxx> wrote:
>>
>>> I believe a hard-earned piece of experience I took from
>>> using FTP is that you don't want to do conversions during
>>> transit.
>>
>> I think we had figured that out by early in 1971. At least
>> in general, I think it is still true.
>
> And yet, just recently on this mailing list, some of the people
> arguing for the inherent superiority of the ftp protocol as
> compared to http is that with ftp, it *would* do conversions
> of text files (e.g., crlf mangling, etc.)

Perhaps one of us didn't understand Carsten's comment. I agreed
because FTP does not do conversions in transit. The party
requesting the file specifies the form in which it wants it, and
the system being asked to supply the file either supplies it in
that form... or doesn't. There is no conversion "during transit"
as I understood that phrase.

--On Tuesday, December 1, 2020 07:27 -0800 Larry Masinter
<LMM@xxxxxxx> wrote:

> At the time FTP was developed, PDP-10s with 9-bit bytes and
> IBM mainframes using EBCDIC were common, disk space was
> limited, and transfer with format conversion was necessary.

To take just one example, for the transfer of "plain text" files
the community had already concluded that, no matter how text was
represented on a particular host, the right solution was a
network-standard form in which it could be transmitted. That
meant each system had to support, at most, conversion between its
own format and the standard one; otherwise, each host would have
faced an N-squared problem, where N was the number of distinct
system architectures on the net.

Inside the various systems, ASCII was represented in five 7-bit
"bytes", with one bit left over, in the 36-bit word of TENEX; in
four 9-bit "bytes", with two leading zero bits in each, in the
36-bit word of Multics; in the first three and last four bits of
8-bit bytes on some systems (including ASCII on IBM mainframes);
and in the last seven bits of an 8-bit byte on others... and
those are examples, not an inclusive list. Similarly, there were
machines that used CR as line-end, those that used LF, those that
used CRLF, those that used a special delimiting control character
other than either of those, and those that used a length count at
the beginning of every line. So, even without the decision to
allow TYPE E and the reasons for it, the choice was either an
N-squared problem for a not-very-small value of N or a
network-standard form for text in transmission. See RFCs 20 and
137.

Now, almost certainly, there are fewer variations for "plain
text" today. But even if it were only "Unicode" (and it isn't:
while a large fraction of them are spam, I still routinely
receive messages identified with charset GB2312, GBK, and
ISO8859-1), there would still be the different encoding forms,
and many of the same arguments would apply.

But, again, while I cannot prevent people from doing so, for
questions of whether or not the IETF has standardization work to
do, I don't think it is helpful to discuss whether FTP was more
or less useful 45 years ago than it is today, or whether or not
it is useful for complex file types. And I don't think it is
useful to try to convince people who find it useful, and who
understand its strengths and limitations, that they should stop
using it.
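To make the "no conversion in transit" point concrete, here is a
minimal sketch of that negotiation as seen from a client, using
Python's standard ftplib (the host and file names are
hypothetical):

    from ftplib import FTP

    # Hypothetical server and file names, for illustration only.
    ftp = FTP('ftp.example.org')
    ftp.login()                      # anonymous login

    # The client asks for network-standard text (NVT ASCII, CRLF
    # line ends); the server either supplies the file in that
    # form or fails the transfer.  Nothing rewrites octets in
    # flight.
    ftp.sendcmd('TYPE A')
    lines = []
    ftp.retrlines('RETR readme.txt', lines.append)

    # TYPE I ("image"): the octets pass through untouched.
    ftp.sendcmd('TYPE I')
    with open('archive.tar', 'wb') as f:
        ftp.retrbinary('RETR archive.tar', f.write)

    ftp.quit()

Either end may convert between its local representation and the
requested form, but the form on the wire is the form that was
asked for.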
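The arithmetic behind the network-standard form is worth spelling
out: with a standard form, each of N architectures needs only two
converters (local to standard and back), roughly 2N in total,
rather than on the order of N-squared pairwise ones. A sketch,
assuming a host whose local convention is bare LF:

    def to_network(local_text: str) -> str:
        """Local convention (here, bare LF) -> network CRLF."""
        return '\r\n'.join(local_text.split('\n'))

    def from_network(wire_text: str) -> str:
        """Network CRLF -> this host's local convention."""
        return wire_text.replace('\r\n', '\n')

A CR-only or length-counted host would write different bodies for
these two functions, but still only these two.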
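For readers who never used a 36-bit machine, a simplified sketch
of the TENEX packing described above (five 7-bit codes per word,
high-order first, with the one leftover low-order bit unused):

    def pack_tenex_word(chars: str) -> int:
        """Pack up to five 7-bit ASCII codes into one 36-bit
        word, left-justified, leaving the low-order bit zero."""
        assert len(chars) <= 5
        word = 0
        for ch in chars:
            word = (word << 7) | (ord(ch) & 0x7F)
        # Pad short words and leave the single spare bit clear.
        word <<= 7 * (5 - len(chars)) + 1
        return word

    assert pack_tenex_word('HELLO').bit_length() <= 36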
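And the point about encoding forms is easy to demonstrate: the
same two characters yield different octets under UTF-8, UTF-16,
and GB2312, so "just send Unicode" still leaves a representation
choice to be made on the wire:

    s = '\u6587\u5b57'   # two CJK characters meaning "text"
    for enc in ('utf-8', 'utf-16-be', 'gb2312'):
        print(enc, s.encode(enc).hex())
    # utf-8     e69687e5ad97
    # utf-16-be 65875b57
    # gb2312    cec4d7d6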
The question I am trying to ask is _only_ whether those who are
using and/or supporting it would find an effort to provide better
error response information and/or provisions for Unicode in UTF-8
useful, and whether they would implement and deploy those
changes. I don't believe a single on-list response to those
questions has been posted yet.

    john