--On Monday, November 30, 2020 22:07 -0500 "Theodore Y. Ts'o"
<tytso@xxxxxxx> wrote:

> On Mon, Nov 30, 2020 at 06:26:00PM -0500, John C Klensin wrote:
>> --On Monday, November 30, 2020 23:38 +0100 Carsten Bormann
>> <cabo@xxxxxxx> wrote:
>>
>>> I believe a hard-earned piece of experience I took from
>>> using FTP is that you don't want to do conversions during
>>> transit.
>>
>> I think we had figured that out by early in 1971. At least
>> in general, I think it is still true.
>
> And yet, just recently on this mailing list, some of the people
> arguing for the inherent superiority of the ftp protocol as
> compared to http is that with ftp, it *would* do conversions
> of text files (e.g., crlf mangling, etc.)

Perhaps one of us didn't understand Carsten's comment. I agreed
because FTP does not do conversions in transit. The party
requesting the file specifies the form in which it wants it, and
the system being asked to supply the file either supplies it in
that form... or doesn't. There is no conversion "during transit"
as I understood that phrase.

--On Tuesday, December 1, 2020 07:27 -0800 Larry Masinter
<LMM@xxxxxxx> wrote:

> At the time FTP was developed, PDP-10s with 9-bit bytes and
> IBM mainframes using EBCDIC were common, disk space was
> limited, and transfer with format conversion was necessary.

To take just one example, for the transfer of "plain text" files
the community had already concluded that, no matter how text was
represented on a particular host, the right solution was a
network-standard form in which it could be transmitted. That
meant each system had to support, at most, conversion between its
own format and the standard one; otherwise, each host would have
faced an N-squared problem, where N was the number of distinct
system architectures on the net.

Inside the various systems, ASCII was represented in five 7-bit
"bytes", with one bit left over, in the 36-bit word of TENEX; in
four 9-bit "bytes", with two leading zero bits in each, in the
36-bit word of Multics; in the first three and last four bits of
8-bit bytes on some systems (including ASCII on IBM mainframes);
and in the last seven bits of an 8-bit byte on others... and
those are examples, not an inclusive list. Similarly, there were
machines that used CR as line-end, those that used LF, those that
used CRLF, those that used a special delimiting control character
other than either of those, and those that used a length count at
the beginning of every line. So, even without the decision to
allow TYPE E and the reasons for it, the choice was either an
N-squared problem for a not-very-small value of N or a
network-standard form for text in transmission. See RFCs 20 and
137.

Now, almost certainly, there are fewer variations for "plain
text" today. But even if it were only "Unicode" (and it isn't:
while a large fraction of them are spam, I still routinely
receive messages identified with charset GB2312, GBK, and
ISO8859-1), there would still be the different encoding forms,
and many of the same arguments would apply.

But, again, while I cannot prevent people from doing so, for
questions of whether or not the IETF has standardization work to
do, I don't think it is helpful to discuss whether FTP was more
or less useful 45 years ago than it is today, or whether or not
it is useful for complex file types. And I don't think it is
useful to try to convince people who find it useful, and who
understand its strengths and limitations, that they should stop
using it.
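To make the "no conversion in transit" point concrete, here is a
minimal sketch of that negotiation as seen from a client, using
Python's standard ftplib (the host and file names are
hypothetical):

    from ftplib import FTP

    # Hypothetical server and file names, for illustration only.
    ftp = FTP('ftp.example.org')
    ftp.login()                      # anonymous login

    # The client asks for network-standard text (NVT ASCII, CRLF
    # line ends); the server either supplies the file in that
    # form or fails the transfer.  Nothing rewrites octets in
    # flight.
    ftp.sendcmd('TYPE A')
    lines = []
    ftp.retrlines('RETR readme.txt', lines.append)

    # TYPE I ("image"): the octets pass through untouched.
    ftp.sendcmd('TYPE I')
    with open('archive.tar', 'wb') as f:
        ftp.retrbinary('RETR archive.tar', f.write)

    ftp.quit()

Either end may convert between its local representation and the
requested form, but the form on the wire is the form that was
asked for.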
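The arithmetic behind the network-standard form is worth spelling
out: with a standard form, each of N architectures needs only two
converters (local to standard and back), roughly 2N in total,
rather than on the order of N-squared pairwise ones. A sketch,
assuming a host whose local convention is bare LF:

    def to_network(local_text: str) -> str:
        """Local convention (here, bare LF) -> network CRLF."""
        return '\r\n'.join(local_text.split('\n'))

    def from_network(wire_text: str) -> str:
        """Network CRLF -> this host's local convention."""
        return wire_text.replace('\r\n', '\n')

A CR-only or length-counted host would write different bodies for
these two functions, but still only these two.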
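For readers who never used a 36-bit machine, a simplified sketch
of the TENEX packing described above (five 7-bit codes per word,
high-order first, with the one leftover low-order bit unused):

    def pack_tenex_word(chars: str) -> int:
        """Pack up to five 7-bit ASCII codes into one 36-bit
        word, left-justified, leaving the low-order bit zero."""
        assert len(chars) <= 5
        word = 0
        for ch in chars:
            word = (word << 7) | (ord(ch) & 0x7F)
        # Pad short words and leave the single spare bit clear.
        word <<= 7 * (5 - len(chars)) + 1
        return word

    assert pack_tenex_word('HELLO').bit_length() <= 36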
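And the point about encoding forms is easy to demonstrate: the
same two characters yield different octets under UTF-8, UTF-16,
and GB2312, so "just send Unicode" still leaves a representation
choice to be made on the wire:

    s = '\u6587\u5b57'   # two CJK characters meaning "text"
    for enc in ('utf-8', 'utf-16-be', 'gb2312'):
        print(enc, s.encode(enc).hex())
    # utf-8     e69687e5ad97
    # utf-16-be 65875b57
    # gb2312    cec4d7d6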
The question I am trying to ask is _only_ whether those who are
using and/or supporting it would find an effort to provide better
error response information and/or provisions for Unicode in UTF-8
useful, and whether they would implement and deploy those
changes. I don't believe a single on-list response to those
questions has been posted yet.

    john