--On Monday, 20 February, 2006 17:25 +0100 Peter Dambier <peter@xxxxxxxxxxxxxxxx> wrote:

> Once upon a time there used to be computers speaking ASCII or
> EBCDIC. The ASCII computers were unix mostly.
>...

Actually, Peter, at the time FTP was designed, the "ASCII computers" were mostly PDP-10s running Tenex or ITS, plus less than a handful of Multics machines. Unix wasn't really a major presence on the network yet. And that is important because the PDP-10 and Multics machines were both 36-bit environments, neither of which stored 7-bit ASCII in octets. The PDP-10 normally used five characters per word, with the ASCII in 7 bits (a very hard environment for UTF-8 or even "Latin-1"), and Multics normally stored ASCII right-justified in a nine-bit field with the two leading bits set to zero. So "convert to network ASCII" was a non-trivial operation for almost everyone until ASCII-native 32-bit machines started to become prominent on the network -- among other things, we needed it to get back and forth between Multics, ITS, and Tenex systems, all of which were more or less ASCII-based.

You are correct about the EBCDIC character conversions. I've lost my memory of the state of "virtual card decks" at the time, but my vague recollection is that text files of the type that were likely to be transmitted over the network were at least as likely to be stored in files with variable-length records (lengths determined by counts, rather than character delimiters) as in fixed-length 80 (or 72) character records.

> Or you could print it directly on an ASR-33 terminal.

My (also vague) recollection is that the ASR-33 terminal was upper-case-only and hence could not fully support ASCII. I do remember (fondly, except for the racket and speed) some KSR-38s and maybe ASR-38s that were ASCII devices.

> As the ASR-33 terminal did not know UTF-8 it is not a good
> idea to use ASCII mode for UTF-8. But you can send it only to
> systems that understand UTF-8.

Sandeep's question raises another interesting issue. I just went back and reread RFC 2640. It does not seem to address the "TYPE A" issue at all. It does say (Section 2, paragraph 1) "Clients and servers are, however, under no obligation to perform any conversion on the contents of a file for operations such as STOR or RETR", which I would take to imply that it anticipates I18N FTP operations to be entirely binary ("TYPE I"), although that is not explicit. Whether the characters in use are UTF-8 or not, we've still got that issue with line-endings.

Based on the many things we have learned about internationalization in the last half-dozen years -- demonstrated by the changes from Unicode 2.0 through 5.x, the introduction of IDNA, the "LTRU" work on language tagging, and a growing feeling in some quarters that RFC 2277 may have been a bit naive in some ways -- it is probably time to revisit 2640. Not only may some of its requirements not be quite right, but it may be time to invent a "TYPE U" that would process and transmit UTF-8 on the same basis that RFC 959's "TYPE A" does for ASCII: mandatory conversion to that form if needed and CRLF line-endings.

    john
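
For concreteness, here is a minimal sketch -- Python, purely illustrative, not drawn from any historical code, and with function names invented for the example -- of the PDP-10 convention described above: five 7-bit characters packed left-justified into a 36-bit word with the low-order bit unused, which had to be unpacked into one character per octet before anything like "network ASCII" could go on the wire.

# Illustrative only: the usual PDP-10 text convention packed five 7-bit
# ASCII characters left-justified in a 36-bit word, leaving the low-order
# bit unused.  "Network ASCII" wants one character per octet, so every
# word had to be unpacked (and repacked on receipt).

def pack_pdp10_word(five_chars: bytes) -> int:
    """Pack exactly five 7-bit characters into one 36-bit word."""
    assert len(five_chars) == 5
    word = 0
    for c in five_chars:
        word = (word << 7) | (c & 0x7F)
    return word << 1          # low-order bit left as zero

def unpack_pdp10_word(word: int) -> bytes:
    """Recover the five 7-bit characters as one-per-octet bytes."""
    word >>= 1                # discard the unused low-order bit
    return bytes((word >> (7 * i)) & 0x7F for i in reversed(range(5)))

assert unpack_pdp10_word(pack_pdp10_word(b"HELLO")) == b"HELLO"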
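
And to make the closing suggestion concrete: a hypothetical "TYPE U" -- a sketch of the idea only, since no such type exists in RFC 959 or RFC 2640, and these function names are likewise invented -- would do for UTF-8 what TYPE A does for ASCII: convert local line-endings to CRLF and the text to UTF-8 on the way out, and reverse both on the way in.

# Sketch only: wire form is UTF-8 with CRLF line-endings, parallel to the
# NVT-ASCII form mandated by RFC 959's TYPE A.

def type_u_to_wire(text: str) -> bytes:
    # Normalize whatever the local convention is (LF, CR, or CRLF) to LF,
    # then emit CRLF line-endings and UTF-8 on the wire.
    local = text.replace("\r\n", "\n").replace("\r", "\n")
    return local.replace("\n", "\r\n").encode("utf-8")

def type_u_from_wire(data: bytes) -> str:
    # Decode UTF-8 and map wire CRLF back to the local convention
    # (a Unix-style LF is assumed here).
    return data.decode("utf-8").replace("\r\n", "\n")

assert type_u_from_wire(type_u_to_wire("eins\nzwei\n")) == "eins\nzwei\n"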