Re: [RFC 959] FTP in ASCII mode

First of all thanks to everybody for the response.

I knew that an FTP transfer in ASCII mode does EOL and EOF conversions based on the OS of the receiving system, and I fully expected my UTF-8 encoded file to get garbled when I FTPed it in ASCII mode. But guess what: it was not garbled on the receiving system. Maybe I was lucky, or maybe it's because UTF-8 is backward compatible with ASCII. But then, since ASCII is purely 7-bit and UTF-8 uses all 8 bits for non-ASCII characters, FTP in ASCII mode should have corrupted the UTF-8 encoded file.
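To make the two possible outcomes concrete, here is a minimal Python sketch. The helper names are hypothetical and do not come from any real FTP client or server; they only model an 8-bit-clean EOL conversion and a strictly 7-bit path, which is an assumption about how implementations may differ.

```python
# Sketch: what an ASCII-mode (TYPE A) transfer might do to UTF-8 bytes.
# All helpers are hypothetical stand-ins, not code from a real FTP implementation.

def to_network_eol(data: bytes) -> bytes:
    """Sender side: normalize line endings to CRLF without touching other bytes."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def to_local_eol(data: bytes) -> bytes:
    """Receiver side (Unix-like system assumed): convert CRLF back to LF."""
    return data.replace(b"\r\n", b"\n")

def strip_high_bit(data: bytes) -> bytes:
    """A strictly 7-bit path: clear the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

# A short Devanagari word plus an ASCII line, written with escapes.
original = "\u0928\u092e\u0938\u094d\u0924\u0947\nFTP\n".encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence is >= 0x80, so the bytes
# 0x0D and 0x0A only ever occur as real CR and LF characters.
round_trip = to_local_eol(to_network_eol(original))
print(round_trip == original)   # True: 8-bit-clean EOL conversion is harmless

damaged = strip_high_bit(original)
print(damaged == original)      # False: a 7-bit path destroys the non-ASCII text
```

So whether the file survives depends on whether the client and server happen to be 8-bit clean, not on anything ASCII mode promises.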

Moreover, in the ASCII code page, code point 13 is CR and code point 10 is LF, but that might not be the case in every other code page. Hence the EOL conversion (in FTP ASCII mode) might corrupt a text file that is encoded using a non-ASCII encoding. And what about handling the Unicode newline characters? Anyway...
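A small Python sketch of that risk, assuming a converter that works on raw bytes and knows nothing about the encoding (the function name is hypothetical). UTF-16 is used as the non-ASCII encoding because the byte 0x0A can appear inside an unrelated character there.

```python
# Sketch: a byte-level LF -> CRLF conversion applied to UTF-16 text.
# naive_lf_to_crlf is a hypothetical helper, not a real FTP routine.

def naive_lf_to_crlf(data: bytes) -> bytes:
    """Insert CR before every 0x0A byte, as an encoding-unaware converter would."""
    return data.replace(b"\n", b"\r\n")

text = "\u090a\n"                   # DEVANAGARI LETTER UU, then a newline
utf16 = text.encode("utf-16-le")    # bytes 0A 09 0A 00

# The first 0x0A belongs to U+090A, not to a line ending, yet it is
# rewritten too, and the inserted single bytes shift every later code unit.
converted = naive_lf_to_crlf(utf16)

try:
    decoded = converted.decode("utf-16-le")
except UnicodeDecodeError:
    decoded = None

print(decoded == text)              # False: the text no longer round-trips

# EBCDIC code pages are another case: the line feed is not byte 0x0A there.
# Python's cp037 codec is assumed to be available; the value is only printed.
print("\n".encode("cp037"))
```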

After reading all the wonderful replies, my conclusion is this: even though my FTP client/server handled the UTF-8 encoded text file (which, BTW, contained Devanagari characters) correctly, a text file in an encoding other than ASCII runs the risk of being corrupted when FTPed in ASCII mode. Therefore, use ASCII mode only for ASCII-encoded files, and Binary mode for files in any other encoding.

I was wondering why there isn't something like a "Text" mode for FTPing text files, one that could handle text in any encoding while the FTP client/server still does the EOL and EOF conversions properly.

Thanks,
Sandeep.


On 2/21/06, Masataka Ohta <mohta@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
John C Klensin wrote:

> Sandeep's question raises another interesting issue.  I just
> went back and reread RFC 2640.   It does not seem to address the
> "TYPE A" issue at all.  It does say (Section 2, paragraph 1)
> "Clients and servers are, however, under no obligation to
> perform any conversion on the contents of a file for operations
> such as STOR or RETR", which I would take to imply that it
> anticipates I18N FTP operations to be entirely binary ("TYPE I")
> although that is not explicit.

As for Japanese processing, UTF-8 is not visible by users and on
the network, because UTF-8 is not only useless but also harmful.

Instead, ISO-2022-JP, ShiftJIS and EUC are the major character sets.
Some ftp implementations do assume (sometimes depending on environment
variables) a network character code of ShiftJIS or EUC and perform
the corresponding conversions, which garbles UTF-8.

On the other hand, if you use ISO-2022-JP, which is 7 bit pure and ASCII
compatible (in a sense, it is pure ASCII), we can safely use ASCII mode
of vanilla ftp and there is no confusion as long as we are in ASCII
environment.
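
The 7-bit claim can be checked with a rough Python sketch, assuming Python's built-in iso2022_jp codec is an acceptable stand-in for the encoding; the EOL round trip below is a simplified model of ASCII mode, not a real ftp implementation.

```python
# Sketch: ISO-2022-JP output contains only 7-bit bytes, so a byte-level
# CRLF conversion and even a strictly 7-bit channel leave it intact.

original_text = "\u65e5\u672c\u8a9e\nFTP\n"       # Japanese text, then an ASCII line
encoded = original_text.encode("iso2022_jp")

# Escape sequences and JIS X 0208 byte pairs all stay below 0x80.
print(all(b < 0x80 for b in encoded))             # True

# Simplified ASCII-mode round trip: LF -> CRLF on the wire, CRLF -> LF on arrival.
on_wire = encoded.replace(b"\n", b"\r\n")
received = on_wire.replace(b"\r\n", b"\n")

# Clearing the top bit changes nothing, because it was never set.
seven_bit = bytes(b & 0x7F for b in received)

print(seven_bit.decode("iso2022_jp") == original_text)   # True
```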

Similar encodings can be profiled using ISO 2022 to obtain a fully
internationalized, 7 bit pure, ASCII compatible character encoding.

The only problem for RFC 2640 was that it does not need MIME for
charset and 8bit extension, so it makes it clear that MIME is
useless.

Note that long term state maintenance of full ISO 2022 is not
more complex than that of UTF-8. Note also that carefully profiled
ISO 2022, such as ISO-2022-JP, requires state maintenance a lot
simpler than that of UTF-8.

> Whether the characters in use are UTF-8 or not, we've still got
> that issue with line-endings.

Line-ending issues of any ISO 2022 based encoding are just as simple
as those of ASCII.

                                                        Masataka Ohta



_______________________________________________
Ietf mailing list
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
