--On Tuesday, 21 February, 2006 12:53 +0530 Sandeep Srivastava <sandeep.kumar.srivastava@xxxxxxxxx> wrote:

> First of all thanks to everybody for the response.
>
> I knew that an FTP transfer in ASCII mode does EOL and EOF
> conversions based on the OS of the receiving system.

No, it doesn't. That was part of the point. It does no EOF
conversions at all. The command and data channels were separated for
several reasons, but the desire to stay out of the EOF business was
an important one. And the server is required to convert whatever
line-end convention it uses to CRLF, and any characters it uses to
ASCII, and transmit that over the wire. If the client then converts
from CRLF and ASCII to some local convention, that is its business,
not that of the protocol. In other words, there are, at most,
conversions to and from CRLF and ASCII. There are no FTP-specified
conversions based on the properties of the receiving system.

> And I very much expected my UTF-8 encoded file to get garbled when
> I FTPied it in ASCII mode. But guess what, it was not garbled on
> the receiving system. Maybe I was lucky, or maybe it's because
> UTF-8 is backward compatible with ASCII. But then, as ASCII is
> purely 7-bits, the FTP in ASCII mode should have corrupted the
> UTF-8 encoded file, because UTF-8 is 8-bits.

"Should have corrupted" is what I referred to as an ambiguity in my
note. First of all, because of the robustness principle, you can
never guarantee that bad things will happen when they might --
proper implementation of protocols around here often argues for
never trashing a string merely because one can, or because a correct
string wouldn't have had the problem. So, in practice, if an FTP
server was implemented on an ASCII system that used the "right
justified in octets" model but with LF as line-end, the authors
might well have said "the character codes don't need any conversion
for ASCII mode, we just need to implement conversion to CRLF". If
they had done that, and UTF-8 (or ISO 8859 Latin-1 or...) were added
to the system, those CCSs would go through nicely in ASCII mode,
with the right line-endings. Substantially the same thing would
occur, as Ohta-san points out, with many of the ISO 2022-based
encodings of non-ASCII characters: completely safely with some of
them, and at least as safely as UTF-8 with the others, although, as
with UTF-8, the claim of strict ASCII would be technically false.
Now, that wouldn't happen with a system that was natively EBCDIC, or
that stored ASCII in seven-bit chunks without padding, etc.: those
systems would need to do real conversions to get to network ASCII
and, if you thought you were getting UTF-8 over them, you would be
in big trouble.
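To make that pass-through point concrete, here is a minimal Python
sketch (the function name and details are invented for illustration,
not taken from any real FTP server) of the kind of TYPE A send path
such an ASCII-with-LF system could get away with: the only work is
rewriting bare LF as CRLF, so UTF-8 octets, none of which are
line-end codes, come out the other side untouched.

    def to_network_ascii(local_bytes: bytes) -> bytes:
        """Hypothetical TYPE A conversion on an ASCII-with-LF system:
        rewrite bare LF as CRLF and pass every other octet through."""
        out = bytearray()
        prev = None
        for b in local_bytes:
            if b == 0x0A and prev != 0x0D:  # bare LF, not already part of a CRLF
                out += b"\r\n"
            else:
                out.append(b)
            prev = b
        return bytes(out)

    # UTF-8-encoded Devanagari: the 0x80-0xFF octets are not line-end
    # codes, so the "ASCII mode" conversion leaves them intact.
    sample = "नमस्ते\n".encode("utf-8")
    assert to_network_ascii(sample) == "नमस्ते".encode("utf-8") + b"\r\n"

A natively EBCDIC or seven-bit-packed system could not take this
shortcut, which is exactly where the trouble described above would
begin.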
> Moreover, in ASCII code page, code point 13=CR and code point
> 10=LF, but that might not be the case in every other code page.
> Hence the EOL conversion (in FTP ASCII mode) might corrupt that
> text file if it is encoded using a non-ASCII encoding. And what
> about handling the Unicode NewLine characters? Anyway...

Again, there is no conversion in the FTP protocol to a local
character set, only conversion to (and, outside the protocol but
common in client implementations, from) network ASCII with its CRLF
line endings.

> After reading all the wonderful replies, my conclusion is, even
> though my FTP client/server handled the UTF-8 encoded text file
> (which BTW contained Devanagari characters) correctly, there is a
> possibility that a text file encoded in an encoding other than
> ASCII runs a risk of being corrupted when FTPied in ASCII mode.
> Therefore, always use ASCII mode to transfer only ASCII encoded
> files, and Binary mode to transfer non-ASCII encoded files.

Yes, that is probably wise guidance. However, if you transfer
textual materials in binary (Image) mode, you also need to be sure
that you have programs available on the receiving host to change
line-end conventions from whatever the server uses internally to
whatever the client system uses.

> I was wondering why isn't there something like a "Text" mode for
> FTPing text files, which could handle text files encoded using any
> encoding available in this world, and then, the FTP client/server
> still does the EOL and EOF conversions properly?

For starters, because it would require that every FTP server support
at least the several thousand coded character sets in the world.
Even for end of line, there are significantly more different
conventions than you seem to think there are. "Convert from whatever
we use as text here to a single standard form, and then let the
recipient sort out conversion from the standard form to its
preferred local form" is much more plausible -- it requires the
server to support one type of conversion, not thousands, and the
client to support one type of conversion, not thousands. In the
early 1970s, the appropriate standard form for transmission was
network ASCII (including both "right justified in eight bits" and
CRLF). Today, it is probably UTF-8 with CRLF (although I sympathize
with Ohta-san's desire to be able to transmit 2022-based systems in
canonical form), and I think we should be considering that TYPE. But
ideas about universal converters make for both bad protocol design
and bad implementations.

    john

_______________________________________________
Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf
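As a rough illustration of the "single standard form" division of
labour described in the message above (again a hypothetical sketch,
not drawn from any particular client), the receiving side's whole
job reduces to rewriting CRLF as whatever the local convention is;
it is also the step you must run yourself after pulling text across
in Image mode.

    import os

    def from_network_form(net_bytes: bytes,
                          local_eol: bytes = os.linesep.encode()) -> bytes:
        """Hypothetical client-side step: map the standard network form
        (CRLF line ends) onto the local line-end convention. All other
        octets, including UTF-8 sequences, pass through as-is."""
        return net_bytes.replace(b"\r\n", local_eol)

    # A text file received in Image (binary) mode from a CRLF system
    # still needs this step before it looks right on an LF (Unix-style)
    # receiving host.
    assert from_network_form(b"line one\r\nline two\r\n", b"\n") == \
        b"line one\nline two\n"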