[...]
>> character.
>
> Erm, this seems to be a counterexample to your point. It says very
> clearly that the files can use either LF or CRLF line endings, and
> will be parsed correctly either way, or your parser is broken. So
> pretty much any CRLF conversion rule (or none at all) will work with
> such files.

Perhaps I was not clear, or you did not understand my point. Read
"...by translating... to #xA": XSLT output to a file therefore MUST be
LF by definition for it to be in canonical form. This is an example of
a TEXT file that MUST, by definition of the spec, be LF on all
platforms.

Looking at the "auto" code that exists in Git, it does not appear to
support this very obvious standard, whereby this "file-type" should
always be checked out of source control with LF regardless of how it
came in. This is equivalent to the Git "input" setting, I believe (?),
but on a file-type basis. Git apparently does not have the notion of
file-types, does it (e.g. *.xml maps to text)?

The point I am really trying to make is that there are multiple
dimensions to this problem, and not stating that clearly will result
in a botched attempt. We need to carefully distinguish file-types from
the other switches that control whether or not to perform automatic
conversions. The two dimensions are eol-style and file-type.

THE SWITCHES

So for the switches, here is what would be meaningful to me, short and
sweet:

    core.autocrlf :: true false
    core.eolstyle :: local share lf crlf

If autocrlf is false, then what comes out is exactly what goes in.

EOL-STYLE

The eolstyle property only applies to text files (discussed later):

- "local" means normalize "text" files to LF when read in, and convert
  to the platform's preferred line ending when materializing workspaces.
- "share" means accept anything, but normalize to LF when writing files
  to a workspace (XML, XSLT, some scripting languages ...)
- "lf" means always accept anything, convert to LF, and output LF
- "crlf" means accept anything and convert to CRLF on output

FILE-TYPES

Linus alluded to file-types above, and to being explicit about them.
That's great, I agree. Let me provide examples:

By extension:
http://www.perforce.com/perforce/doc.current/manuals/cmdref/o.ftypes.html

By pathnames or extensions:
http://www.perforce.com/perforce/doc.current/manuals/cmdref/typemap.html

Don't beat me up for referencing other systems, please. As people move
to Git from other systems they will bring some level of expectation
with them, so it would help to understand those perspectives and
expectations and be prepared to provide a meaningful answer.

AUTO/TEXT-DETECTION

So the above explicit definitions get you most of the way, but what
about "auto"? This is the question at the heart of convert.c: the
gather_stats function classifies, among other things, whether an input
is text or binary.

While gather_stats is a good start, it is naively US-centric; it most
assuredly does not address UTF-8 and ISO-8859-1, both of which are
VERY easy to identify but are not presently handled by this algorithm.
I wrote a simple stat gatherer for the MATLAB kernel years ago that
classified the character-set of arbitrary input text as one of about a
half-dozen common character-sets, so what about adding a lightweight
checker for at least UTF-8 and ISO-8859-1? I could provide such a
thing back to this community if people wish.

A little more code in gather_stats to handle a couple more cases would
go a long way and would be easy to add, and it does not necessarily
depend upon file-type support. It would simply broaden what it means
to be a text file.
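
To make the suggestion concrete, here is a minimal sketch of what such a
lightweight checker might look like. It is written in Python for brevity
rather than in the C of convert.c, it does not reproduce what gather_stats
actually does, and the name classify_text and the exact rules (which
control bytes count as "binary", treating C1 bytes as non-text) are my own
assumptions for illustration:

```python
# Hypothetical sketch: classify a byte buffer as binary, ASCII, UTF-8,
# or (probably) ISO-8859-1. Not the logic used by Git's gather_stats.

# Control bytes that commonly appear in text: TAB, LF, FF, CR.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}


def classify_text(data: bytes) -> str:
    """Return 'binary', 'ascii', 'utf-8', or 'iso-8859-1' (a guess)."""
    # Any other C0 control byte (including NUL) or DEL suggests binary.
    for b in data:
        if (b < 0x20 and b not in ALLOWED_CONTROLS) or b == 0x7F:
            return "binary"

    # Pure 7-bit input is plain ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"

    # UTF-8 has a strict structure, so a successful strict decode of
    # input containing high bytes is strong evidence of UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # In ISO-8859-1 the range 0x80-0x9F holds C1 control codes, which
    # rarely occur in real text; treat their presence as non-text.
    if any(0x80 <= b <= 0x9F for b in data):
        return "binary"

    return "iso-8859-1"
```

Note that the last answer is only a guess: any byte sequence without C1
bytes is valid ISO-8859-1, so this cannot tell it apart from other 8-bit
character-sets. A real implementation would also have to decide what to do
when a buffer ends in the middle of a multi-byte UTF-8 sequence.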