Re: [PATCH] Add ls-files --eol-staged, --eol-worktree

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 18 Oct 2015 11:32:17 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> If you say 1 and 2 are with LF, 4 and 6 are with CRLF, everything
> else is mixed, then you are losing the distinction between 1 and 2
> (and 4 and 6) that you made when the files were a single liner (with
> or without the incomplete line ending).  Is that desirable?

Continuing this line of thought further.  My answer is "it may not
be desirable, but it is sufficient to decide what line ending to use
when I am adding a new line to the existing contents".  That is, to
a file with a single incomplete line, I can add my new line with
either LF or CRLF, and the result will be a file that consistently
uses LF (or CRLF) and ends in an incomplete line.  To a file whose
lines consistently use LF, I can only add my new line with LF,
whether or not the original ends in an incomplete line.

So from that point of view,...

> I wonder if it would be easier for the scripts that process the
> output from this command to handle if the report said what
> combination of _three_ possible line-ending is used.  i.e. does the
> file contain a line that ends with LF? does the file contain a line
> that ends with CRLF? does the file contain a line with missing EOL?

... instead of saying there are three possible line-endings, we can
stick to two, i.e. "is it text or binary" followed by "among two
possible endings, which ones are used", i.e.

    text
    text-lf
    text-crlf
    text-lf-crlf

which matches the last four lines of what you had in the "like this"
example (and I prefer "mixed" over "lf-crlf").

So I am OK with the categorization after all with respect to the
possible incomplete line at the end.  But if that is what the
feature is designed for, the documentation must say it very
clearly, i.e. "this is to allow you to decide what line ending to
use when you add a new line to the existing contents" or something.

And viewed from that angle, there is no reason to special-case an
empty file.  Knowing "binary" is important because you want to be
able to say "whether LF or CRLF, you do *not* want to add your new
line to this binary file."; "empty" is just like "text"---you can
use either and get a coherent result.

So I'd suggest sticking to these classification tokens:

	binary
	text
	crlf
	mixed
	lf
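
For illustration only, here is a rough sketch of how these five
tokens could be computed in plain shell.  The classify_eol name is
made up, and treating any NUL byte as "binary" is an assumption of
this sketch; the real feature would presumably reuse Git's own
text/binary heuristics instead.

```shell
# Hypothetical helper, not part of git: print one of
# binary, text, lf, crlf or mixed for the file named by $1.
classify_eol () {
	# Assumption of this sketch: any NUL byte means "binary".
	total=$(($(wc -c <"$1")))
	nonul=$(($(tr -d '\000' <"$1" | wc -c)))
	if test "$total" -ne "$nonul"
	then
		echo binary
		return
	fi

	cr=$(printf '\r')
	# wc -l counts LF bytes, so a trailing incomplete line is
	# not counted here.
	all=$(($(wc -l <"$1")))
	# Appending "x" keeps a trailing incomplete line that ends in
	# a bare CR from being counted as a CRLF-terminated line.
	crlf=$({ cat "$1"; echo x; } | LC_ALL=C grep -c "$cr\$" || :)
	lf=$((all - crlf))

	if test "$crlf" -gt 0 && test "$lf" -gt 0
	then
		echo mixed
	elif test "$crlf" -gt 0
	then
		echo crlf
	elif test "$lf" -gt 0
	then
		echo lf
	else
		# No complete line at all: an empty file or a single
		# incomplete line.
		echo text
	fi
}
```

An empty file and a file with a single incomplete line both come
out as "text" here, which is consistent with not special-casing
"empty".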

The "adding my line at the beginning of the file" script can do
something like this using them (here I am simplifying by pretending
your feature is available as a "git get-eol" command that takes a
single path and does your computation):

	case "$(git get-eol file)" in
	text | lf)
		printf "%s\n" "$mine"
		;;
	crlf)
		printf "%s\r\n" "$mine"
		;;
	*) # that is 'binary' or 'mixed'
		die "do not muck with the contents of file"
		;;
	esac
	cat file

This also points at another way to use the three independent
line-ending properties I suggested earlier.  What if you want to
append your lines?  You would want to know if the file ends with an
incomplete line, so you would rather be told with a set of
categories like this instead:

	binary
	incomplete
	incomplete,crlf
	incomplete,mixed
	incomplete,lf
	crlf
	mixed
	lf

Note that an empty file gets an empty string under the grouping
above, as it does not have any line ending (i.e. no crlf/mixed/lf),
does not end with an incomplete line and is not a binary file.
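
Again purely as a sketch, the extended grouping could be computed
along these lines.  The classify_eol_ext name is hypothetical, and
the NUL-byte test for "binary" is an assumption standing in for
whatever heuristic Git itself would apply.

```shell
# Hypothetical helper, not part of git: print binary, an optional
# "incomplete" marker, and an optional crlf/mixed/lf token, joined
# with a comma.  An empty file yields an empty string.
classify_eol_ext () {
	# Assumption of this sketch: any NUL byte means "binary".
	total=$(($(wc -c <"$1")))
	nonul=$(($(tr -d '\000' <"$1" | wc -c)))
	if test "$total" -ne "$nonul"
	then
		echo binary
		return
	fi

	cr=$(printf '\r')
	all=$(($(wc -l <"$1")))		# number of LF bytes
	# The appended "x" stops a trailing bare CR from looking like
	# a CRLF-terminated line.
	crlf=$({ cat "$1"; echo x; } | LC_ALL=C grep -c "$cr\$" || :)
	lf=$((all - crlf))

	kind=
	if test "$crlf" -gt 0 && test "$lf" -gt 0
	then
		kind=mixed
	elif test "$crlf" -gt 0
	then
		kind=crlf
	elif test "$lf" -gt 0
	then
		kind=lf
	fi

	# A non-empty file whose last byte is not LF ends with an
	# incomplete line.
	incomplete=
	if test "$total" -gt 0 &&
	   test "$(($(tail -c 1 "$1" | wc -l)))" -eq 0
	then
		incomplete=incomplete
	fi

	if test -n "$incomplete" && test -n "$kind"
	then
		echo "$incomplete,$kind"
	else
		echo "$incomplete$kind"
	fi
}
```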

And the using script would become:

	existing=$(git get-eol file)
	eol='\n'
	case ",$existing" in
	,binary | *,mixed)
		die "do not muck with the contents of file"
		;;
	,*crlf)
		eol='\r\n'
		;;
	esac
	cat file
	case "$existing," in
	incomplete,*)
		printf "$eol"
		;;
	esac
	printf "%s$eol" "$mine"