Re: [PATCH 2/3] xdiff: -W: include immediately preceding non-empty lines in context

Vegard Nossum <vegard.nossum@xxxxxxxxxx> · Fri, 13 Jan 2017 21:20:09 +0100

On 13/01/2017 20:51, Junio C Hamano wrote:
> René Scharfe <l.s.r@xxxxxx> writes:
>> That's true, but I'm not sure "non-empty line before function line" is
>> good enough a definition for desirable lines.  It wouldn't work for
>> people who don't believe in empty lines.  Or for those that put a
>> blank line between comment and function.  (I have an opinion on such
>> habits, but git diff should probably stay neutral.)  And that's just
>> for C code; I have no idea how this heuristic would hold up for other
>> file types like HTML.
>
> As you are, I am fairly negative on the heuristic based on the
> "non-blank" thing.  We tried once with compaction-heuristics already
> and it did not quite perform well.  Let's not hardcode another one.

The patch will work as intended and as expected for 95% of the users out
there (javadoc, Doxygen, kerneldoc, etc. all have the comment
immediately preceding the function) and fixes a very real problem for me
(and I expect many others) _today_; for the remaining 5% (who put a
blank line between their comment and the start of the function) it will
revert back to the current behaviour, so there should be no regression
for them.

For the 0% who don't put even a single blank line between their
functions, it will probably not work as expected, but then again I have
never seen such a coding style in any language, so I am doubtful if this
is something that needs to be taken into account in the first place.
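
To make the behaviour concrete, here is a minimal sketch of the heuristic as described above. It is not the code from the patch: is_blank() and context_start() are hypothetical names, and the input is assumed to be an array of lines with the index of the line matched by the funcline regex already known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a line is "blank" if it holds only whitespace. */
static int is_blank(const char *line)
{
	return line[strspn(line, " \t\r\n")] == '\0';
}

/*
 * Hypothetical helper, not the xdiff implementation: starting from the
 * line matched by the funcline regex, walk backwards over the
 * immediately preceding non-blank lines and return the index at which
 * the -W context would begin.
 */
static size_t context_start(const char *const *lines, size_t nlines,
			    size_t funcline)
{
	size_t start = funcline;

	assert(funcline < nlines);
	while (start > 0 && !is_blank(lines[start - 1]))
		start--;
	return start;
}
```

With a comment block directly above the function, the returned index is the first line of the comment. With a blank line between comment and function, the loop does not move and the result is the funcline itself, i.e. the current behaviour. Without any blank lines between functions, it keeps walking into the previous function, which is the case I said would probably not work as expected.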

>> We can identify function lines with arbitrary precision (with a
>> xfuncname regex, if needed), but there is no accurate way to classify
>> lines as comments, or as the end of functions.  Adding optional
>> regexes for single- and multi-line comments would help, at least for
>> C.
>
> The funcline regexp is used for two related but different purposes.
> It identifies a single line to be placed on @@ ... @@ line before a
> diff hunk.  This line however does not have to be at the beginning
> of a function.  It has to be the line that conveys the most
> significant information (e.g. the name of the function).
>
> The way "diff -W" codepath used it as if it were always the very
> first line of a function was bound to invite a patch like this, and
> if we want to be extra elaborate, I agree that an extra mechanism to
> say "the line the funcline regexp matches is not the beginning of a
> function, but the beginning is a line that matches this other regexp
> before that line" may help.
>
> Do we really want to be that elaborate, though?  I dunno.

Adding a regex instead of the simple "blank line" test doesn't seem very
difficult to do, but I am doubtful that it will make any difference in
practice. But that can be done incrementally as well by the people who
would actually need it (who I strongly suspect do not exist in the first
place).
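
For illustration only, a rough sketch of what swapping the blank-line test for a regex could look like, using POSIX regexec(). context_start_re() and the stop_re parameter are hypothetical names, and how such a pattern would be configured is left open here.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical variant, not the xdiff implementation: walk backwards
 * from the funcline and stop at the first preceding line that matches
 * stop_re, instead of hardcoding the "blank line" test.
 */
static size_t context_start_re(const char *const *lines, size_t nlines,
			       size_t funcline, const regex_t *stop_re)
{
	size_t start = funcline;

	assert(funcline < nlines);
	/* Any result other than REG_NOMATCH (a match or an error) stops the walk. */
	while (start > 0 &&
	       regexec(stop_re, lines[start - 1], 0, NULL, 0) == REG_NOMATCH)
		start--;
	return start;
}
```

Compiling stop_re from "^[ \t]*$" with REG_EXTENDED gives the same result as the blank-line test, so the simple test is just one particular pattern.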

> I wonder if it would be sufficient to make -W take an optional
> number, e.g. "git show -W4", to add extra context lines before the
> funcline.

I don't like specifying a fixed number, since that negates almost all
the reason for using -W in the first place.  I would much prefer adding
a config variable to control the -W behaviour (or a new diff flag).

Vegard