Re: misleading diff-hunk header

Jeff King <peff@xxxxxxxx> · Fri, 24 Aug 2012 12:44:15 -0400

On Fri, Aug 24, 2012 at 10:29:09AM -0400, Jeff King wrote:

> > Would this be sufficient?  Instead of looking for the first line that
> > matches the "beginning" pattern going backwards starting from one line
> > before the displayed context, we start our examination at the first line
> > shown in the context.
> > 
> >  xdiff/xemit.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> > index 277e2ee..5f9c0e0 100644
> > --- a/xdiff/xemit.c
> > +++ b/xdiff/xemit.c
> > @@ -131,7 +131,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
> >  
> >  		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
> >  			long l;
> > -			for (l = s1 - 1; l >= 0 && l > funclineprev; l--) {
> > +			for (l = s1; l >= 0 && l > funclineprev; l--) {
> >  				const char *rec;
> >  				long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
> >  				long newfunclen = ff(rec, reclen, funcbuf,
> 
> In the case we were discussing then, the modified function started on
> the first line of context. But as Tim's example shows, it doesn't
> necessarily have to. I think it would make more sense to start counting
> from the first modified line.

That patch would look something like this:

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index d11dbf9..441ecf5 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -194,7 +194,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 
 		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
 			get_func_line(xe, xecfg, &func_line,
-				      s1 - 1, funclineprev);
+				      xche->i1 - 1, funclineprev);
 			funclineprev = s1 - 1;
 		}
 		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,

Note that this breaks a ton of tests. Some of them are just noise (e.g.,
t4042 changes line 2, so line 1 is the top of the context; before, we
would show no hunk header, since we were at the top of the file, but now
we will show line 1). Some of them are improved in the way that this
patch intends (e.g., t4051).

But some I'm not sure of. For instance, the failure in t4018.38 is odd.
I think it's because the pattern it is looking for is a somewhat odd toy
example (it's looking for a line with "s" in it, so naturally when we
shift the start-point of our search, we are likely to find some other
false positive). But it raises an interesting point: what if the pattern
is just looking for lines in a list, and not an enclosing function?

For example, imagine you have a file a list of items, one per line.
With the old code, you'd get:

	diff --git a/old b/new
	index f384549..1066a25 100644
	--- a/old
	+++ b/new
	@@ -2,3 +2,3 @@ one
	 two
	-three
	+three -- modified
	 four

So the hunk header is showing you something useful; the element just
above your context. But with my patch, you'd see:

	diff --git a/old b/new
	index f384549..1066a25 100644
	--- a/old
	+++ b/new
	@@ -2,3 +2,3 @@ two
	 two
	-three
	+three -- modified
	 four

I.e., it shows the element just before the change, which is already in
the context anyway. So it's actually less useful. Although note that the
current behavior is not all that useful, either; it is not really giving
you any information about the change, but rather just showing one extra
line of context.

So I would say that which you would prefer might depend on exactly what
you are diffing. But I would also argue that in any case where the new
code produces a worse result, the hunk header was not all that useful to
begin with.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html