Re: [PATCH (amend)] diff: Make numstat machine friendly also for renames (and copies)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 11 Dec 2007, Junio C Hamano wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>> Junio C Hamano wrote:
>>> Jakub Narebski <jnareb@xxxxxxxxx> writes:
>>> 
>>>> Previous version of this patch (from 7 May 2007) used instead of current
>>>> only "to_name" format similar to git-diff-tree raw format for renames:
>>>>
>>>>   added deleted TAB path for "src" TAB path for "dst" LF
> 
> That's perfectly fine (that is without -z).
> 
>>>> ... Previous version
>>>> of patch used (or was to use actually, because of error in the code)
>>>>
>>>>   added deleted TAB path for "src" NUL NUL path for "dst" NUL
>>>>
>>>> when -z option was used.
> 
> I think your "previous" one:
> 
> 	<added> <deleted> <src> NUL		  (no rename)
> 	<added> <deleted> <src> NUL NUL <dst> NUL (with rename)
> 
> would be unambiguously parsable, but it would be cleaner and easier for
> the parser if the latter were like this instead:
> 
> 	<added> <deleted> NUL <src> NUL <dst> NUL (with rename)
> 
> The reader can expect to find a NUL terminated src, finds the length is
> zero and notices it cannot be src and then reads two paths from that
> point.

Very good idea.

> Without having the status letter, we cannot do "optional two paths"
> without breaking existing --numstat -z readers that do not care about
> renames and copies, and I agree that existing --numstat -z readers that
> pass -M or -C are already broken, so I think a format extension along
> the above line (your "previous" and the above clean-up proposal have the
> same power of expressiveness, I just think the latter is syntactically
> cleaner) would be reasonable.  And at that point we probably would not
> need --numstat-enhanced at all ;-)

The previous patch is a fix (usability fix) for numstat output in the
presence of rename detection, making it truly machine friendly. Moreover
it should not break any scripts which parsed numstat output not
expecting to deal with renames (for example if repository has
diff.renames config option set), and might even fix them.

The proposed - and implemented in patch below - change could break some
scripts not expecting to deal with new numstat output. Adding status
letter (with similarity/dissimilarity percentage info) would
unfortunately require greater surgery... We can either allow scripts
(do there exist any?) to break, or make the full rename info shown
only for --numstat-extended; but I'd like to have this status letter
for --numstat-extended / --numstat-enhanced.

So possible solution are:
 (a) Accept both patches, and accept remote possibility that some
     scripts might break
 (b) Add --numstat-extended option whoich would show rename info
     as implemented in this patch (and of course accept previous)
 (c) Implement --numstat-extended with status letter, as suggested,
     accept first patch but not this one.


Note that gitweb would require --numstat with proper rename detection
support if we want to e.g. provide graphical diffstat in 'commit' view
for merge commits.


P.S. The numstat output format for renames should be probably described
in documentation, and not only in commit message, but I was not sure
where to put it (and even how it should be written).

-- >8 --
From: Jakub Narebski <jnareb@xxxxxxxxx>
Date: Tue, 11 Dec 2007 23:52:18 +0100
Subject: [RFC/PATCH] diff: Show rename info in numstat output, correctly

Make numstat output of git-diff show both source and destination
filename for renames (and copies) in easily parseable format, similar
to raw diff output format.

The numstat format for rename (or copy) is now

  <added> <deleted> TAB <path for "src"> TAB <path for "dst"> LF

or if -z option is used

  <added> <deleted> TAB NULL <path for "src"> NULL <path for "dst"> NULL

where <added> and <deleted> is number of added/deleted lines
(or '-' for binary file).

When -z option is not used, TAB, LF, and backslash characters in
pathnames are represented as \t, \n, and \\, respectively.


Idea for -z numstat rename format by Junio C Hamano.

Signed-off-by: Jakub Narebski <jnareb@xxxxxxxxx>
---
 diff.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 8039ac7..56cbf28 100644
--- a/diff.c
+++ b/diff.c
@@ -991,7 +991,14 @@ static void show_numstat(struct diffstat_t* data, struct diff_options *options)
 			printf("-\t-\t");
 		else
 			printf("%d\t%d\t", file->added, file->deleted);
-		write_name_quoted(file->name, stdout, options->line_termination);
+		if (file->is_renamed) {
+			char sep = options->line_termination ? '\t' : '\0';
+			if (!options->line_termination)
+				putchar(options->line_termination);
+			write_name_quoted(file->from_name, stdout, sep);
+			write_name_quoted(file->name, stdout, options->line_termination);
+		} else
+			write_name_quoted(file->name, stdout, options->line_termination);
 	}
 }
 
-- 
1.5.3.7

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux