Re: [PATCH] Show binary file size change in diff --stat

Andy Parkins wrote:
On Wednesday 2007 April 04 14:34, Rogan Dawes wrote:

Well, how about my comments in <45E67978.9030805@xxxxxxxxxxxx>,
suggesting that the edit difference (number of steps required to
transform one to the other) would be a better indication?

Perhaps.  There is certainly a difference between:

 somefile.bin  | 1000 -> 1000 bytes

and

 somefile.bin  | 500 bytes removed, 500 bytes added

I think it is better because it is consistent with what we currently do
for text files: show the number of lines added/deleted.

The thing is, "lines" is an understandable unit for a text file, so it's useful to show. I'm not sure the same is true of "bytes" for a binary file. Those bytes could represent anything; the true unit of a binary file is dependent on its type.

I think bytes are the only reasonable unit for a binary file, since we have no idea what a meaningful divisor may be. So, defaulting to the smallest possible unit (other than going to the bit-level) makes perfect sense.

For binary files, it would be consistent to show the number of bytes
added/deleted. I have not investigated the output format for the
libxdiff binary patch format, but hopefully it would not be too
difficult to calculate the deletions and additions.
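
To make the "bytes added/deleted" idea concrete, here is a minimal Python sketch. It does not use libxdiff or git's binary patch format. It assumes Python's difflib.SequenceMatcher as a stand-in diff algorithm, and the name byte_stat is hypothetical, so a real implementation could well report different numbers.

```python
from difflib import SequenceMatcher


def byte_stat(old: bytes, new: bytes) -> tuple[int, int]:
    """Return (bytes_removed, bytes_added) between two byte strings.

    Assumption: difflib stands in for a real binary diff; libxdiff
    is a different algorithm and may count changes differently.
    """
    removed = added = 0
    # autojunk=False disables difflib's "popular element" heuristic,
    # which would otherwise ignore frequently repeated byte values.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return removed, added


if __name__ == "__main__":
    # Same size before and after, but the second half is rewritten.
    old = b"\x00" * 500 + b"\x01" * 500
    new = b"\x00" * 500 + b"\x02" * 500
    print(f"{len(old)} -> {len(new)} bytes")
    removed, added = byte_stat(old, new)
    print(f"{removed} bytes removed, {added} bytes added")
```

For this input the size stays at 1000 bytes while the sketch counts 500 bytes removed and 500 added, which is the distinction between the two output forms discussed above.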

I'm inclined to agree with Johannes: while it's certainly something that /could/ be shown, is it more useful? There is no guarantee that a small change in the underlying content is represented by a small change in the binary diff.

As an example: compress a file, change a byte, compress it again, perform a binary diff; what is that diff telling you about the change? (My answer is: not much).

Well, at least as much as the resulting sizes tell you, if not more.
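
The compress, change a byte, compress again experiment is easy to try. The sketch below assumes zlib as the compressor and counts differing byte positions as a crude stand-in for a binary diff. The helper names are hypothetical, and the exact numbers depend on the input and on the zlib build.

```python
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any length difference."""
    mismatched = sum(x != y for x, y in zip(a, b))
    return mismatched + abs(len(a) - len(b))


def compress_experiment(data: bytes, index: int) -> dict:
    """Flip one byte of data at index, compress both versions, compare."""
    changed = bytearray(data)
    changed[index] ^= 0xFF  # alters exactly one input byte
    before = zlib.compress(data)
    after = zlib.compress(bytes(changed))
    return {
        "input_bytes_changed": differing_bytes(data, bytes(changed)),
        "size_before": len(before),
        "size_after": len(after),
        "compressed_bytes_changed": differing_bytes(before, after),
    }


if __name__ == "__main__":
    text = b"".join(b"line %d of some sample payload\n" % i for i in range(200))
    print(compress_experiment(text, 10))
```

Comparing size_before and size_after with compressed_bytes_changed shows how much each statistic reveals about a one-byte change to the input. Neither number maps directly back to that single byte.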

Here is a counterexample for a text file, where the lines changed do not reflect the real changes in the file: the contents of an XML file being wrapped in an additional tag.

Semantically, all that has changed is an opening and a closing tag. But on a line-by-line basis we still show that the entire file has changed (because the indentation changes). So you'd have n lines deleted and n+2 lines added (for the additional opening and closing tags).
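
The XML case can be reproduced with a few lines of Python. This is only a sketch: it assumes difflib as the line differ rather than the xdiff code git actually uses, and line_stat and wrap are hypothetical helpers.

```python
from difflib import SequenceMatcher


def line_stat(old: list[str], new: list[str]) -> tuple[int, int]:
    """Return (lines_deleted, lines_added) between two lists of lines."""
    deleted = added = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return deleted, added


def wrap(lines: list[str], tag: str) -> list[str]:
    """Wrap lines in <tag>...</tag>, indenting the body by two spaces."""
    return [f"<{tag}>"] + ["  " + line for line in lines] + [f"</{tag}>"]


if __name__ == "__main__":
    original = ["<a>", "  <b/>", "</a>"]  # n = 3 lines
    deleted, added = line_stat(original, wrap(original, "root"))
    print(f"{deleted} lines deleted, {added} lines added")
```

With n = 3 the sketch counts 3 lines deleted and 5 added, because re-indenting makes every original line compare as different even though only the wrapping tag is new.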

Andy

I still maintain that showing bytes changed is the only consistent thing to do, unless we have additional logic that allows us to do "per file-type" diff statistics. Maybe .gitattributes will allow/enable this?

Regards,

Rogan

P.S. I'm not volunteering to inflict my novice C skills on the git community, so this is really "just my 2c".