Re: [PATCH 2/2] Add keyword unexpansion support to convert.c

On Tue, 17 Apr 2007, J. Bruce Fields wrote:
> 
> I've occasionally wondered before whether git could offer any help in
> the case where, say, somebody hands me a file, I know it's based on
> src/widget/widget.c from somewhere in v0.5..v0.7, and I'd like a guess
> at the most likely candidates.

It's actually fairly easy to do.

Get the git hash of the blob: use "git hash-object" to do so (although 
you can do it without git too, see later), then just do

	git whatchanged v0.5..v0.7 -- src/widget/widget.c

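and just look for the hash (the raw diff lines show abbreviated object 
names, so match on the leading digits). If it's an exact match, you'll 
find it there, and the log will tell you which commit changed the file 
to that content.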
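A minimal sketch of the exact-match check as a shell function, assuming a POSIX shell run inside the repository. The name `find_exact_blob` is hypothetical. Instead of scanning `git whatchanged` output by eye, it asks `git rev-parse` for the blob name of the path at each commit that touched it, and compares that with `git hash-object` of the file you were handed.

```shell
# find_exact_blob RANGE PATH FILE   (hypothetical helper)
# Prints every commit in RANGE that touched PATH and whose copy of PATH
# is byte-for-byte identical to FILE, i.e. has the same blob name.
find_exact_blob() {
	range=$1 path=$2 file=$3
	# Blob name of the file we were given
	want=$(git hash-object "$file")
	git rev-list "$range" -- "$path" | while read -r commit
	do
		# Blob name of PATH as recorded in this commit (empty if absent)
		have=$(git rev-parse -q --verify "$commit:$path" 2>/dev/null)
		if [ "$have" = "$want" ]
		then
			echo "$commit"
		fi
	done
}
```

For example, `find_exact_blob v0.5..v0.7 src/widget/widget.c reference-file` prints the matching commit names, if there are any.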

If it's *not* an exact match, you have to come up with some "measure of 
minimality" for the thing (the size of the diff might be a good one), and 
you can do

	git rev-list --no-merges --full-history v0.5..v0.7 -- src/widget/widget.c > rev-list

which will get you a full set of commits that changed that file. Then you 
can just do something like

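	best_commit=none
	best=1000000
	while read commit
	do
		# Extract this commit's version of the file
		git cat-file blob "$commit:src/widget/widget.c" > tmpfile
		# The number of diff lines is our "distance" measure
		lines=$(diff reference-file tmpfile | wc -l)
		if [ "$lines" -lt "$best" ]
		then
			echo "Best so far: $commit $lines"
			best=$lines
			best_commit=$commit
		fi
	done < rev-list
	echo "Best match: $best_commit ($best lines of diff)"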

and you're done!

(Yeah, I'm sure that script could be improved, but it's probably really 
not that bad even as-is! The initial "git rev-list" will have done all 
the heavy lifting, and picked out the commits that matter.)

> I haven't wondered that often enough that I'd consider it worth
> embedding the blob SHA1 in every checked-out file, though!

It really doesn't pay.

Besides, if you actually have the file, you can trivially get the SHA1 
_without_ embedding it into the file. Just do

	(printf 'blob %d\0' $(wc -c < file); cat file) | sha1sum

where the header is just the word "blob", a space, the size of the file 
in bytes written in decimal, and a NUL byte.
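The same header-plus-contents computation, sketched as a reusable function that needs only `wc`, `tr`, `printf`, `cut` and `sha1sum` (GNU coreutils; other systems may need a different SHA-1 tool). The name `blob_sha1` is hypothetical. For a file that git stores without CRLF or filter conversion, its output should match `git hash-object`.

```shell
# blob_sha1 FILE   (hypothetical helper)
# Computes git's blob name for FILE without calling git.
blob_sha1() {
	# Size in bytes; tr strips the padding some wc implementations add
	size=$(wc -c < "$1" | tr -d ' ')
	# Hash the header "blob <size>\0" followed by the raw contents
	{ printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```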

So embedding the SHA1 doesn't actually buy you anything: every blob BY 
DEFINITION already has its SHA1 embedded in it.

In fact, embedding the SHA1 (or doing any other modifications) just makes 
it harder to do this, since then you have to filter it out again.
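To make the "filter it out again" cost concrete, here is a sketch that assumes the expanded keyword looks like `$Id: <something> $` and collapses it back to a bare `$Id$` before hashing. The helper name `unexpanded_blob_hash` is hypothetical, and the sed pattern is an assumption about the keyword format, not git's own implementation.

```shell
# unexpanded_blob_hash FILE   (hypothetical helper)
# Undo an assumed "$Id: ... $" expansion, then hash the result as a blob
# so it can be compared with what the repository actually stores.
unexpanded_blob_hash() {
	sed 's/\$Id:[^$]*\$/$Id$/g' "$1" | git hash-object --stdin
}
```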

		Linus