Re: easy way to demonstrate length of colliding SHA-1 prefixes?

Jeff King <peff@xxxxxxxx> · Mon, 3 Dec 2018 17:57:43 -0500

On Mon, Dec 03, 2018 at 02:30:44PM -0800, Matthew DeVore wrote:

> Here is a one-liner to do it. It is Perl line noise, so it's not very cute,
> though that is subjective. The output shown below is for the Git project
> (not Linux) repository as I've currently synced it:
> 
> $ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; $long
> = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < length($long); if
> (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = substr($prev, 0, $i);
> last} } $prev = $n; END {say $long}'

Ooh, object-collision golf.

Try:

  git cat-file --batch-all-objects --batch-check='%(objectname)'

instead of "rev-list | sort". It's _much_ faster, because it doesn't
have to actually open the objects and walk the graph, and its output is
already sorted by object name.

Some versions of uniq have "-w" (including GNU, but it's definitely not
in POSIX), which lets you do:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  uniq -cdw 7

to list all collisions of length 7 (it will show just the first item
from each group, but you can use -D to see them all).
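
For the original question (the longest colliding prefix, rather than
collisions at one fixed length), comparing each line to the one before it
is probably enough, since in sorted input the longest shared prefix always
shows up between two neighbors. A rough sketch, not something from git
itself: longest_shared_prefix is a name made up here, and it assumes its
input is already sorted.

```shell
# Hypothetical helper: reads sorted object names on stdin and prints the
# longest prefix shared by any two adjacent lines (empty if none).
longest_shared_prefix() {
  awk '
    NR > 1 {
      # Count the leading characters this line shares with the previous one.
      n = 0
      while (n < length($0) && n < length(prev) &&
             substr($0, n + 1, 1) == substr(prev, n + 1, 1))
        n++
      # Remember the longest shared run seen so far.
      if (n > best) { best = n; prefix = substr($0, 1, n) }
    }
    { prev = $0 }
    END { print prefix }
  '
}
```

You would feed it the same listing as above:

  git cat-file --batch-all-objects --batch-check='%(objectname)' |
  longest_shared_prefix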

> > You'll always need to list them all. It's inherently an operation where
> > for each SHA-1 you need to search for other ones with that prefix up to
> > a given length.
> > 
> > Perhaps you've missed that you can use --abbrev=N for this, and just
> > grep for things that are longer than that N, e.g. for linux.git:
> > 
> >      git log --oneline --abbrev=10 --pretty=format:%h |
> >      grep -E -v '^.{10}$' |
> >      perl -pe 's/^(.{10}).*/$1/'
> 
> I think the goal was to search all object hashes, not just commits. And git
> rev-list --objects will do that.

You can add "-t --raw" to see the abbreviated tree and blob names,
though it gets tricky around handling merges.

-Peff