>>>>> "Linus" == Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: Linus> No yay yet.. That counts hex digits, not bits. I thought the goal was to figure out how long (on the average) you had to give a SHA1 to be "unique". But even that's wrong, because of the following: CAFEFEED357 DEADBEEF123 DEADBEEF456 DEADBEEF467 DEADBEEF478 for that sequence, I'd count 0, 8, 9, 9 when in fact, it should be 8, 9, 9, 9. It's not the items in common with the previous value... it's the longer of the items in common with the string on either side. The easiest way for that would be to use a 3-item window: git-rev-list --objects HEAD | sort | perl -lne ' substr($_, 40) = ""; if (defined $p) { ($p ^ $_) =~ /^(\0*)/; $common = length $1; if (defined $pcommon) { $count[$pcommon > $common ? $pcommon : $common]++; } } $p = $_; $pcommon = $common; END { print "$_: $count[$_]" for 0..$#count } ' this also fixes the bug where I compare the first line to nothing. With this, I get (on git.git): 0: 1: 2: 6 3: 21153 4: 15008 5: 1232 6: 90 7: 8: 2 which now makes sense. There are 2 items that need 9 hex chars to be unique. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 <merlyn@xxxxxxxxxxxxxx> <URL:http://www.stonehenge.com/merlyn/> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc. See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training! - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html