Re: Distribution of longest common hash prefixes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>>>>> "Linus" == Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

Linus> No yay yet.. That counts hex digits, not bits.

I thought the goal was to figure out how long (on the average) you had to give
a SHA1 to be "unique".

But even that's wrong, because of the following:

CAFEFEED357
DEADBEEF123
DEADBEEF456
DEADBEEF467
DEADBEEF478

for that sequence, I'd count 0, 8, 9, 9 when in fact, it should be 8, 9, 9, 9.
It's not the items in common with the previous value... it's the longer of the
items in common with the string on either side.  The easiest way for that
would be to use a 3-item window:

git-rev-list --objects HEAD | sort | perl -lne '
  substr($_, 40) = "";
  if (defined $p) {
    ($p ^ $_) =~ /^(\0*)/;
    $common = length $1;
    if (defined $pcommon) {
      $count[$pcommon > $common ? $pcommon : $common]++;
    }
  }
  $p = $_;
  $pcommon = $common;
  END { print "$_: $count[$_]" for 0..$#count }
'

this also fixes the bug where I compare the first line to nothing.
With this, I get (on git.git):

    0: 
    1: 
    2: 6
    3: 21153
    4: 15008
    5: 1232
    6: 90
    7: 
    8: 2

which now makes sense.  There are 2 items that need 9 hex chars
to be unique.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@xxxxxxxxxxxxxx> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]