Re: [PATCH 20/20] abbrev: add a core.validateAbbrev setting

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 12 Jun 2018 11:58:37 -0700

Martin Ågren <martin.agren@xxxxxxxxx> writes:

>> +This is especially useful in combination with the
>> +`core.validateAbbrev` setting, or to get more future-proof hashes to
>> +reference in the future in a repository whose number of objects is
>> +expected to grow.
>
> Maybe s/validateAbbrev/validateAbbrev = false/?

Perhaps, but even with =true it would equally be useful, as the
point of this setting is to future-proofing.

>> ++
>> +When printing abbreviated object names Git needs to look through the
>> +local object store. This is an `O(log N)` operation assuming all the
>> +objects are in a single pack file, but `X * O(log N)` given `X` pack
>> +files, which can get expensive on some larger repositories.
>
> This might be very close to too much information.

Not very close, but just too much information without crucial detail
(i.e. log N times what constant???).  I'd drop it.

>> ++
>> +This setting changes that to `O(1)`, but with the trade-off that
>> +depending on the value of `core.abbrev` we may be printing abbreviated
>> +hashes that collide.

It may not be technically wrong to say "This changes it to O(1)",
but I think to most people it is more understandable to say "This
changes it to zero" ;-)

    Setting this variable to false makes Git not to validate the
    abbreviation it produces uniquely identifies an object among the
    current set of objects in the repository.  Depending on the
    value of `core.abbrev`, we may be printing abbreviated hashes
    that collide.  Note that setting this variable to true (or
    leaving it unset) does not guarantee that an abbreviated hash
    will never collide with future objects in the repository (you
    need to set core.abbrevLength to a larger value for that).

would be sufficient to clarify, and also nuke the following
overly-detailed paragraph.

>> ... Too see how likely this is, try running:
>> ++
>> +-----------------------------------------------------------------------------------------------------------
>> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
>> +-----------------------------------------------------------------------------------------------------------
>> ++
>> +This shows how many commits were found at each abbreviation length. On
>> +linux.git in June 2018 this shows a bit more than 750,000 commits,
>> +with just 4 needing 11 characters to be fully abbreviated, and the
>> +default heuristic picks a length of 12.
>
> These last few paragraphs seem like too much to me.

Yeah, it goes to too low level a detail, especially with the "try
running" part.  I'd remove everything but "On linux.git in June ..."
if I were writing it from the above.

>> ++
>> +Even without `core.validateAbbrev=false` the results abbreviation
>> +already a bit of a probability game. They're guaranteed at the moment
>> +of generation, but as more objects are added, ambiguities may be
>> +introduced. Likewise, what's unambiguous for you may not be for
>> +somebody else you're communicating with, if they have their own clone.
>
> This seems more useful.

Yes, but still overly verbose; I think rolling it in the single
paragraph description like I showed above would be sufficient.

>> ++
>> +Therefore the default of `core.validateAbbrev=true` may not save you
>> +in practice if you're sharing the SHA-1 or noting it now to use after
>> +a `git fetch`. You may be better off setting `core.abbrev` to
>> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
>> +that with `core.validateAbbrev=false` to get a reasonable trade-off
>> +between safety and performance.
>
> Makes sense. As before, I'd suggest s/SHA-1/object ID/.

Likewise.  If we were to keep it, then s/object ID/object name/.