On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
> Instead of trying really hard to find an unambiguous SHA-1 we can with
> core.validateAbbrev=false, and preferably combined with the new
> support for relative core.abbrev values (such as +4) unconditionally
> print a short SHA-1 without doing any disambiguation check. I.e. it

This first paragraph read weirdly the first time. On the second attempt I knew how to structure it and got it right. It might be easier to read if the part about core.abbrev=+4 were in a separate second sentence. That last "it" is "the combination of these two configs", right?

> allows for picking a trade-off between performance, and the odds that
> future or remote (or current and local) short SHA-1 will be ambiguous.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index abf07be7b6..df31d1351f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that Git would
> otherwise print, this allows for producing more future-proof SHA-1s
> for use within a given project, while adjusting the value for the
> current approximate number of objects.
> ++
> +This is especially useful in combination with the
> +`core.validateAbbrev` setting, or to get more future-proof hashes to
> +reference in the future in a repository whose number of objects is
> +expected to grow.

Maybe s/validateAbbrev/validateAbbrev = false/?

> +
> +core.validateAbbrev::
> +	If set to false (true by default) don't do any validation when
> +	printing abbreviated object names to see if they're really
> +	unique. This makes printing objects more performant at the
> +	cost of potentially printing object names that aren't unique
> +	within the current repository.

Good. I understand why I'd want to use it, and why not.

> ++
> +When printing abbreviated object names Git needs to look through the
> +local object store. This is an `O(log N)` operation assuming all the
> +objects are in a single pack file, but `X * O(log N)` given `X` pack
> +files, which can get expensive on some larger repositories.

This might be very close to too much information.

> ++
> +This setting changes that to `O(1)`, but with the trade-off that
> +depending on the value of `core.abbrev` we may be printing abbreviated
> +hashes that collide. Too see how likely this is, try running:
> ++
> +-----------------------------------------------------------------------------------------------------------
> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
> +-----------------------------------------------------------------------------------------------------------
> ++
> +This shows how many commits were found at each abbreviation length. On
> +linux.git in June 2018 this shows a bit more than 750,000 commits,
> +with just 4 needing 11 characters to be fully abbreviated, and the
> +default heuristic picks a length of 12.

These last few paragraphs seem like too much to me.

> ++
> +Even without `core.validateAbbrev=false` the results abbreviation
> +already a bit of a probability game. They're guaranteed at the moment
> +of generation, but as more objects are added, ambiguities may be
> +introduced. Likewise, what's unambiguous for you may not be for
> +somebody else you're communicating with, if they have their own clone.

This seems more useful.

> ++
> +Therefore the default of `core.validateAbbrev=true` may not save you
> +in practice if you're sharing the SHA-1 or noting it now to use after
> +a `git fetch`. You may be better off setting `core.abbrev` to
> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
> +that with `core.validateAbbrev=false` to get a reasonable trade-off
> +between safety and performance.

Makes sense. As before, I'd suggest s/SHA-1/object ID/.
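
If the histogram example does stay, it only needs perl to compute line lengths. Below is a minimal sketch of the same count using just POSIX awk and sort. The function name `abbrev_length_histogram` is made up for illustration and is not part of Git. Its output is plain `count length` pairs rather than the padded `uniq -c` format:

```shell
# abbrev_length_histogram (hypothetical helper, not shipped with Git):
# reads one abbreviated object name per line on stdin and prints
# "<count> <length>" pairs, most frequent length first, giving the same
# information as the perl | sort | uniq -c | sort -nr pipeline above.
abbrev_length_histogram() {
	# length($0) is the length of the current line; tally each length,
	# then print "count length" for every length seen.
	awk '{ n[length($0)]++ } END { for (len in n) print n[len], len }' |
		sort -nr
}

# Example, run inside a repository:
#   git log --all --pretty=format:%h --abbrev=4 | abbrev_length_histogram
```

Counting inside awk avoids the intermediate `sort | uniq -c` pass, and awk still handles the final `%h` line that `--pretty=format:` emits without a trailing newline.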

I do wonder if this documentation could be a bit less verbose without sacrificing too much correctness and clarity. No brilliant suggestions on how to do that, sorry.

Martin