Re: [RFC PATCH 00/18] Multi-pack index (MIDX)

Jeff King <peff@xxxxxxxx> · Wed, 10 Jan 2018 05:57:23 -0500

On Tue, Jan 09, 2018 at 02:05:09PM +0100, Johannes Schindelin wrote:

> > I think that could be easily worked around for rebase by asking git to
> > check ambiguity during the conversion.
> 
> Sure.
> 
> It also points to a flaw in your reasoning, and you should take my example
> further: previously, we guaranteed unique abbreviations, and who is to say
> that there is no script out there relying on that guarantee? You?

I'm not sure what you think my reasoning was, since I made the same
point directly after. ;)

> > I am a bit curious if there's a bounded probability that people would
> > find acceptable for Git to give an ambiguous abbreviation. We already
> > accept 1 in 2^160, of course. But would, e.g., one in a million be OK?
> 
> What is that going to solve?

I'm not sure I understand your question. We can bound the probability of
a collision by increasing the size of the abbreviation. My question was
whether there is a size where that tradeoff makes sense. So it "solves"
the problem of worrying about collisions.

> I think a better alternative would be to introduce a new abbreviation mode
> that is *intended* to stop caring about unique abbreviations.
> 
> In web interfaces, for example, it makes tons of sense to show, say, 8
> digits in link texts and have the full name in the actual link URL.

I'm not sure if that would be useful option or not. I certainly agree
that static abbreviations are a useful thing to want. But for scripted
callers, it's usually easy to cut down the abbreviation yourself. I
think your web interface example is a good one. The caller will ask Git
for the full 40-hex hash, and then cut it down itself when generating
the link (and I just so happen to have worked on a web interface that
does this[1]).

So would it be useful for humans to use? That's what I'm not sure about.
It seems cumbersome to have to add "--abbrev=8!" on the command line to
each invocation. But if it were triggered by config, it seems like we
would risk breaking scripts.

> I am just hesitant to change things that would break existing setups.

Me too. I'm not sure if I'm seriously advocating the "bound the
probability" thing. It's just something I think is worth pondering a
little.

-Peff

[1] Actually, there _is_ something that it's useful for Git to tell the
    caller, which is the approximate number of objects in the
    repository. Eight digits is not enough for the kernel, for example,
    and these days we use 12 digits by default. I'm not sure if such a
    web interface would rather ask for "%H %h" and generate the link
    text from the second half, or it would rather just get a ballpark on
    the number of objects, and do the simple abbreviation math itself
    (right now GitHub does not do either; I don't know about other
    interfaces).