Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length

Ævar Arnfjörð Bjarmason  <avarab@xxxxxxxxx> writes:

> +The algorithm to pick the current abbreviation length is
> +considered an implementation detail, and might be changed in the
> +future. Since Git version 2.11, the length has been configured to
> +auto-scale based on the estimated number of objects in the
> +repository. We pick a length such that if all objects in the
> +repository were abbreviated, we'd have a 50% chance of a *single*
> +collision.

Correct and reads well.

> +For example, 2^14-1 is the last object count at which we'll pick
> +a short length of "7", and will roll over to "8" once we have one more
> +object at 2^14. Since each hexdigit we add (4 bits) allows us to have
> +four times (2 bits) as many objects in the repository

Something is missing at this point in the sentence. 

	"without raising the chance of a single collision higher"

or something like that.

> , we'll roll over
> +to a length of "9" at 2^16 objects, "10" at 2^18 etc.

Correct and reads well.
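
The rollover points quoted above can be reproduced with a short sketch. This is only an illustration of the arithmetic described in the patch text, not Git's actual code; the names auto_abbrev_len and FALLBACK_MIN_ABBREV are made up for this example, and the floor of 7 comes from the next quoted paragraph.

```python
# Hypothetical name for the floor of 7 hexdigits discussed in the patch.
FALLBACK_MIN_ABBREV = 7


def auto_abbrev_len(object_count: int) -> int:
    """Return the hexdigit length implied by the quoted rollover points."""
    # Number of bits needed to write the count: 2^14 - 1 -> 14, 2^14 -> 15.
    bits = object_count.bit_length()
    # Halve the bit count and round up. The step from 7 to 8 at 2^14
    # objects and from 8 to 9 at 2^16 objects follows from this.
    hexdigits = (bits + 1) // 2
    # Never go below the floor, however small the repository is.
    return max(hexdigits, FALLBACK_MIN_ABBREV)


for count in (2**12, 2**14 - 1, 2**14, 2**16, 2**18):
    print(count, auto_abbrev_len(count))
```

With these inputs the sketch yields 7 for 2^12 and 2^14-1, then 8, 9 and 10 for 2^14, 2^16 and 2^18, matching the numbers in the quoted text.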

> We'll never
> +automatically pick a length less than "7", which effectively hardcodes
> +2^12 as the minimum number of objects in a repository we'll consider
> +when choosing the abbreviation length.

This may be technically correct, but to me, it seems to place stress
on the wrong side of the equation.  Since nobody would find "Ah, so
I can create up to 2^12 objects without fearing that my abbreviated
object name would become longer than 7" a useful thing to know, I do
not see much point in saying "hardcoded floor for the number of
objects".

On the other hand, saying that 7 is the hardcoded floor for the
abbreviation length does make sense, as those adept at math after
reading the paragraph up to this point would wonder why their tiny
repository still uses 7 hexdigits, which is way too many to ensure
the low collision rate for the size of their toy repository.

	We do not use an abbreviation shorter than 7 hexdigits by default,
	so a small repository with fewer than 2^12 objects may have an even
	smaller chance than 50% of having a single collision.

may be an improvement.
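
To get a rough feel for how small that chance is at 7 hexdigits, the usual birthday-bound approximation can be evaluated for a few repository sizes. This is a back-of-the-envelope sketch that assumes object names are uniformly distributed; collision_chance is a made-up helper, and nothing here is what Git itself computes.

```python
import math


def collision_chance(object_count: int, hexdigits: int) -> float:
    """Approximate the chance of at least one abbreviation collision.

    Uses the birthday-bound approximation
    1 - exp(-n * (n - 1) / (2 * 16**hexdigits)),
    assuming uniformly distributed object names.
    """
    n = object_count
    space = 16 ** hexdigits  # number of distinct abbreviations
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


for count in (2**8, 2**10, 2**12):
    chance = collision_chance(count, 7)
    print(f"{count:5d} objects, 7 hexdigits: {chance:.4f}")
```

Under this approximation the chance shrinks quickly as the object count drops, which is why a toy repository is far below the threshold that the auto-scaling aims for.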



