On Sun, Oct 07 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> 1. We still have this check of objects/17/ in builtin/gc.c today. Why
>> objects/17/ and not e.g. objects/00/ to go with other 000* magic such
>> as the 0000000000000000000000000000000000000000 SHA-1? Statistically
>> it doesn't matter, but 17 seems like an odd thing to pick at random
>> out of 00..ff, does it have any significance?
>
> There is no "other 000* magic such as ...". There is only one 0{40}
> magic and that one must be memorable and explainable.

Depending on how we're counting there are at least two. We also use
0000000000000000000000000000000000000000 as a placeholder for "couldn't
read a ref" or "this is a placeholder for an invalid ref", in addition
to how it's used to signify creation/deletion in the likes of the
pre-receive hook:

    $ echo hello > .git/refs/something
    $ git fsck
    [...]
    error: refs/something: invalid sha1 pointer 0000000000000000000000000000000000000000
    $ > .git/refs/something
    $ git fsck
    [...]
    error: refs/something: invalid sha1 pointer 0000000000000000000000000000000000000000

This is because the refs backend will memzero the oid struct, and if we
fail to read things it'll still be zero'd out.

This manifests e.g. in this confusing fsck output, due to a bug where
GitLab will write empty refs/keep-around/* refs sometimes:
https://gitlab.com/gitlab-org/gitlab-ce/issues/44431

> The 1/256 sample can be any one among 256. Just like the date
> string on the first line of the output to be used as the /etc/magic
> signature by format-patch, it was an arbitrary choice, rather than a
> random choice, and unlike 0{40} this does not have to be memorable
> by general public and I do not have to explain the choice to the
> general public ;-)

I wanted to elaborate on the explanation for "gc.auto" in git-config.
Now we just say "approximately 6700".
Since this behavior has been really stable for a long time, we could say
we sample 1/256 of the .git/objects/?? dirs, and this explains any
perceived discrepancies between the 6700 number and
$(find .git/objects/?? -type f | wc -l).

>> 2. It seems overly paranoid to be checking that the files in
>> .git/objects/17/ look like a SHA-1.
>
> There is no other reason than futureproofing. We were paying cost
> to open and scan the directory anyway, and checking that we only
> count the loose object files was (and still is) a sensible thing to
> do to allow us not even worry about the other kind of things we
> might end up creating there.

Makes sense. Just wanted to ask if it was that or some workaround for
historical files being there.
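
For anyone following along, here is a rough Python sketch of the 1/256 sampling and the filename check discussed above. It is not the code in builtin/gc.c, just an illustration of the idea. The function names are made up for this example, and I'm assuming the per-directory threshold is gc.auto divided by 256 and rounded up, and that a loose object filename is exactly 38 lowercase hex digits (the SHA-1 minus its two-digit directory prefix).

```python
import os
import re

# Assumption: a loose SHA-1 object file is named with the remaining
# 38 lowercase hex digits after the two-digit directory prefix.
HEX38 = re.compile(r"[0-9a-f]{38}")


def looks_like_loose_object(name):
    """Return True if a directory entry name looks like a loose object."""
    return HEX38.fullmatch(name) is not None


def sampled_threshold(gc_auto):
    """Scale the gc.auto limit down to one of the 256 fan-out dirs.

    Assumption: integer division by 256, rounded up.
    """
    return -(-gc_auto // 256)


def too_many_loose_objects(objects_dir, gc_auto=6700, sample="17"):
    """Estimate whether the repository exceeds gc.auto loose objects.

    Only the single sampled directory (objects/17 by default) is scanned;
    entries that do not look like loose objects are ignored.
    """
    if gc_auto <= 0:
        return False  # treat a non-positive limit as "never trigger"
    try:
        names = os.listdir(os.path.join(objects_dir, sample))
    except FileNotFoundError:
        return False  # no sampled directory means nothing to count
    count = sum(1 for name in names if looks_like_loose_object(name))
    return count > sampled_threshold(gc_auto)
```

Under these assumptions the default of 6700 works out to more than 27 files in the sampled directory, which is why the documented number is only approximate compared to a full count across all of .git/objects/??.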