On 9/9/24 1:45 PM, Junio C Hamano wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
This is not meant to be cryptographic at all, but uniformly distributed
across the possible hash values. This creates a hash that appears
pseudorandom. There is no ability to consider similar file types as
being close to each other.
Another consideration we had when designing the current mechanism,
which is more important than "compare .c files with each other", is
to handle the case where a file is moved across a directory boundary
without changing its name. These "hash collisions" are meant to be
a part of obtaining _good_ pairing of blobs that ought to be similar
to each other. In other words, we wanted them to collide so that we
do not have to be negatively affected by moves.
I am not saying that we should not update the pack name hash; I am
just saying that "consider similar file types" as if that is the
most important aspect of the current hash, is misleading.
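
For illustration, here is a minimal Python sketch of the behavior described above. It is modeled on my understanding of the existing pack_name_hash() helper in Git (the C original is the authority, not this sketch); the name `name_hash` and the sample paths are hypothetical. The point it shows is that trailing characters dominate the value, so a file moved to another directory lands very close to its old hash, while a different suffix lands far away. Strictly speaking these are near-collisions in sort order rather than guaranteed equal values.

```python
def name_hash(path: str) -> int:
    """Sketch of a name hash dominated by the trailing characters of a path."""
    h = 0
    for byte in path.encode("utf-8"):
        # Skip ASCII whitespace, as C's isspace() would in the default locale.
        if byte in b" \t\n\v\f\r":
            continue
        # Older bytes decay by two bits per step; the newest byte
        # lands in the top eight bits. Mask to emulate a 32-bit unsigned int.
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h


# Hypothetical paths: the same basename in two directories, plus two others.
paths = [
    "src/a/parse.c",
    "docs/readme.txt",
    "lib/b/parse.c",
    "src/a/parse.h",
]

# Sorting by the hash places the two parse.c entries next to each other,
# which is what lets a moved file be paired with its earlier version.
by_hash = sorted(paths, key=name_hash)
```

A uniformly distributed hash over the full path would lose this property: "src/a/parse.c" and "lib/b/parse.c" would end up at unrelated positions in the sort order.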
Thank you for pointing out this extra aspect. It has clarified some of
my thinking, and I will use it in future versions.
Thanks,
-Stolee