Junio C Hamano <gitster@xxxxxxxxx> writes:

> Derrick Stolee <stolee@xxxxxxxxx> writes:
>
>> The thing that surprised me is just how effective this is for the
>> creation of large pack-files that include many versions of most
>> files. The cross-path deltas have less of an effect here, and the
>> benefits of avoiding name-hash collisions can be overwhelming in
>> many cases.
>
> Yes, "make sure we notice a file F moving from directory A to B" is
> inherently optimized for short span of history, i.e. a smallish push
> rather than a whole history clone, where the definition of
> "smallish" is that even if you create optimal delta chains, the
> length of these delta chains will not exceed the "--depth" option.
>
> If the history you are pushing modified A/F twice, renamed it to B/F
> (with or without modification at the same time), then modified B/F
> twice more, you'd want to pack the 5-commit segment and having to
> artificially cut the delta chain that can contain all of these 5
> blobs into two at the renaming commit is a huge loss.

Which actually leads me to suspect that we probably do not even have
to expose the --full-name-hash option to the end users in "git
repack".

If we are doing incremental that would fit within the depth setting,
it is likely that we would be better off without the full-name-hash
optimization, and if we are doing "repack -a" for the whole
repository, especially with "-f", it would make sense to do the
full-name-hash optimization.
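
To make the A/F to B/F rename case concrete, here is a minimal sketch of the two kinds of hash being contrasted. name_hash() is written from memory to resemble pack_name_hash() in pack-objects.h, so treat it as a paraphrase rather than the exact source. full_path_hash_sketch() is a purely hypothetical stand-in (plain FNV-1a over the whole path) and is not the hash proposed in the patch series.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Paraphrase of the existing name hash: each new character is added
 * in the top byte while older characters are shifted down, so the
 * trailing characters of the path dominate the value.
 */
static uint32_t name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace(c))
			continue;	/* whitespace does not contribute */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/*
 * HYPOTHETICAL stand-in for a hash over the full path (FNV-1a).
 * Every byte of the path, including the leading directory,
 * affects the result.
 */
static uint32_t full_path_hash_sketch(const char *name)
{
	uint32_t hash = 2166136261u;

	if (!name)
		return 0;
	while (*name) {
		hash ^= (unsigned char)*name++;
		hash *= 16777619u;
	}
	return hash;
}
```

With this sketch, name_hash("A/F") is 0x55d00000 and name_hash("B/F") is 0x55e00000, so the two paths stay close together when candidates are ordered by hash, which is what lets the delta chain continue across the rename. The full-path variant gives the two paths unrelated values, which avoids collisions between unrelated files but also separates A/F from B/F.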
If we could tell how large a chunk of history we are packing before
we actually start calling builtin/pack-objects.c:add_object_entry(),
we could probably even select between with and without
full-name-hash automatically. But I do not think we know the object
count before we finish calling add_object_entry(). So unless we are
willing to compute and keep both hashes while reading, and pick
between the two after we finish reading the list of objects, or
something along those lines, it will require major surgery to do so,
I am afraid.
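
The "compute and keep both, pick afterwards" idea could look roughly like the sketch below. Everything in it is hypothetical: struct entry_sketch is not the real struct object_entry, choose_name_hash() does not exist in pack-objects, and the cutoff is a caller-supplied parameter because no concrete threshold has been proposed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL per-object record; the real struct object_entry differs. */
struct entry_sketch {
	uint32_t legacy_hash;	/* hash favoring trailing path characters */
	uint32_t full_hash;	/* hash over the whole path */
	uint32_t chosen_hash;	/* filled in once the object list is complete */
};

/*
 * Called after the last object has been added, when the object count
 * is finally known.  Returns 1 if the full-path hash was selected and
 * 0 if the legacy hash was kept.  min_objects_for_full is an assumed
 * tuning knob, not an existing setting.
 */
static int choose_name_hash(struct entry_sketch *entries, size_t nr,
			    size_t min_objects_for_full)
{
	int use_full = nr >= min_objects_for_full;
	size_t i;

	assert(entries || !nr);
	for (i = 0; i < nr; i++)
		entries[i].chosen_hash = use_full ? entries[i].full_hash
						  : entries[i].legacy_hash;
	return use_full;
}
```

The obvious cost of this shape is one extra 32-bit field per object while the list is being read, plus computing two hashes per path, which is part of why it would not be a small change.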