On Mon, Nov 11, 2024 at 11:56:01AM +0900, Junio C Hamano wrote:

> > Yes. This is the downside of the --full-name-hash compared to the
> > standard name hash. When repacking an entire repository, the effect
> > of these renames is typically not important in the long run as it's
> > basically a single break in the delta chain. The downside comes in
> > when doing a small fetch or push where the rename has more impact.
>
> Yes. Due to --depth limit, we need to break delta chains somewhere
> anyway, and a rename boundary is just as good place as any other in
> a sufficiently long chain.

We don't necessarily have to break the chains due to depth limits,
because they are not always linear. They can end up as bushy trees,
like:

  A <- B <- C
        \
         <- D <- E
             \
              <- F

We might get that way because it mirrors the shape of history (e.g., if
D and E happened on a side branch from B and C). But we can also get
there from a linear history by choosing a delta that balances size
versus depth. In the above example, the smallest sequence might be:

  A <- B <- C <- D <- E <- F

but if we have a limit of 3 and A-B-C already exists, then we might
reject the C-D delta and choose A-D instead (or I guess if it really is
linear, probably B-D is more likely). That may be a larger delta by
itself, but it is still better than storing a full copy of the object.
And we find it by having those candidates close together in the sorted
list, rejecting C-D based on depth, and then moving on to the next
candidate.

I'm not sure in practice how often we find these kinds of deltas. If
you look at, say, all the deltas for "Makefile" in git.git like this:

  git rev-list --objects --all --reflog --full-history -- Makefile |
  perl -lne 'print $1 if /(.*) Makefile/' |
  git cat-file --batch-check='%(objectsize:disk) %(objectname) %(deltabase)'

my repo has 33 full copies (you can see non-deltas by grepping for
"0{40}$" in the output) out of 4473 total. So it's not like we never
break chains. But we can use graphviz to visualize it by piping the
above through:

  perl -alne '
    BEGIN { print "digraph {" }
    print "node_$F[1] [label=$F[0]]";
    print "node_$F[1] -> node_$F[2]" if $F[2] !~ /^0+$/;
    END { print "}" }
  '

and then piping it through "dot" to make an svg, or using an
interactive viewer like "xdot" (the labels are the on-disk size of each
object). I see a lot of wide parts of the graph in the output.

Of course this may all depend on packing patterns, too. I did my
investigations after running "git repack -adf" to generate what should
be a pretty reasonable pack. You might see something different from
incremental repacking over time.

I'm not sure what any of this means for --path-walk, of course. ;)
Ultimately we care about resulting size and time to compute, so if it
can do better on those metrics then it doesn't matter what the graph
looks like. But maybe those tools can help us understand where things
might go better (or worse) with various approaches.

-Peff
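
For reference, here is a minimal end-to-end sketch of the steps
described above, assuming GraphViz ("dot", and optionally "xdot") is
installed and that object names are 40-hex SHA-1; the intermediate
filenames are only illustrative, not from the original message:

  # Collect every Makefile blob and its delta info into a scratch file.
  git rev-list --objects --all --reflog --full-history -- Makefile |
  perl -lne 'print $1 if /(.*) Makefile/' |
  git cat-file --batch-check='%(objectsize:disk) %(objectname) %(deltabase)' \
    > makefile-deltas.txt

  # Count full (non-delta) copies: their delta base is all zeros.
  grep -cE '0{40}$' makefile-deltas.txt

  # Turn the list into a dot graph (size as label, edge to delta base),
  # then render it to SVG or open it in an interactive viewer.
  perl -alne '
    BEGIN { print "digraph {" }
    print "node_$F[1] [label=$F[0]]";
    print "node_$F[1] -> node_$F[2]" if $F[2] !~ /^0+$/;
    END { print "}" }
  ' makefile-deltas.txt > makefile-deltas.dot
  dot -Tsvg makefile-deltas.dot > makefile-deltas.svg
  xdot makefile-deltas.dot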