On Fri, Mar 16, 2018 at 09:01:42PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Suggestion for a thing to add to it, I don't have the time on the Go
> tuits:
>
> One thing that can make repositories very pathological is if the ratio
> of trees to commits is too low.
>
> I was dealing with a repo the other day that had several thousand files
> all in the same root directory, and no subdirectories.

We've definitely run into this problem before (CocoaPods/Specs, for
example). The metric that would hopefully show this off is "what is the
tree object with the most entries". Or possibly "what is the average
number of entries in a tree object".

That's not the _whole_ story, because the really pathological case is
when you then touch that giant tree a lot. But if you assume the paths
touched by commits are reasonably distributed over the tree, then having
a huge number of entries in one tree will also mean that more commits
will touch that tree. Sort of a vaguely quadratic problem.

Doing it at the root is obviously the worst case, but the same thing can
happen if you have "foo/bar" as a huge tree, and every single commit
needs to touch some variant of "foo/bar/baz".

That's why I suspect some "average per tree object" may show this type
of problem, because you'd have a lot of near-identical copies of that
giant tree if it's being modified a lot.

> But it's not something where you can just say having more trees is
> better, because on the other end of the spectrum we can imagine a repo
> like linux.git where each file like COPYING instead exists at
> C/O/P/Y/I/N/G, that would also be pathological.
>
> It would be very interesting to do some tests to see what the optimal
> value would be.

I suspect there's some math that could give us the solution. You want
approximately equal-sized trees, so maybe log(N) levels?

-Peff
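
The log(N) intuition above can be sketched with a toy cost model. This is
only a rough illustration under stated assumptions, not a measurement of
real Git behavior: the tree is perfectly balanced, every tree holds about
`fanout` entries, and a commit that touches one file writes a new copy of
each tree on the path from the root to that file. The function names
`levels_needed` and `entries_rewritten_per_commit` are made up for this
example.

```python
def levels_needed(num_files: int, fanout: int) -> int:
    """Smallest depth d such that fanout**d >= num_files.

    Assumes a perfectly balanced hierarchy (hypothetical; real
    repositories are rarely this regular).
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth = 1
    capacity = fanout
    while capacity < num_files:
        capacity *= fanout
        depth += 1
    return depth


def entries_rewritten_per_commit(num_files: int, fanout: int) -> int:
    """Toy cost of a commit that touches a single file.

    Each tree on the root-to-file path is written again as a new
    object, and each such tree is assumed to hold `fanout` entries.
    """
    return levels_needed(num_files, fanout) * fanout


if __name__ == "__main__":
    n = 10_000
    # Compare a flat layout (fanout == n) with progressively deeper ones.
    for fanout in (n, 100, 10, 4, 2):
        depth = levels_needed(n, fanout)
        cost = entries_rewritten_per_commit(n, fanout)
        print(f"fanout={fanout:>6} levels={depth:>2} entries rewritten={cost}")
```

Under these assumptions a flat layout of 10,000 files rewrites 10,000
entries per commit, while a fanout of 100 rewrites 200 across two levels.
The model ignores delta compression and path lookup costs, so it says
nothing definitive about the best real-world fanout.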