Re: [ANNOUNCE] git-sizer: compute various size-related metrics for your Git repository

On Fri, Mar 16, 2018 at 09:01:42PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Suggestion for a thing to add to it, I don't have the time on the Go
> tuits:
> 
> One thing that can make repositories very pathological is if the ratio
> of trees to commits is too low.
> 
> I was dealing with a repo the other day that had several thousand files
> all in the same root directory, and no subdirectories.

We've definitely run into this problem before (CocoaPods/Specs, for
example). The metric that would hopefully show this off is "what is the
tree object with the most entries". Or possibly "what is the average
number of entries in a tree object".
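
A minimal sketch of those two metrics, assuming the entry lists of all
tree objects have already been collected into a dictionary (how they are
collected from the object database is left out). The function name
`tree_entry_stats` and the sample data are hypothetical and are not part
of git-sizer.

```python
from statistics import mean


def tree_entry_stats(trees):
    """Summarize tree sizes.

    `trees` maps a tree object id to the list of entry names in that tree.
    """
    if not trees:
        raise ValueError("no tree objects given")
    counts = {oid: len(entries) for oid, entries in trees.items()}
    biggest = max(counts, key=counts.get)
    return {
        "max_tree": biggest,                   # tree object with the most entries
        "max_entries": counts[biggest],        # how many entries it has
        "avg_entries": mean(counts.values()),  # average entries per tree object
    }


# Toy data: two versions of a root tree plus one small subtree.
example = {
    "root-v1": ["a", "b", "c", "d"],
    "root-v2": ["a", "b", "c", "d", "e"],
    "sub": ["x"],
}
print(tree_entry_stats(example))
```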

That's not the _whole_ story, because the really pathological case is
when you then touch that giant tree a lot. But if you assume the paths
touched by commits are reasonably distributed over the tree, then having
a huge number of entries in one tree will also mean that more commits
will touch that tree. So it's a vaguely quadratic problem: each new copy
of the tree is bigger, and there are more copies of it.

Doing it at the root is obviously the worst case, but the same thing can
happen if you have "foo/bar" as a huge tree, and every single commit
needs to touch some variant of "foo/bar/baz".

That's why I suspect some "average entries per tree object" metric may
show this type of problem, because you'd have a lot of near-identical
copies of that giant tree if it's being modified a lot.
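
To illustrate the effect described above, here is a rough sketch of a toy
model. It assumes that every commit modifies exactly one existing file
and that each such commit writes a new tree object for every directory
between the root and that file. The helper names (`tree_sizes`,
`trees_written`) and the two layouts are made up for illustration.

```python
from collections import defaultdict


def tree_sizes(paths):
    """Map each directory ('' is the root) to its number of direct entries."""
    children = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for depth in range(len(parts)):
            children["/".join(parts[:depth])].add(parts[depth])
    return {directory: len(names) for directory, names in children.items()}


def trees_written(paths, touched):
    """Return (tree objects written, total entries in them), one commit per touched path."""
    sizes = tree_sizes(paths)
    objects = 0
    entries = 0
    for path in touched:
        parts = path.split("/")
        # Every directory from the root down to the file gets a new tree object.
        for depth in range(len(parts)):
            objects += 1
            entries += sizes["/".join(parts[:depth])]
    return objects, entries


# 100 files in one directory vs. the same files spread over 10 subdirectories.
flat = [f"f{i}" for i in range(100)]
nested = [f"d{i // 10}/f{i}" for i in range(100)]

for name, layout in (("flat", flat), ("nested", nested)):
    objects, entries = trees_written(layout, layout)  # touch every file once
    print(name, objects, entries, entries / objects)
```

In this model the flat layout writes fewer tree objects, but each one is a
full copy of the giant tree, so both the total and the average number of
entries per written tree come out much higher than in the nested layout.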

> But it's not something where you can just say having more trees is
> better, because on the other end of the spectrum we can imagine a repo
> like linux.git where each file like COPYING instead exists at
> C/O/P/Y/I/N/G; that would also be pathological.
> 
> It would be very interesting to do some tests to see what the optimal
> value would be.

I suspect there's some math that could give us the solution. You want
approximately equal-sized trees, so maybe log(N) levels?
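
One way to explore that guess numerically, under a deliberately simple
model: N files sit in a perfectly balanced hierarchy of a given depth, so
every tree has about N^(1/depth) entries, and touching one file rewrites
one tree per level. The cost function and the names `rewrite_cost` and
`best_depth` are assumptions made for this sketch, not measurements of
real repositories.

```python
import math


def rewrite_cost(num_files, depth):
    """Entries rewritten per single-file change in a balanced hierarchy."""
    fanout = num_files ** (1.0 / depth)  # entries per tree at every level
    return depth * fanout                # one rewritten tree per level


def best_depth(num_files, max_depth=64):
    """Depth (number of tree levels) that minimizes rewrite_cost."""
    return min(range(1, max_depth + 1), key=lambda d: rewrite_cost(num_files, d))


n = 10**6
d = best_depth(n)
print("best depth:", d)
print("ln(N):", round(math.log(n), 1))
print("cost when flat:", round(rewrite_cost(n, 1)))
print("cost at best depth:", round(rewrite_cost(n, d), 1))
```

In this model the minimum does land close to ln(N) levels, but with a
fan-out of only about e entries per tree. The model ignores any fixed
per-object overhead (lookup, hashing, storage), which in practice would
presumably favor fewer levels and larger trees.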

-Peff


