Re: [PATCH v2] rev-list --disk-usage

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 10 Feb 2021 08:31:08 -0800

Jeff King <peff@xxxxxxxx> writes:

> But in practice, we've found this kind of naive --disk-usage useful for
> answering questions like:
>
>   - do I need all of these objects? Comparing "rev-list --disk-usage
>     --objects --all", "rev-list --disk-usage --objects --all --reflog",
>     and "du objects/pack/*.pack" will tell you if a prune/repack might
>     help, and whether expiring reflogs makes a difference.
>
>   - the size of the shared alternates repo for a set of forks has
>     jumped. Comparing "rev-list --disk-usage --objects --remotes=$base
>     --not --remotes=$fork" will tell you what's reachable from a fork
>     but not from the base (we use "refs/remotes/$id/*" to keep track of
>     fork refs in our alternates repo). This can be junk like somebody
>     forking git/git and then uploading a bunch of pirated video files.
>     :)
>
>   - likewise, the size of cloning a single repo may jump. Comparing
>     "rev-list --disk-usage --objects HEAD..$branch" for each branch
>     might show that one branch is an outlier (e.g., because somebody
>     accidentally committed a bunch of build artifacts).
>
> In those kinds of cases, it's not usually "oh, this version is twice as
> big as this other one". It's more like "wow, this branch is 100x as big
> as the other branches", and little decisions like delta direction are
> just noise. I imagine that in those cases the uncompressed object sizes
> would probably produce similar patterns and answers. But it's actually
> faster to produce the on-disk sizes. :)

Thanks.

I feel a bit sad to have a nice write-up like this only in the
list archive.  Is there a section in our documentation set that keeps
a collection of such real-life use cases?  Perhaps the examples
section of the manpages is the closest thing, but this looks a bit too
narrowly scoped for the examples section of the "rev-list" manpage.

Thanks.