Re: [PATCH 2/2] rev-list: add --disk-usage option for calculating disk usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 27, 2021 at 05:17:07PM -0500, Jeff King wrote:
> It can sometimes be useful to see which refs are contributing to the
> overall repository size (e.g., does some branch have a bunch of objects
> not found elsewhere in history, which indicates that deleting it would
> shrink the size of a clone).
>
> You can find that out by generating a list of objects, getting their
> sizes from cat-file, and then summing them, like:
>
>     git rev-list --objects main..branch
>     cut -d' ' -f1 |

I suspect that this is from the original commit message that you wrote a
half-decade ago. Not that it really means much, but you could shave one
process off of this example by passing '--no-object-names' to 'git
rev-list'.

The whole point is that we can avoid having to do this, so I don't think
it really matters, anyway.

> [...]
> then we're faster to generate the list of objects, but we still spend a
> lot of time piping and looking things up. But if we do both together:
>
>   [internal, bitmaps]
>   $ time git rev-list --disk-usage --all --use-bitmap-index
>   1455691059
>   real	0m0.235s
>   user	0m0.186s
>   sys	0m0.049s
>
> then we get the same answer much faster.

Very nice.

> This _could_ be made more flexible, but I didn't think it was worth the
> complexity. Some obvious things one might want are:
>
>   - not counting up all reachable objects (i.e., requiring --objects for
>     this output, and omitting it just counts up commits). This could be
>     handled in the bitmap case with some extra code (OR-ing with the
>     type bitmaps).
>
>     But after 5 years of this patch, I've never wanted that once. The
>     disk usage of just some of the objects isn't really that useful (and
>     of course you can still get it by piping to cat-file).

Yeah. I think it's trivial to support it, but I'm in favor of a simpler
interface.

That said, I worry about painting ourselves into a corner if the default
implies --objects. If we wanted to change that, I'm pretty sure you'd
have to write a rule that says "imply objects, unless --tags, --blobs or
etc. are specified, and then only do that".

Maybe we'll never have to address that, but it's worth thinking about
before committing to implying '--objects'.

>   - an option to output the sizes of specific objects along with their
>     oids. But if you want to get to this level of flexibility, I think
>     you're better off just using cat-file (and if we are concerned about
>     the pipe costs, we should teach rev-list to understand cat-file's
>     custom formats).

This I agree with completely. Any caller who wants that level of
flexibility shouldn't mind the piping.

I have no comments on the patch itself, which looks fine to me (and I
have seen over and over again as it seems to regularly cause conflicts
when merging new releases into GitHub's fork :-)).

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux