On Wed, Jan 27, 2021 at 05:17:07PM -0500, Jeff King wrote:
> It can sometimes be useful to see which refs are contributing to the
> overall repository size (e.g., does some branch have a bunch of objects
> not found elsewhere in history, which indicates that deleting it would
> shrink the size of a clone).
>
> You can find that out by generating a list of objects, getting their
> sizes from cat-file, and then summing them, like:
>
>     git rev-list --objects main..branch
>     cut -d' ' -f1 |

I suspect that this is from the original commit message that you wrote a
half-decade ago. Not that it really means much, but you could shave one
process off of this example by passing '--no-object-names' to 'git
rev-list'. The whole point is that we can avoid having to do this, so I
don't think it really matters, anyway.

> [...]
> then we're faster to generate the list of objects, but we still spend a
> lot of time piping and looking things up. But if we do both together:
>
>     [internal, bitmaps]
>     $ time git rev-list --disk-usage --all --use-bitmap-index
>     1455691059
>     real 0m0.235s
>     user 0m0.186s
>     sys 0m0.049s
>
> then we get the same answer much faster.

Very nice.

> This _could_ be made more flexible, but I didn't think it was worth the
> complexity. Some obvious things one might want are:
>
>   - not counting up all reachable objects (i.e., requiring --objects for
>     this output, and omitting it just counts up commits). This could be
>     handled in the bitmap case with some extra code (OR-ing with the
>     type bitmaps).
>
>     But after 5 years of this patch, I've never wanted that once. The
>     disk usage of just some of the objects isn't really that useful (and
>     of course you can still get it by piping to cat-file).

Yeah. I think it's trivial to support it, but I'm in favor of a simpler
interface.

That said, I worry about painting ourselves into a corner if the default
implies --objects. If we wanted to change that, I'm pretty sure you'd
have to write a rule that says "imply objects, unless --tags, --blobs or
etc. are specified, and then only do that".

Maybe we'll never have to address that, but it's worth thinking about
before committing to implying '--objects'.

>   - an option to output the sizes of specific objects along with their
>     oids. But if you want to get to this level of flexibility, I think
>     you're better off just using cat-file (and if we are concerned about
>     the pipe costs, we should teach rev-list to understand cat-file's
>     custom formats).

This I agree with completely. Any caller who wants that level of
flexibility shouldn't mind the piping.

I have no comments on the patch itself, which looks fine to me (and I
have seen over and over again as it seems to regularly cause conflicts
when merging new releases into GitHub's fork :-)).

Thanks,
Taylor
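
For reference, the '--no-object-names' variant of the piped approach
mentioned above could look roughly like the sketch below. The function
name sum_disk_usage is made up for illustration (it is not part of git),
and awk stands in for whatever summing step the original example used:

```shell
# Hypothetical helper (not part of git): sum the on-disk size of every
# object selected by the given rev-list arguments.
sum_disk_usage () {
	# --no-object-names drops the path column, so no `cut` is needed.
	git rev-list --objects --no-object-names "$@" |
	# Print one on-disk size per object id read from stdin.
	git cat-file --batch-check='%(objectsize:disk)' |
	# Add the sizes up; "+ 0" prints 0 when there was no input at all.
	awk '{ total += $1 } END { print total + 0 }'
}
```

Calling it as "sum_disk_usage main..branch" should print the bytes used
by objects reachable from branch but not from main, which is the number
that '--disk-usage' is meant to produce without the pipes.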