On Thu, Oct 20, 2022 at 01:23:02PM +0200, Christian Couder wrote:
> On Fri, Oct 14, 2022 at 6:46 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Christian Couder <christian.couder@xxxxxxxxx> writes:
> >
> > > For example one might want to clone with a filter to avoid too much
> > > space being taken by some large blobs, and one might realize after
> > > some time that a number of the large blobs have still been downloaded
> > > because some old branches referencing them were checked out. In this
> > > case a filtering repack could remove some of those large blobs.
> > >
> > > Some of the comments on the patch series that John sent were related
> > > to the possible data loss and repo corruption that a filtering repack
> > > could cause. It's indeed true that it could be very dangerous, and we
> > > agree that improvements were needed in this area.
> >
> > The wish is understandable, but I do not think this gives a good UI.
> >
> > This feature is, from an end-user's point of view, very similar to
> > "git prune-packed", in that we prune data that is not necessary due
> > to redundancy. Nobody runs "prune-packed" directly; most people are
> > even unaware of it being run on their behalf when they run "git gc".
>
> I am Ok with adding the --filter option to `git gc`, or a config
> option with a similar effect. I wonder how `git gc` should implement
> that option though.
>
> If we implement a new command called for example `git filter-packed`,
> similar to `git prune-packed`, then this new command will call `git
> pack-objects --filter=...`.

Conceptually, yes, the two are similar. Though `filter-packed` is
necessarily going to differ in implementation from `prune-packed`, since
we will have to write new pack(s), not just delete loose objects which
already appear in packs.

So it's really not just a matter of deleting redundant loose copies of
objects, as in the case of prune-packed. Here we really do care about
potentially writing a new set of packs to satisfy the new filter
constraint.

Presumably that tool would create the new packs according to the given
--filter, and would similarly delete the existing packs. That is
basically what your implementation in repack already does, so I am not
sure what the difference would be.

> Yeah. So to sum up, it looks like you are Ok with `git gc
> --filter=...` which is fine for me, even if I wonder if `git repack
> --filter=...` could be a good first step as it is less likely to be
> used automatically (so safer in a way) and it might be better for
> implementation-related performance reasons.

If we don't intend to make `git repack --filter` part of our backwards
compatibility guarantee, then I would prefer to see the implementation
just live in git-gc from start to finish.

Thanks,
Taylor
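
To make the distinction discussed above concrete, here is a toy model in
Python. It is only a sketch under loud assumptions: the `Repo` class, the
function names, and the size-based filter are all hypothetical, objects are
plain dictionary entries rather than Git's on-disk format, and every object is
treated as a blob. It is not how Git implements either operation. The point is
only that a prune-packed style cleanup deletes loose copies and writes nothing,
while a filtering repack has to write a new pack and then drop the old ones.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical in-memory stand-in for an object store."""
    loose: dict[str, int] = field(default_factory=dict)        # object id -> size in bytes
    packs: list[dict[str, int]] = field(default_factory=list)  # each pack: object id -> size


def prune_packed(repo: Repo) -> None:
    """Drop loose objects that already appear in some pack. No pack is written."""
    packed = set().union(*repo.packs)  # iterating a dict yields its keys
    repo.loose = {oid: size for oid, size in repo.loose.items() if oid not in packed}


def filtering_repack(repo: Repo, limit: int) -> None:
    """Write one new pack with objects smaller than `limit`, then replace the old packs.

    Objects at or above `limit` are no longer stored locally afterwards. That is
    the data-loss risk raised in the thread if no remote can supply them again.
    """
    new_pack = {
        oid: size
        for pack in repo.packs
        for oid, size in pack.items()
        if size < limit
    }
    repo.packs = [new_pack]  # old packs are discarded only after the new one exists


repo = Repo(loose={"a": 10}, packs=[{"a": 10, "b": 5000}, {"c": 20}])
prune_packed(repo)            # removes the redundant loose copy of "a"
filtering_repack(repo, 1000)  # rewrites the packs without the large object "b"
```

After both calls in this toy model, the loose store is empty and a single pack
remains that holds only the small objects.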