Re: [PATCH 0/3] Implement filtering repacks

On Thu, Oct 20, 2022 at 01:23:02PM +0200, Christian Couder wrote:
> On Fri, Oct 14, 2022 at 6:46 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Christian Couder <christian.couder@xxxxxxxxx> writes:
> >
> > > For example one might want to clone with a filter to avoid too much
> > > space being taken by some large blobs, and one might realize after
> > > some time that a number of the large blobs have still been downloaded
> > > because some old branches referencing them were checked out. In this
> > > case a filtering repack could remove some of those large blobs.
> > >
> > > Some of the comments on the patch series that John sent were related
> > > to the possible data loss and repo corruption that a filtering repack
> > > could cause. It's indeed true that it could be very dangerous, and we
> > > agree that improvements were needed in this area.
> >
> > The wish is understandable, but I do not think this gives a good UI.
> >
> > This feature is, from an end-user's point of view, very similar to
> > "git prune-packed", in that we prune data that is not necessary due
> > to redundancy.  Nobody runs "prune-packed" directly; most people are
> > even unaware of it being run on their behalf when they run "git gc".
>
> I am Ok with adding the --filter option to `git gc`, or a config
> option with a similar effect. I wonder how `git gc` should implement
> that option though.
>
> If we implement a new command called for example `git filter-packed`,
> similar to `git prune-packed`, then this new command will call `git
> pack-objects --filter=...`.

Conceptually, yes, the two are similar. Though `prune-filtered` is
necessarily going to differ in implementation from `prune-packed`, since
we will have to write new pack(s), not just delete loose objects which
appear in packs already.

So this is not just a matter of deleting redundant loose copies of
objects, as in the case of prune-packed. Here we also care about
potentially writing a new set of packs to satisfy the new filter
constraint.
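
To make the contrast concrete, here is a minimal sketch of what
`git prune-packed` does today. It assumes a throwaway repository created
with `mktemp`; the file name, identity, and variable names are made up
for illustration. Nothing here writes a new pack during the prune step:
the loose objects are deleted only because a pack already contains them.

```shell
set -e

# Throwaway repository (hypothetical path, illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "hello" > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add file"

# Pack everything reachable, but without -d, so the loose copies stay behind.
git repack -a -q

# "count:" in `git count-objects -v` is the number of loose objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Delete loose objects that already exist in a pack; no pack is rewritten.
git prune-packed

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')

echo "loose objects before: $loose_before, after: $loose_after"
```

A filtering counterpart could not stop at the deletion step, since the
objects to drop live inside existing packs rather than next to them.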

Presumably that tool would create the new packs according to the given
--filter, and would then delete the existing packs. That is basically
what your implementation in repack already does, so I am not sure what
the difference would be.
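
As a rough illustration of which objects a given --filter would leave
out of the new packs, the sketch below compares the object list from
`git rev-list --objects` with and without `--filter=blob:none`. It is
not how the series implements anything; the repository, file name, and
variable names are assumptions made up for the example.

```shell
set -e

# Throwaway repository (hypothetical path, illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Stand-in for a large blob that a partial clone would want to omit.
echo "large blob stand-in" > big.bin
git add big.bin
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add blob"

# Every reachable object: one commit, one tree, one blob.
all_objects=$(git rev-list --objects --all | wc -l | tr -d ' ')

# The same walk with blobs filtered out: only the commit and the tree remain.
filtered_objects=$(git rev-list --objects --all --filter=blob:none | wc -l | tr -d ' ')

echo "all: $all_objects, with blob:none: $filtered_objects"
```

The difference between the two counts is the set of objects that a
filtering repack would have to leave out when writing the new packs.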

> Yeah. So to sum up, it looks like you are Ok with `git gc
> --filter=...`  which is fine for me, even if I wonder if `git repack
> --filter=...` could be a good first step as it is less likely to be
> used automatically (so safer in a way) and it might be better for
> implementation related performance reasons.

If we don't intend to make `git repack --filter` part of our backwards
compatibility guarantee, then I would prefer to see the implementation
just live in git-gc from start to finish.

Thanks,
Taylor
