Re: [PATCH 6/9] repack: add `--filter=<filter-spec>` option

On Wed, Jun 21, 2023 at 05:04:48PM +0200, Christian Couder wrote:
> On Wed, Jun 21, 2023 at 1:20 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
> >
> > On Thu, Jun 15, 2023 at 05:43:27PM -0700, Junio C Hamano wrote:
> > > Christian Couder <christian.couder@xxxxxxxxx> writes:
> > >
> > > > After cloning with --filter=<filter-spec>, for example to avoid
> > > > getting unneeded large files on a user machine, it's possible
> > > > that some of these large files still get fetched for some reason
> > > > (like checking out old branches) over time.
> > > >
> > > > In this case the repo size could grow too much for no good reason and a
> > > > way to filter out some objects would be useful to remove the unneeded
> > > > large files.
> > >
> > > Makes sense.
> > >
> > > If we repack without these objects, when the repository has a
> > > promisor remote, we should be able to rely on that remote to supply
> > > them on demand, once we need them again, no?
> >
> > I think in theory, yes, but this patch series (at least up to this
> > point) does not seem to implement that functionality by marking the
> > relevant remote(s) as promisors, if they weren't already.
>
> Yeah, it's not part of this patch series to implement all the features
> that could be useful in the case of promisor remotes. This patch
> series only hopes to implement a `repack --filter=...` option that can
> help in a number of different use cases. I'm open to opinions about
> whether or not the doc and commit messages should talk, and how much,
> about use cases related to promisor remotes.
>
> > > [...] It does smell somewhat similar to the cruft packs but not
> > > really (the choice over there is between exploding to loose and
> > > keeping in a pack, and never involves loss of objects).
> >
> > Indeed. `pack-objects`'s `--stdin-packs` and `--cruft` work similarly,
> > and I believe that we could use `--stdin-packs` here instead of having
> > to store the list of objects which don't meet the filter's spec. IOW, I
> > think that this similarity is no coincidence...
>
> Yeah, I agree that we could use `--stdin-packs` to implement `repack
> --filter=...`. I am just not sure it's the best path forward
> performance-wise in the long run. So others' opinions are welcome
> about that.

I think it would almost certainly have comparable performance in most
cases, and significantly better performance in large repositories. IIUC,
the current system has to remember the OID of every object which did not
pass the filter, and then construct a pack containing just those
objects.

It would be nice from a memory-savings perspective to not have to
remember these OIDs. But remembering them also just seems error-prone
to me: what if we lose an OID, or reorder the list?

I dunno. I feel pretty strongly that this should be implemented in
terms of:

  - Write a filtered pack.
  - Construct the list of existing packs (marked with '-') and the
    filtered pack.
  - Pass that as input to `git pack-objects --stdin-packs`.
  - If '-d' given, delete any existing pack(s).
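
To make the middle steps concrete, here is a rough sketch of how the
pack list could be assembled before it is handed to `pack-objects`.
It is illustration only, not code from the series: the helper names
`build_pack_list` and `packs_to_delete` are hypothetical, the pack
file names are made up, and the '-' marker is simply the one mentioned
in the list above (the prefix the command actually expects should be
checked against the `pack-objects` documentation).

```python
def build_pack_list(existing_packs, filtered_pack, marker="-"):
    """Build the text to feed on stdin (hypothetical helper).

    Every existing pack is written with the marker in front of its
    name; the freshly written filtered pack is listed unmarked.
    The marker default follows the email and is an assumption.
    """
    lines = [f"{marker}{name}" for name in sorted(existing_packs)]
    lines.append(filtered_pack)
    return "\n".join(lines) + "\n"


def packs_to_delete(existing_packs, delete_requested):
    """Return the packs to remove afterwards, mirroring the '-d' step."""
    return sorted(existing_packs) if delete_requested else []


if __name__ == "__main__":
    existing = ["pack-bbb.pack", "pack-aaa.pack"]  # made-up names
    print(build_pack_list(existing, "pack-filtered.pack"), end="")
    print(packs_to_delete(existing, delete_requested=True))
```

The point of the sketch is that only pack names need to be tracked,
not the OID of every object that failed the filter.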

Thanks,
Taylor


