Re: [PATCH 6/9] repack: add `--filter=<filter-spec>` option

On Wed, Jun 21, 2023 at 1:20 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Thu, Jun 15, 2023 at 05:43:27PM -0700, Junio C Hamano wrote:
> > Christian Couder <christian.couder@xxxxxxxxx> writes:
> >
> > > After cloning with --filter=<filter-spec>, for example to avoid
> > > getting unneeded large files on a user machine, it's possible
> > > that some of these large files still get fetched for some reasons
> > > (like checking out old branches) over time.
> > >
> > > In this case the repo size could grow too much for no good reason and a
> > > way to filter out some objects would be useful to remove the unneeded
> > > large files.
> >
> > Makes sense.
> >
> > If we repack without these objects, when the repository has a
> > promisor remote, we should be able to rely on that remote to supply
> > them on demand, once we need them again, no?
>
> I think in theory, yes, but this patch series (at least up to this
> point) does not seem to implement that functionality by marking the
> relevant remote(s) as promisors, if they weren't already.

Yeah, implementing all the features that could be useful with promisor
remotes is outside the scope of this patch series. It only aims to
implement a `repack --filter=...` option that can help in a number of
different use cases. I'm open to opinions about whether, and how much,
the doc and commit messages should talk about use cases related to
promisor remotes.

> > [...] It does smell somewhat similar to the cruft packs but not
> > really (the choice over there is between exploding to loose and
> > keeping in a pack, and never involves loss of objects).
>
> Indeed. `pack-objects`'s `--stdin-packs` and `--cruft` work similarly,
> and I believe that we could use `--stdin-packs` here instead of having
> to store the list of objects which don't meet the filter's spec. IOW, I
> think that this similarity is no coincidence...

Yeah, I agree that we could use `--stdin-packs` to implement `repack
--filter=...`. I am just not sure it's the best path forward
performance-wise in the long run, so others' opinions about that are
welcome.

Also, as Junio said, this patch series is not responsible for the fact
that traverse_commit_list_filtered() stores oids into an oidset
instead of using a callback function. Fixing this would likely avoid
accumulating oids in memory. And creating a packfile by sending oids
into pack-objects is something that is already done by
repack_promisor_objects(). So even if `--filter=...` is not reusing
`--stdin-packs`, it is still reusing a lot of existing mechanisms.
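
To illustrate the oidset-versus-callback point, here is a toy sketch.
It is not Git code: `toy_oid`, `toy_oidset`, `traverse_collect()`,
`traverse_stream()` and `print_omitted()` are all made-up names, and the
"filter" is just a size limit. It only shows why a callback avoids
holding every omitted oid in memory at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object; Git's struct object_id differs. */
struct toy_oid {
	unsigned id;
	size_t size;
};

/* Hypothetical growable list playing the role of an oidset. */
struct toy_oidset {
	struct toy_oid *items;
	size_t nr, alloc;
};

static void toy_oidset_add(struct toy_oidset *set, const struct toy_oid *oid)
{
	if (set->nr == set->alloc) {
		size_t alloc = set->alloc ? 2 * set->alloc : 16;
		struct toy_oid *items = realloc(set->items, alloc * sizeof(*items));
		if (!items)
			abort();
		set->items = items;
		set->alloc = alloc;
	}
	set->items[set->nr++] = *oid;
}

/*
 * Approach 1: remember every object the filter omits. Memory use grows
 * with the number of omitted objects, and the caller only sees them
 * once the whole traversal is done.
 */
static void traverse_collect(const struct toy_oid *objs, size_t nr,
			     size_t limit, struct toy_oidset *omitted)
{
	for (size_t i = 0; i < nr; i++)
		if (objs[i].size > limit)
			toy_oidset_add(omitted, &objs[i]);
}

/* Hypothetical callback type, invoked once per omitted object. */
typedef void (*omitted_fn)(const struct toy_oid *oid, void *data);

/*
 * Approach 2: hand each omitted object to a callback right away.
 * Nothing is accumulated by the traversal itself.
 */
static void traverse_stream(const struct toy_oid *objs, size_t nr,
			    size_t limit, omitted_fn fn, void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < nr; i++)
		if (objs[i].size > limit)
			fn(&objs[i], data);
}

/* Example callback: write the id to a stream, e.g. a child's stdin. */
static void print_omitted(const struct toy_oid *oid, void *data)
{
	fprintf((FILE *)data, "%u\n", oid->id);
}
```

With the second form, something like `print_omitted()` could write each
oid straight to the stdin of a child process as the traversal finds it,
instead of first building the whole set.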

Thanks,
Christian.



