Re: [PATCH 6/9] repack: add `--filter=<filter-spec>` option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 21, 2023 at 6:53 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
> > It might indicate that we prefer to be safe, do things in different
> > steps and not provide an easy way for users to shoot their own foot.
> > For example it seems pretty safe to do things like this:
> >
> >   1) put all the objects we think should be on the promisor remote in
> > a separate packfile
> >   2) start checking that each object in that packfile is available on
> > the promisor remote
> >   3) if an object in that packfile isn't on the promisor remote, try
> > to send it there
> >   4) if we couldn't send the object, error out
> >   5) if we haven't errored out after checking all the objects in the
> > packfile, it means all these objects are now available from the
> > promisor remote and we can safely delete the packfile
>
> I may be missing something, but to me, the above sound more like a
> tail wagging the dog.
>
> Instead of saying "while repacking, we'll create the new pack with
> the objects that we suspect that we cannot re-fetch from the
> promisor (allowing false positives for safety), and store the rest
> in a backup pack (that can immediately be discarded)",

This might be a good idea, but what if users prefer to send to a
promisor remote the objects that should be on that promisor remote as
soon as possible, instead of keeping them on the local machine where
they take up possibly valuable space for no good reason?

My point is that there are a lot of different strategies that people
operating with a promisor remote could adopt, so it's better to
iteratively give them building blocks that can help them instead of
trying to find and implement right away the best solution for a
special use case or for every use case.

> the above
> says "while repacking, we'll create the new pack with objects that
> match the filter, and store the rest to another pack".  But because
> the object selection criteria used in the latter is not something
> with practical/useful meaning, in other words, it does not exactly
> match what we want,

"What we want" depends on the strategy chosen to manage objects on
promisor remotes and I am not sure that the strategy you mention is
always better than the example strategy I talked about. For some users
it might be better for others it might not.

> we fill the gaps between what we want (i.e. sift
> the objects into "refetchable" and "other" bins) and what we
> happened to have implemented (i.e. sift the objects into "match
> filter" and "other" bints) by sending the objects that we _should_
> have included in the new pack (i.e. "not refetchable") to the
> promisor to make them refetchable.
>
> I do not know what to think about that.  I do not think there is
> even a way to guarantee that the push done for 3) will always be
> taken and still leave the resulting promisor usable (e.g.  we can
> make them connected by coming up with a random new ref to point
> these "we are sending these only because we failed to include them
> in the set of objects we should consider local" objects, but then
> how would we avoid bloating the refs at the promisor remote side
> (which now has become a "dumping ground", rather than holding the
> objects needed for histories that project participants care about).

There are some configurations where users never want to delete any git
object. In those cases it doesn't matter if the promisor remote is a
"dumping ground". Users might just want a promisor remote to keep all
the large files that have ever been pushed into the repo, to save more
precious space on the machine hosting the regular repo.

There are configurations where users can have garanties that the push
done for 3) will work with a very high probability so that the example
strategy I talked about can work reliably enough.

The example strategy I talked about is just one example where having
repack --filter work like it does in this patch series can be useful
and safe. I don't pretend that it is always the best strategy and that
some users might not prefer another better strategy for them. If
that's the case perhaps they can implement another different option
that just checks that filtered out objects are indeed available from a
promisor remote and then just omit these objects from the resulting
pack. In fact I would have nothing against such an option, and I might
even implement it myself one day (no promise though).

Right now they have nearly no helpful command (except perhaps using
pack-objects directly), so even if this is not the best possible help
in all use cases, I am just saying that this can be useful in _some_
cases.

> As an argument to salvage this series as (one of the possible
> ingredients to) a solution to "slim down a bloated lazy clone"
> problem, it sounds a bit weak.

I don't quite agree, but anyway I will use this argument less in
version 2 of this patch series, and I will talk more about the
argument that `repack --filter=...` allows users to put packfiles
containing some kinds of objects (like large blobs or all the blobs)
on cheaper disks (when not using a promisor remote).

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux