Hi Christian,

I think my objections may be based on a misunderstanding of John's and
your original proposal. From reading [1], it seemed to me like a
required step of this proposal was to upload the objects you want to
filter out ahead of time, and then run `git repack -ad --filter=...`.

So my concerns thus far have been around the lack of cohesion between
(1) the filter which describes the set of objects uploaded to the HTTP
server, and (2) the filter used when re-filtering the repository. If
(1) and (2) aren't inverses of each other, then in the case where (2)
leaves behind an object which wasn't caught by (1), we have lost that
object.

If instead the server used in your script at [1] is a stand-in for an
ordinary Git remote, that changes my thinking significantly. See below
for more details:

On Tue, Feb 22, 2022 at 06:11:11PM +0100, Christian Couder wrote:
> > > > I share the same concern as Robert and Stolee do. But I think this issue
> > > > goes deeper than just naming.
> > > >
> > > > Even if we called this `git repack --delete-filter` and only ran it with
> > > > `--i-know-what-im-doing` flag, we would still be leaving repository
> > > > corruption on the table, just making it marginally more difficult to
> > > > achieve.
> > >
> > > My opinion on this is that the promisor object mechanism assumes by
> > > design that some objects are outside a repo, and that this repo
> > > shouldn't care much about these objects possibly being corrupted.
> >
> > For what it's worth, I am fine having a mode of repack which allows us
> > to remove objects that we know are stored by a promisor remote. But this
> > series doesn't do that, so users could easily run `git repack -d
> > --filter=...` and find that they have irrecoverably corrupted their
> > repository.
>
> In some cases we just know the objects we are removing are stored by a
> promisor remote or are replicated on different physical machines or
> both, so you should be fine with this.

I am definitely OK with a convenient way to re-filter your repository
locally, so long as you know that the objects you are filtering out are
available via some promisor remote. But perhaps I have misunderstood
what this proposal is for.

Reading through John's original cover letter and the link to your demo
script, I understood that a key part of this was being able to upload
the pack of objects you were about to filter out of your local copy to
some server (not necessarily Git) over HTTP. My hesitation so far has
been based on that understanding.

Reading these patches, I don't see a mechanism to upload objects we're
about to expunge to a promisor remote. But perhaps I'm misunderstanding:
if you are instead assuming that the existing set of remotes can serve
any objects that we delete, and that this is the way to delete them,
then I am OK with that approach.

But either way, I think I am missing some details in the original
proposal that would perhaps have made it easier for me to understand
what your goals are.

In any case, this patch series doesn't seem to correctly set up a
promisor remote for me, since doing the following on a fresh clone of
git.git (after running "make"):

    $ bin-wrappers/git repack -ad --filter=blob:none
    $ bin-wrappers/git fsck

results in many "missing blob" and "missing link" lines of output.

(FWIW, I think what's missing here is correctly setting up the affected
remote(s) as promisors and indicating that we're now in a filtered
setting when going from a full clone down to a partial one.)
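(As a rough sketch only, and assuming the remote we expect to lazily
fetch from is "origin": the kind of setup I'd expect when going from a
full clone down to a partial one is something like

    $ git config remote.origin.promisor true
    $ git config remote.origin.partialCloneFilter blob:none
    $ git repack -ad --filter=blob:none

along with marking the surviving pack(s) as promisor packs (i.e.,
writing a ".promisor" file alongside each ".pack"), so that later
object lookups know the filtered-out blobs can be fetched from "origin"
on demand. I haven't verified that this is exactly the set of steps
this series would need, but fsck staying quiet afterwards would be a
good sign that the conversion is complete.)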
> If you are not fine with this because sometimes a user might use it
> without knowing, then why are you ok with commands deleting refs not
> checking that there isn't a regular repack removing dangling objects?

I'm not sure I totally understand your question, but my general sense
has been "because we typically make it difficult / impossible to remove
reachable objects".

Thanks,
Taylor

[1]: https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/server_demo.txt#L47-52