Re: [PATCH v2 0/4] [RFC] repack: add --filter=

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 21, 2022 at 10:42 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Mon, Feb 21, 2022 at 10:10:15PM +0100, Christian Couder wrote:
> > > > Also, to have more protection we can either
> > > >
> > > > 1. add a config value that needs to be set to true for repack to remove
> > > > objects (repack.allowDestroyFilter).
> >
> > I don't think it's of much value. We don't have such config values for
> > other possibly destructive operations.
> >
> > > > 2. --filter is dry-run by default and prints out objects that would have been removed,
> > > > and it has to be combined with another flag --destroy in order for it to actually remove
> > > > objects from the odb.
> >
> > I am not sure it's of much value either compared to naming it
> > --filter-destroy. It's likely to just make things more difficult for
> > users to understand.
>
> On this and the above, I agree with Christian.
>
> > > I share the same concern as Robert and Stolee do. But I think this issue
> > > goes deeper than just naming.
> > >
> > > Even if we called this `git repack --delete-filter` and only ran it with
> > > `--i-know-what-im-doing` flag, we would still be leaving repository
> > > corruption on the table, just making it marginally more difficult to
> > > achieve.
> >
> > My opinion on this is that the promisor object mechanism assumes by
> > design that some objects are outside a repo, and that this repo
> > shouldn't care much about these objects possibly being corrupted.
>
> For what it's worth, I am fine having a mode of repack which allows us
> to remove objects that we know are stored by a promisor remote. But this
> series doesn't do that, so users could easily run `git repack -d
> --filter=...` and find that they have irrecoverably corrupted their
> repository.

In some cases we just know the objects we are removing are stored by a
promisor remote or are replicated on different physical machines or
both, so you should be fine with this.

If you are not fine with this because sometimes a user might use it
without knowing, then why are you ok with commands deleting refs not
checking that there isn't a regular repack removing dangling objects?

Also note that people who want to remove objects using a filter can
already do it by cloning with a filter and then replacing the original
packs with the packs from the clone. So refusing this new feature is
just making things more cumbersome.

> I think that there are some other reasonable directions, though. One
> which Robert and I discussed was making it possible to split a
> repository into two packs, one which holds objects that match some
> `--filter` criteria, and one which holds the objects that don't match
> that filter.

I am ok with someone implementing this feature, but if an option that
actually deletes the filtered objects is rejected then such a feature
will be used with some people just deleting one of the resulting packs
(and they might get it wrong), so I don't think any real safety will
be gained.

> Another option would be to prune the repository according to objects
> that are already made available by a promisor remote.

If the objects have just been properly transferred to the promisor
remote, the check will just waste resources.

> An appealing quality about the above two directions is that the first
> doesn't actually remove any objects, just makes it easier to push a
> whole pack of unwanted objects off to a promsior remote. The second
> prunes the repository according to objects that are already made
> available by the promisor remote. (Yes, there is a TOCTOU race there,
> too, but it's the same prune-while-pushing race that Git already has
> today).
>
> > I am not against a name and some docs that strongly state that users
> > should be very careful when using such a command, but otherwise I
> > think such a command is perfectly ok. We have other commands that by
> > design could lead to some objects or data being lost.
>
> I can think of a handful of ways to remove objects which are unreachable
> from a repository, but I am not sure we have any ways to remove objects
> which are reachable.

Cloning with a filter already does that. It's by design in the
promisor object and partial clone mechanisms that reachable objects
are removed. Having more than one promisor remote, which is an
existing mechanism, means that it's just wasteful to require all the
remotes to have all the reachable objects, so how could people easily
set up such remotes? Why make it unnecessarily hard and forbid a
straightforward way?



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux