Re: [PATCH v2 0/4] [RFC] repack: add --filter=

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thank you for everyone's feedback. Really appreciate the collaboration!

On 23 Feb 2022, at 14:31, Junio C Hamano wrote:

> Christian Couder <christian.couder@xxxxxxxxx> writes:
>
>>> For what it's worth, I am fine having a mode of repack which allows us
>>> to remove objects that we know are stored by a promisor remote. But this
>>> series doesn't do that, so users could easily run `git repack -d
>>> --filter=...` and find that they have irrecoverably corrupted their
>>> repository.
>>
>> In some cases we just know the objects we are removing are stored by a
>> promisor remote or are replicated on different physical machines or
>> both, so you should be fine with this.
>
> So, we need to decide if an object we have that is outside the
> narrowed filter definition was (and still is, but let's keep the
> assumption the whole lazy clone mechanism makes: promisor remotes
> will never shed objects that they once served) available at the
> promisor remote, but I suspect we have too little information to
> reliably do so.  It is OK to assume that objects in existing packs
> taken from the promisor remotes and everything reachable from them
> (but missing from our object store) will be available to us from
> there.  But if we see an object that is outside of _new_ filter spec
> (e.g. you fetched with "max 100MB", now you are refiltering with
> "max 50MB", narrowing the spec, and you need to decide for an object
> that weigh 70MB), can we tell if that can be retrieved from the
> promisor or is it unique to our repository until we push it out?  I
> am not sure.  For that matter, do we even have a way to compare if
> the new filter spec is a subset, a superset, or neither, of the
> original filter spec?

let me try to summarize (perhaps over simplify) the main concern folks have
with this feature, so please correct me if I'm wrong!

As a user, if I apply a filter that ends up deleting objects that it turns
out do not exist anywhere else, then I have irrecoverably corrupted my
repository.

Before git allows me to delete objects from my repository, it should be pretty
certain that I have path to recover those objects if I need to.

Is that correct? It seems to me that, put another way, we don't want to give
users too much rope to hang themselves.

I can see why we would want to do this. In this case, there have been a couple
of alternative ideas proposed throughout this thread that I think are viable and
I wanted to get folks thoughts.

1. split pack file - (Rob gave this idea and Taylor provided some more detail on
   how using pack-objects would make it fairly straightforward to implement)

when a user wants to apply a filter that removes objects from their repository,
split the packfile into one containing objects that are filtered out, and
another packfile with objects that remain.

pros: simple to implement
cons: does not address the question "how sure am I that the objects I want to
filter out of my repository exist on a promsior remote?"

2. check the promisor remotes to see if they contain the objects that are about
   to get deleted. Only delete objects that we find on promisor remotes.

pros: provides assurance that I have access to objects I am about to delete from
a promsior remote.
cons: more complex to implement. [*]

Out of these two, I like 2 more for the aforementioned pros.

* I am beginning to look into how fetches work and am still pretty new to the
codebase so I don't know if this is even feasible, but I was thinking perhaps
we could write a function that fetches with a --filter and create a promisor
packfile containing promisor objects (this operaiton would have to somehow
ignore the presence of the actual objects in the repository). Then, we would
have a record of objects we have access to. Then, repack --filter can remove
only the objects contained in this promisor packfile.

>
>> If you are not fine with this because sometimes a user might use it
>> without knowing, then why are you ok with commands deleting refs not
>> checking that there isn't a regular repack removing dangling objects?
>
> Sorry, I do not follow this argument.  Your user may do "branch -D"
> because the branch deleted is no longer needed, which may mean that
> objects only reachable from the deleted branch are no longer needed.
> I do not see what repack has anything to do with that.  As long as
> the filter spec does not change (in other words, before this series
> is applied), the repack that discards objects that are known to be
> reachable from objects in packs retrieved from promisor remote, the
> objects that are no longer reachable may be removed and that will
> not lose objects that we do not know to be retrievable from there
> (which is different from objects that we know are unretrievable).
> But with filter spec changing after the fact, I am not sure if that
> is safe.  IOW, "commands deleting refs" may have been OK without
> this series, but this series may be what makes it not OK, no?
>
> Puzzled.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux