Thank you for everyone's feedback. Really appreciate the collaboration! On 23 Feb 2022, at 14:31, Junio C Hamano wrote: > Christian Couder <christian.couder@xxxxxxxxx> writes: > >>> For what it's worth, I am fine having a mode of repack which allows us >>> to remove objects that we know are stored by a promisor remote. But this >>> series doesn't do that, so users could easily run `git repack -d >>> --filter=...` and find that they have irrecoverably corrupted their >>> repository. >> >> In some cases we just know the objects we are removing are stored by a >> promisor remote or are replicated on different physical machines or >> both, so you should be fine with this. > > So, we need to decide if an object we have that is outside the > narrowed filter definition was (and still is, but let's keep the > assumption the whole lazy clone mechanism makes: promisor remotes > will never shed objects that they once served) available at the > promisor remote, but I suspect we have too little information to > reliably do so. It is OK to assume that objects in existing packs > taken from the promisor remotes and everything reachable from them > (but missing from our object store) will be available to us from > there. But if we see an object that is outside of _new_ filter spec > (e.g. you fetched with "max 100MB", now you are refiltering with > "max 50MB", narrowing the spec, and you need to decide for an object > that weigh 70MB), can we tell if that can be retrieved from the > promisor or is it unique to our repository until we push it out? I > am not sure. For that matter, do we even have a way to compare if > the new filter spec is a subset, a superset, or neither, of the > original filter spec? let me try to summarize (perhaps over simplify) the main concern folks have with this feature, so please correct me if I'm wrong! As a user, if I apply a filter that ends up deleting objects that it turns out do not exist anywhere else, then I have irrecoverably corrupted my repository. Before git allows me to delete objects from my repository, it should be pretty certain that I have path to recover those objects if I need to. Is that correct? It seems to me that, put another way, we don't want to give users too much rope to hang themselves. I can see why we would want to do this. In this case, there have been a couple of alternative ideas proposed throughout this thread that I think are viable and I wanted to get folks thoughts. 1. split pack file - (Rob gave this idea and Taylor provided some more detail on how using pack-objects would make it fairly straightforward to implement) when a user wants to apply a filter that removes objects from their repository, split the packfile into one containing objects that are filtered out, and another packfile with objects that remain. pros: simple to implement cons: does not address the question "how sure am I that the objects I want to filter out of my repository exist on a promsior remote?" 2. check the promisor remotes to see if they contain the objects that are about to get deleted. Only delete objects that we find on promisor remotes. pros: provides assurance that I have access to objects I am about to delete from a promsior remote. cons: more complex to implement. [*] Out of these two, I like 2 more for the aforementioned pros. * I am beginning to look into how fetches work and am still pretty new to the codebase so I don't know if this is even feasible, but I was thinking perhaps we could write a function that fetches with a --filter and create a promisor packfile containing promisor objects (this operaiton would have to somehow ignore the presence of the actual objects in the repository). Then, we would have a record of objects we have access to. Then, repack --filter can remove only the objects contained in this promisor packfile. > >> If you are not fine with this because sometimes a user might use it >> without knowing, then why are you ok with commands deleting refs not >> checking that there isn't a regular repack removing dangling objects? > > Sorry, I do not follow this argument. Your user may do "branch -D" > because the branch deleted is no longer needed, which may mean that > objects only reachable from the deleted branch are no longer needed. > I do not see what repack has anything to do with that. As long as > the filter spec does not change (in other words, before this series > is applied), the repack that discards objects that are known to be > reachable from objects in packs retrieved from promisor remote, the > objects that are no longer reachable may be removed and that will > not lose objects that we do not know to be retrievable from there > (which is different from objects that we know are unretrievable). > But with filter spec changing after the fact, I am not sure if that > is safe. IOW, "commands deleting refs" may have been OK without > this series, but this series may be what makes it not OK, no? > > Puzzled.