On Sat, Feb 26, 2022 at 03:19:11PM -0500, John Cai wrote:
> Thanks for bringing this up again. I meant to write back regarding what you raised
> in the other part of this thread. I think this is a valid concern. To attain the
> goal of offloading certain blobs onto another server(B) and saving space on a git
> server(A), then there will essentially be two steps. One to upload objects to (B),
> and one to remove objects from (A). As you said, these two need to be the inverse of each
> other or else you might end up with missing objects.

Do you mean that you want to offload objects both from a local clone of
some repository, _and_ the original remote it was cloned from?

I don't understand what the role of "another server" is here. If this
proposal was about making it easy to remove objects from a local copy of
a repository based on a filter provided that there was a Git server
elsewhere that could act as a promisor remote, then that makes sense to
me. But I think I'm not quite understanding the rest of what you're
suggesting.

> > My other concern was around what guarantees we currently provide for a
> > promisor remote. My understanding is that we expect an object which was
> > received from the promisor remote to always be fetch-able later on. If
> > that's the case, then I don't mind the idea of refiltering a repository,
> > provided that you only need to specify a filter once.
>
> Could you clarify what you mean by re-filtering a repository? By that I assumed
> it meant specifying a filter eg: 100mb, and then narrowing it by specifying a
> 50mb filter.

I meant: applying a filter to a local clone (either where there wasn't a
filter before, or a filter which matched more objects) and then removing
objects that don't match the filter.

But your response makes me think of another potential issue. What
happens if I do the following:

    $ git repack -ad --filter=blob:limit=100k
    $ git repack -ad --filter=blob:limit=200k

What should the second invocation do?
I would expect that it needs to do a fetch from the promisor remote to
recover any blobs between (100, 200] KB in size, since they would be gone
after the first repack.

This is a problem not just with two consecutive `git repack --filter`s,
I think, since you could cook up the same situation with:

    $ git clone --filter=blob:limit=100k git@xxxxxxxxxx:git
    $ git -C git repack -ad --filter=blob:limit=200k

I don't think the existing patches handle this situation, so I'm curious
whether it's something you have considered or not before.

(Unrelated to the above, but please feel free to trim any quoted parts
of emails when responding if they get overly long.)

Thanks,
Taylor
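
To make the widening-filter scenario above concrete, here is a minimal
sketch of the gap that would have to be refetched. It is only a toy model,
not Git's implementation: the function names, blob names, and sizes are all
hypothetical, and the exact boundary handling (a blob is kept when its size
is strictly below the limit) is an assumption that should be checked against
the Git documentation.

```python
# Hypothetical model of the refiltering gap; not Git's implementation.
KIB = 1024


def is_kept(size: int, limit: int) -> bool:
    """Assumption: a blob:limit=<n> filter keeps blobs smaller than n bytes."""
    return size < limit


def blobs_to_refetch(blob_sizes: dict[str, int], old_limit: int, new_limit: int) -> list[str]:
    """Return names of blobs dropped under old_limit that new_limit would keep."""
    return sorted(
        name
        for name, size in blob_sizes.items()
        if not is_kept(size, old_limit) and is_kept(size, new_limit)
    )


# Made-up blob names and sizes, for illustration only.
blobs = {"small.txt": 10 * KIB, "medium.bin": 150 * KIB, "large.iso": 500 * KIB}

# Widening the filter (100k -> 200k) leaves a gap that is no longer local.
widened = blobs_to_refetch(blobs, 100 * KIB, 200 * KIB)

# Narrowing the filter (200k -> 100k) only removes objects; nothing to fetch.
narrowed = blobs_to_refetch(blobs, 200 * KIB, 100 * KIB)

print(widened, narrowed)
```

In this model only the widening case yields a non-empty list, which is the
situation where the second repack would need help from the promisor remote.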