Re: [PATCH v2 0/4] [RFC] repack: add --filter=

Hi Taylor,

On 26 Feb 2022, at 15:30, Taylor Blau wrote:

> On Sat, Feb 26, 2022 at 03:19:11PM -0500, John Cai wrote:
>> Thanks for bringing this up again. I meant to write back regarding what you
>> raised in the other part of this thread. I think this is a valid concern. To
>> attain the goal of offloading certain blobs onto another server (B) and
>> saving space on a git server (A), there will essentially be two steps: one to
>> upload objects to (B), and one to remove objects from (A). As you said, these
>> two need to be the inverse of each other, or else you might end up with
>> missing objects.
>
> Do you mean that you want to offload objects both from a local clone of
> some repository, _and_ the original remote it was cloned from?

Yes, exactly. The "another server" would be something like an HTTP server, or
another remote that hosts a subset of the objects (say, the large blobs).
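
For instance, the large-blob remote could be marked as a promisor remote, so
that objects missing locally can be fetched from it on demand (a rough sketch;
the "blobs" remote name and URL are hypothetical):

    $ git remote add blobs https://blobs.example.com/repo.git
    $ git config remote.blobs.promisor true
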
>
> I don't understand what the role of "another server" is here. If this
> proposal was about making it easy to remove objects from a local copy of
> a repository based on a filter provided that there was a Git server
> elsewhere that could act as a promisor remote, then that makes sense to
> me.
>
> But I think I'm not quite understanding the rest of what you're
> suggesting.

Sorry for the lack of clarity here. The goal is to make it easy for a remote to
offload a subset of its objects to _another_ remote (either a Git server, or an
HTTP server accessed through a remote helper).
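
To make the "inverse" requirement above concrete: the set of objects that has
to exist on (B) before (A) drops anything is exactly the set the repack filter
omits, which can already be enumerated with --filter-print-omitted (a sketch;
how those objects actually get uploaded to (B) is remote-helper specific):

    # objects a blob:limit=100k filter would omit, i.e. what (B) must hold
    $ git rev-list --objects --all --filter=blob:limit=100k \
          --filter-print-omitted | grep '^~'
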
>
>>> My other concern was around what guarantees we currently provide for a
>>> promisor remote. My understanding is that we expect an object which was
>>> received from the promisor remote to always be fetch-able later on. If
>>> that's the case, then I don't mind the idea of refiltering a repository,
>>> provided that you only need to specify a filter once.
>>
>> Could you clarify what you mean by re-filtering a repository? I assumed that
>> meant specifying a filter, e.g. 100MB, and then narrowing it by specifying a
>> 50MB filter.
>
> I meant: applying a filter to a local clone (either where there wasn't a
> filter before, or a filter which matched more objects) and then removing
> objects that don't match the filter.
>
> But your response makes me think of another potential issue. What
> happens if I do the following:
>
>     $ git repack -ad --filter=blob:limit=100k
>     $ git repack -ad --filter=blob:limit=200k
>
> What should the second invocation do? I would expect that it needs to do
> a fetch from the promisor remote to recover any blobs between (100, 200]
> KB in size, since they would be gone after the first repack.
>
> This is a problem not just with two consecutive `git repack --filter`s,
> I think, since you could cook up the same situation with:
>
>     $ git clone --filter=blob:limit=100k git@xxxxxxxxxx:git
>     $ git -C git repack -ad --filter=blob:limit=200k
>
> I don't think the existing patches handle this situation, so I'm curious
> whether it's something you have considered or not before.

I have not; I will have to think through this case, but this sounds similar to
what [1] is about.
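
One observation, though: after the first repack, the blobs in (100, 200] KB are
promisor-missing, and they can at least be enumerated (a sketch, not a
solution; fetching them back before or during the second repack is the part
that would need design work):

    # list objects reachable from HEAD that are missing locally
    $ git rev-list --objects --missing=print HEAD | grep '^?'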

>
> (Unrelated to the above, but please feel free to trim any quoted parts
> of emails when responding if they get overly long.)
>
> Thanks,
> Taylor

Thanks
John

1. https://lore.kernel.org/git/pull.1138.v2.git.1645719218.gitgitgadget@xxxxxxxxx/



