On 6/1/21 6:24 AM, Tao Klerks wrote:
> Hi folks,
>
> I'm trying to deepen my understanding of the Partial Clone
> functionality for a possible deployment at scale (with a large-ish
> 13GB project where we are using date-based shallow clones for the
> time being), and one thing that I can't get my head around yet is
> how you "unfilter" an existing filtered clone.
>
> The gitlab intro document
> (https://docs.gitlab.com/ee/topics/git/partial_clone.html#remove-partial-clone-filtering)
> suggests that you need to get the full list of missing blobs, and
> pass that into a fetch...:
>
> git fetch origin $(git rev-list --objects --all --missing=print | grep
> -oP '^\?\K\w+')

I think the short answer is to split your "git rev-list" call into
batches by limiting the count. Perhaps pipe that command to a file
and then split it into batches of "reasonable" size. Your definition
of "reasonable" may vary, so try a few numbers. (A rough sketch of
this batching approach is at the end of this message.)

> The official doc at https://git-scm.com/docs/partial-clone makes no
> mention of plans or goals (or non-goals) related to this
> "unfiltering" - is it something that we should expect a story to
> emerge around?

The design is not intended for this kind of "unfiltering". The
feature is built for repositories where doing so would be too
expensive (in both network time and disk space) to be valuable.

Also, asking for the objects one-by-one like this is very
inefficient on the server side. A fresh clone can make use of
existing delta compression in a way that this type of request
cannot (at least, not easily).

You _would_ be better off making a fresh clone and then adding its
pack-file to the .git/objects/pack directory of the repository you
want to unfilter. (The second sketch at the end of this message
shows the idea.)

Could you describe more about your scenario and why you want to get
all objects?

Thanks,
-Stolee
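
The batching sketch mentioned above -- a minimal version, assuming a
POSIX shell with xargs and a server that allows fetching arbitrary
object IDs (uploadpack.allowAnySHA1InWant); the batch size of 10000
is an arbitrary starting point, so tune it:

    # Collect the missing object IDs (the "?"-prefixed lines from
    # --missing=print), then fetch them 10000 at a time instead of
    # passing the entire list to a single fetch.
    git rev-list --objects --all --missing=print |
        sed -n 's/^?//p' |
        xargs -n 10000 git fetch origin

Each batch is still a pile of one-off object requests for the server
to satisfy, so treat this as a workaround, not a recommendation.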
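
And the fresh-clone sketch -- the URL and paths here are placeholders
for your setup:

    # Make a full (unfiltered) clone somewhere else, then copy its
    # pack-file(s) into the filtered repository's object store.
    git clone --bare https://example.com/repo.git /tmp/full-clone
    cp /tmp/full-clone/objects/pack/pack-*.pack \
       /tmp/full-clone/objects/pack/pack-*.idx \
       /path/to/filtered-repo/.git/objects/pack/

Once those packs are in place, "git rev-list --objects --all
--missing=print" in the filtered repository should report nothing
missing, and the full clone gets to reuse the server's existing
delta compression instead of asking it to serve objects one by one.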