Re: Partial Clone, and a strange slow rev-list call on fetch

On 6/2/21 12:56 AM, Tao Klerks wrote:
> Hi folks,
> 
> I'm learning to use Partial Clone, and finding a behavior that I don't
> know how to interpret or investigate:
> 
> Under some circumstances, doing a plain "git fetch <remote>" on a
> filtered repo results in a very long (6-30 min?) wait, during which I
> can see the following command being executed in the background:
> 
> /usr/libexec/git-core/git rev-list --objects --stdin
> --exclude-promisor-objects --not --all --quiet --alternate-refs
> 
> So far, I have noted this happening under two distinct circumstances:
> * Anytime I try to fetch on a filtered repo with a git 2.23 client -
> shorter pause
> * When I try to fetch with a recent (2.31) client in a repo where one
> large packfile has no *.promisor file (but the others do, and the
> remote I am fetching from has promisor=true) - looong pause

This makes me think that there was a bug fix for this situation,
but that the fix requires doing extra work. To help track this
down, could you re-run the scenario with GIT_TRACE2_PERF=1? That
will show the full Git process stack as we reach that rev-list call.
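
For reference, a minimal sketch of capturing such a trace. It assumes
a throwaway pair of repositories under a temp directory (the names
"upstream" and "clone" are only for illustration, not your setup) and
points GIT_TRACE2_PERF at an absolute file path instead of 1, so the
trace lands in a file rather than on stderr:

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repositories; substitute your real repo and remote.
work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git init -q "$work/clone"
git -C "$work/clone" remote add origin "$work/upstream"

# An absolute path makes Git (and its child processes) append the
# performance trace to that file instead of writing it to stderr.
trace="$work/fetch-trace.txt"
GIT_TRACE2_PERF="$trace" git -C "$work/clone" fetch -q origin

# Show the child processes that the fetch started, if any were recorded.
grep 'child_start' "$trace" || true
```

In the real repository only the GIT_TRACE2_PERF line matters; the
lines around the rev-list child should show which parent command
started it.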

> Can anyone explain what this rev-list call intends, and/or any hints
> as to how I could see what the stdin content being fed to it from the
> parent process actually is?
> 
> For background, I ended up in the "missing promisor file" situation by
> trying to be (too?) clever about the blobs present in my clone: I
> cloned unfiltered shallow to a certain depth with certain refspecs,
> then added the promisor and filter config, and finally fetched with
> "--unshallow". This produced exactly the blob-population state I
> intended, but meant the original first packfile had no ".promisor"
> file.

This is the critical point: you first cloned without a filter,
and then converted the remote to a promisor remote without
marking the pack-files you had already received from that remote
as promisor pack-files. That means that Git needs to do some work
to discover which objects are reachable from promisor packs and
which are not, and that extra work is slowing you down.
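
To see which packs are in that state, something like the following
may help. It is only a sketch: it relies on promisor packs being
marked by a ".promisor" file next to the ".pack" file, and the
function name is made up for this example.

```shell
#!/bin/sh

# Print every *.pack in the given directory that has no sibling
# *.promisor file (hypothetical helper, not a Git command).
list_non_promisor_packs() {
    pack_dir=$1
    for pack in "$pack_dir"/*.pack; do
        [ -e "$pack" ] || continue   # the glob matched nothing
        [ -e "${pack%.pack}.promisor" ] || printf '%s\n' "$pack"
    done
}

# Usage, from inside the repository:
#   list_non_promisor_packs "$(git rev-parse --git-path objects/pack)"
```

Any pack it prints is one that Git cannot treat as having come from
the promisor remote.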

Partial clone is designed to work where every remote is a
promisor remote and has been one from the start. Any deviation
from that norm is venturing into uncharted territory and will
cause friction like this. A similar issue comes up when you have
multiple remotes and one of them is a promisor remote while
another is not.

The general advice right now is to use partial clone only if you
will use it for all remotes across the entire existence of the
repo.
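
A quick way to check whether a repo follows that advice is to look
at the remote.<name>.promisor setting for each remote. A rough
sketch, with a made-up function name and assuming an unset value
means "not a promisor remote":

```shell
#!/bin/sh

# Print "<remote> promisor=<true|false>" for each remote of the
# repository at the given path (hypothetical helper).
report_promisor_remotes() {
    repo=$1
    git -C "$repo" remote | while read -r name; do
        # "git config --get" exits non-zero when the key is unset.
        promisor=$(git -C "$repo" config --bool --get \
            "remote.$name.promisor" || echo false)
        printf '%s promisor=%s\n' "$name" "$promisor"
    done
}

# Usage, from inside the repository:
#   report_promisor_remotes .
```

If any remote reports promisor=false alongside one that reports
true, you are in the mixed situation described above.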

Part of the difficulty here is that once you download that first
pack-file from the remote, Git has no way of knowing whether the
pack came from that source or was created in another way. So we
have no way to be sure that we can "upgrade" the remote in an
automated process.

This does make me wonder what happens when Git repacks objects
created locally and then starts fetching from a promisor remote.

There are some challenges here, for sure. Most likely also some
potential gains, but it is unlikely to create a seamless
experience for what you are trying to do.

Thanks,
-Stolee


