Re: Issue with git log and reference repositories using --dissociate and --filter=blob:none

[+cc Jonathan Tan for his commit mentioned below but also as the general
promisor repacking guru]

On Fri, May 17, 2024 at 02:52:45PM +0100, Matt Cree wrote:

> From what I can tell, what is going on here is the following:
> 1. We cloned the repository using --dissociate which forces a repack of the cloned repository
> 2. The clone completes fast (on git 2.40+) but in the background 'git-remote-https' is running
> 3. The bug appears if I request the log during this time
> 4. When the 'git-remote-https' process ends, the log can be requested successfully

Thanks for providing a thorough example, which let me reproduce the
problem. I think the stuff about remote-https is a red herring. The
fundamental issue is an odd interaction between the promisor and
alternates features when repacking (a "promisor" pack is one we got
from a filtered fetch, where the other side "promises" that it can give
us the rest of the objects later). On top of that, Git is a little too
hesitant to lazy-fetch the missing commit.

Here's an even simpler from-scratch reproduction:

-- >8 --
# start with a clean slate
rm -rf server client reference

# we have a server that allows partial clones
git init server
git -C server config uploadpack.allowFilter true

# we also have a reference repo which has its first commit
git -C server commit --allow-empty -m one
git clone --bare server reference

# but importantly the server also has a second commit which is not
# in the reference repo
git -C server commit --allow-empty -m two

# now we do a partial reference clone. This is going to get a new copy
# of commit "two" in a pack marked as a promisor (since we know the
# other side can give us anything it points to later if we ask). But it
# won't get a copy of "one", because that's in the reference repo
git clone --no-local --reference reference --filter=blob:none server client

# At this point we're good, and can access all objects. But if we had
# specified --dissociate, it would do the equivalent of these commands:
cd client
git repack -ad
rm -f .git/objects/info/alternates

# And now we find that when we run git-log, we are missing commit "one"!
git --no-pager log
-- >8 --

During that repack we actually make two packs:

  1. We repack all of the promisor objects we have into their own new
     pack (so just "two" in this case).

  2. And then we follow up by making a new pack with all of the
     non-promisor objects (both local and non-local), which in this
     case would usually be just "one". But we tell it to exclude any
     promisor objects, which causes our traversal to mark "two" with the
     UNINTERESTING flag. And then as we traverse, that flag transitively
     applies to things we can reach, including "one", and we exclude it
     from this pack.

     So we effectively treat "one" as something we _could_ ask the
     server for, even though it's not in a promisor pack itself. It's
     sort of a promisor object (reachable from a promisor object) and
     sort of not (we don't have it in a promisor pack). And so it's
     included in neither of the new packs.

I think this is sub-optimal, in that we'd usually try to keep promisor
objects we have locally (without alternates, they'd have come from the
original filtered fetch and would be in the promisor pack). But we fail
to migrate them to the new pack in this case.

However, this isn't actually a corruption, because you really can get
"one" from the server. But git-log doesn't ask for it! It used to, but
stopped with 7e2ad1cda2 (commit: don't lazy-fetch commits, 2022-12-14),
after which parse_commit() and friends no longer do the lazy fetch. You
can still grab it with something like "git cat-file commit HEAD^", which
lazy-fetches via read_object_file(), and after that the repo is repaired
(though of course in a bigger repository there may be many such
commits).
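For a repository with many such commits, running cat-file by hand won't
scale. Here's a rough sketch of automating it; fill_missing_commits is a
hypothetical helper (not part of Git) that walks history from a starting
commit using only "git cat-file commit", relying on the cat-file
lazy-fetch behavior described above. It is slow (one process per commit
and a linear "seen" check), so treat it as a stopgap rather than a fix:

-- >8 --
# fill_missing_commits <rev> (hypothetical helper, not a Git command):
# visit every commit reachable from <rev>, printing each one, and read it
# with "git cat-file commit" so that a commit missing from a partial
# clone is lazy-fetched from the promisor remote instead of skipped.
fill_missing_commits () {
	todo=$(git rev-parse --verify "$1^{commit}") || return 1
	seen=
	while :; do
		set -- $todo
		test $# -gt 0 || break
		c=$1
		shift
		todo=$*
		case " $seen " in *" $c "*) continue ;; esac
		seen="$seen $c"
		echo "$c"
		# cat-file exits non-zero if the commit can't be read or fetched
		body=$(git cat-file commit "$c") || return 1
		# parent lines live in the header, before the first blank line
		parents=$(printf '%s\n' "$body" |
			sed -n -e '/^$/q' -e 's/^parent //p')
		todo="$todo $parents"
	done
}
-- >8 --

Inside "client", something like "fill_missing_commits HEAD >/dev/null"
would then touch every reachable commit once.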

The logic in 7e2ad1cda2 is that we don't have any filters which exclude
commits, so there's no need to lazy-fetch them. But obviously we can
still be missing them due to the promisor repacking scheme, as above. So
I'm not sure if that commit is wrong, or if the repacking should be more
careful about holding on to objects from the reference repo (which would
fix the bug, and would also do the more efficient thing the user was
trying to do in the first place: keep local copies of those objects
rather than needing to fetch them again).

-Peff