-----Original Message-----
From: Jeff King <peff@xxxxxxxx>
Sent: Thursday, December 26, 2024 7:53 AM
To: Mirochnik, Oleg V <oleg.v.mirochnik@xxxxxxxxx>
Cc: git@xxxxxxxxxxxxxxx
Subject: Re: "git fetch" fails for a --reference clone after an outer forced push

On Wed, Dec 25, 2024 at 05:23:13PM +0000, Mirochnik, Oleg V wrote:

> What did you do before the bug happened? (Steps to reproduce your
> issue)
>
> $ cat ./doit
> #!/bin/sh
> set -xe
> rm -rf tst
> mkdir tst
> cd tst
> mkdir master
> git -C master init --bare
> git clone master local
> touch local/foo
> git -C local add .
> git -C local commit -m init-commit
> git -C local push
> echo foo > local/foo
> git -C local commit -a -m dummy-commit
> git -C local push origin HEAD:refs/heads/dummy
> git clone --mirror file://`pwd`/master mirror
> git clone --reference `pwd`/mirror file://`pwd`/master local1
> git -C local1 log --oneline origin/dummy
> git -C local commit --amend -m new-dummy-commit
> git -C local push -f origin HEAD:dummy
> git -C mirror fetch
> git -C mirror gc --prune=now
> git -C local1 fetch
> git -C local1 log --oneline origin/dummy
>
> [...]
>
> What happened instead? (Actual behavior)
>
> + git -C local1 fetch
> fatal: bad object refs/remotes/origin/dummy
> error: file:///tmp/tst/master did not send all necessary objects

This is the expected behavior, and what the warning in "git help clone"
is talking about:

  NOTE: this is a possibly dangerous operation; do not use it unless
  you understand what it does. If you clone your repository using this
  option and then delete branches (or use any other Git command that
  makes any existing commit unreferenced) in the source repository,
  some objects may become unreferenced (or dangling). These objects
  may be removed by normal Git operations (such as git commit) which
  automatically call git maintenance run --auto. (See
  git-maintenance(1).) If these objects are removed and were
  referenced by the cloned repository, then the cloned repository
  will become corrupt.
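"git clone --reference" records the dependency only on the clone's side, by writing the mirror's objects directory into the clone's .git/objects/info/alternates file. As a rough illustration, the hypothetical shell function below (find_dependents is not something Git ships) scans a list of candidate clones and prints the ones whose alternates file names a given mirror. It assumes non-bare clones and an absolute, symlink-resolved path in that file, which is what "git clone --reference" writes. Relative entries added by hand are not handled.

```shell
# find_dependents MIRROR CLONE...
# Hypothetical helper, not shipped with Git: print each CLONE (a non-bare,
# working-tree clone) whose alternates file names MIRROR's objects directory.
# Assumes the entry is an absolute, symlink-resolved path, as written by
# "git clone --reference"; relative alternates entries are not resolved here.
find_dependents() {
    # Resolve the mirror to a physical absolute path; fail if it is missing.
    mirror_dir=$(cd "$1" && pwd -P) || return 1
    shift
    for clone in "$@"; do
        alt="$clone/.git/objects/info/alternates"
        # No alternates file: this clone owns all of its objects.
        [ -f "$alt" ] || continue
        # Each line names one borrowed object directory; match a whole line.
        if grep -qxF "$mirror_dir/objects" "$alt"; then
            printf '%s\n' "$clone"
        fi
    done
}
```

When run from the tst directory of the reproduction script, "find_dependents mirror local*" should print only local1, because local was cloned without --reference. That list could replace a plain glob when collecting child refs before a gc.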
Your "mirror" repository has no idea that other repositories are
depending on it. To safely do a "git gc" there, it would need to know
all of the objects that are referenced by the dependent repositories,
to count them as reachable. One way to do that is something like:

  1. Enable the "preciousObjects" flag in the mirror repo, to prevent
     accidental destruction (e.g., from auto-gc):

       git -C mirror config core.repositoryFormatVersion 1
       git -C mirror config extensions.preciousObjects true

  2. When you do want to run gc on the mirror repo, collect all of the
     references from child repos first:

       # collect references from all child repos; the destination
       # doesn't really matter here, and you could even delete
       # refs/child/* after the gc if you want. The "+" lets a later
       # run accept forced (non-fast-forward) updates in a child.
       for repo in local*; do
         git -C mirror fetch --prune "../$repo" "+refs/*:refs/child/$repo/*"
       done

       # now gc, disabling preciousObjects temporarily
       git -C mirror -c extensions.preciousObjects=false gc --prune=now

This is (roughly) what a site like GitHub is doing on the backend with
repository forks. But Git doesn't ship any scripts to help with it, and
I don't offhand know of any public ones. I assume GitLab does something
similar, and their system may be open source.

Some gotchas:

  - this is obviously racy with simultaneous updates to the local repos

  - you'd probably want to fetch HEAD as well, to cover detached HEADs

  - it won't cover blobs/trees referenced by the index of each child
    repo (but those are probably going to be local to those repos
    anyway)

  - it won't cover reflogs in the local repos either (but it's not the
    end of the world if a reflog entry goes stale)

Another, perhaps simpler, approach is to just never expire objects from
the mirror repo (with the obvious downside being that you might carry
objects forever that nobody cares about). You can set gc.pruneExpire to
something high, and then look into gc.cruftPacks to store the old
objects in a more efficient form.

Hope that helps.

-Peff
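For the "never expire" alternative, a minimal sketch of the one-time setup on the mirror might look like the following. It assumes the mirror lives at ./mirror, as in the reproduction script. The value "never" for gc.pruneExpire is documented to suppress pruning altogether, rather than merely setting a long grace period. gc.cruftPacks, which recent Git releases already default to true, packs unreachable objects into a cruft pack instead of leaving them as loose files. One caveat: an explicit "--prune=now" on the gc command line overrides gc.pruneExpire, so the reproduction script's gc call would still delete the objects local1 needs.

```shell
# Keep unreachable objects in the mirror indefinitely, so objects that a
# --reference clone still needs are not pruned by gc or auto-maintenance.
git -C mirror config gc.pruneExpire never
# Store those unreachable objects in a cruft pack rather than as loose files.
git -C mirror config gc.cruftPacks true
# Later gc runs must not pass --prune=now, which overrides gc.pruneExpire.
git -C mirror gc
```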