-----Original Message-----
From: Jeff King <peff@xxxxxxxx> 
Sent: Thursday, December 26, 2024 7:53 AM
To: Mirochnik, Oleg V <oleg.v.mirochnik@xxxxxxxxx>
Cc: git@xxxxxxxxxxxxxxx
Subject: Re: "git fetch" fails for a --reference clone after an outer forced push

On Wed, Dec 25, 2024 at 05:23:13PM +0000, Mirochnik, Oleg V wrote:

> What did you do before the bug happened? (Steps to reproduce your issue)
> 
> $ cat ./doit
> #!/bin/sh
> set -xe
> rm -rf tst
> mkdir tst
> cd tst
> mkdir master
> git -C master init --bare
> git clone master local
> touch local/foo
> git -C local add .
> git -C local commit -m init-commit
> git -C local push
> echo foo > local/foo
> git -C local commit -a -m dummy-commit
> git -C local push origin HEAD:refs/heads/dummy
> git clone --mirror file://`pwd`/master mirror
> git clone --reference `pwd`/mirror file://`pwd`/master local1
> git -C local1 log --oneline origin/dummy
> git -C local commit --amend -m new-dummy-commit
> git -C local push -f origin HEAD:dummy
> git -C mirror fetch
> git -C mirror gc --prune=now
> git -C local1 fetch
> git -C local1 log --oneline origin/dummy
>
> [...]
>
> What happened instead? (Actual behavior)
> 
> + git -C local1 fetch
> fatal: bad object refs/remotes/origin/dummy
> error: file:///tmp/tst/master did not send all necessary objects

This is the expected behavior, and what the warning in "git help clone"
is talking about:

  NOTE: this is a possibly dangerous operation; do not use it unless you
  understand what it does. If you clone your repository using this
  option and then delete branches (or use any other Git command that
  makes any existing commit unreferenced) in the source repository, some
  objects may become unreferenced (or dangling). These objects may be
  removed by normal Git operations (such as git commit) which
  automatically call git maintenance run --auto. (See
  git-maintenance(1).) If these objects are removed and were referenced
  by the cloned repository, then the cloned repository will become
  corrupt.

Your "mirror" repository has no idea that other repositories are depending on it. To safely do a "git gc" there, it would need to know all of the objects that are referenced by the dependent repositories, to count them as reachable.
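
To see that the link only goes one way, here is a minimal sketch. It uses a scratch directory and made-up repo names ("mirror", "child"); the only thing it relies on is that --reference records the borrowed object store in the clone's objects/info/alternates file:

```shell
#!/bin/sh
set -e
# Scratch area; "mirror" and "child" are illustrative names only.
work=$(mktemp -d)
cd "$work"
git init -q --bare mirror
# Cloning an empty repo prints a warning, which is harmless here.
git clone -q --reference "$work/mirror" "file://$work/mirror" child

# The borrowing repo records where the mirror's objects live...
cat child/.git/objects/info/alternates

# ...but nothing in the mirror points back at the borrower.
test ! -e mirror/objects/info/alternates && echo "mirror has no record of child"
```

So any gc in the mirror has to be told about its borrowers explicitly.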

One way to do that is something like:

  1. Enable the "preciousObjects" flag in the mirror repo, to prevent
     accidental destruction (e.g., from auto-gc):

       git -C mirror config core.repositoryFormatVersion 1
       git -C mirror config extensions.preciousObjects true

  2. When you do want to run gc on the mirror repo, collect all of the
     references from child repos first:

       # collect references from all child repos; the destination
       # doesn't really matter here, and you could even delete
       # refs/child/* after the gc if you want
       for repo in local*; do
         git -C mirror fetch --prune ../$repo refs/*:refs/child/$repo/*
       done

       # now gc, disabling preciousObjects temporarily
       git -C mirror -c extensions.preciousObjects=false gc --prune=now
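
The two steps above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name gc_shared_mirror is made up, the leading "+" in the refspec (so that rewound child branches can still update refs/child/) is an addition that is not in the recipe, and the preciousObjects override is carried over from the recipe as-is. It is just as racy as running the commands by hand.

```shell
#!/bin/sh
# gc_shared_mirror MIRROR CHILD...
# Hypothetical helper: pin every child's refs inside MIRROR, then prune.
gc_shared_mirror() {
  mirror=$1
  shift
  for child in "$@"; do
    name=$(basename "$child")
    # Use an absolute path so the fetch does not depend on the caller's cwd.
    src=$(cd "$child" && pwd)
    # Copy all of the child's refs under refs/child/<name>/ in the mirror.
    # The '+' (an assumption, not part of the recipe) allows forced updates.
    git -C "$mirror" fetch --quiet --prune "$src" \
      "+refs/*:refs/child/$name/*" || return 1
  done
  # Everything a child ref points at is now reachable from the mirror,
  # so pruning should only drop objects that no listed child refers to.
  git -C "$mirror" -c extensions.preciousObjects=false gc --quiet --prune=now
}
```

With the layout from the report, that would be called as "gc_shared_mirror mirror local1" from inside the tst directory.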

This is (roughly) what a site like GitHub is doing on the backend with repository forks. But Git doesn't ship any scripts to help with it, and I don't offhand know of any public ones. I assume GitLab does something similar, and their system may be open source.

Some gotchas:

  - this is obviously racy with simultaneous updates to the local repos

  - you'd probably want to fetch HEAD as well, to cover detached HEADs

  - it won't cover blobs/trees referenced by the index of each child
    repo (but those are probably going to be local to those repos
    anyway).

  - it won't cover reflogs in the local repos either (but it's not the
    end of the world if a reflog entry goes stale)
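
For the detached HEAD case, one possible sketch is an extra fetch per child that copies HEAD next to the other pinned refs. The helper name pin_child_head and the destination refs/child/<name>/HEAD are illustrative choices, and the fetch will fail for a child whose HEAD is unborn:

```shell
#!/bin/sh
# pin_child_head MIRROR CHILD
# Hypothetical helper: keep whatever CHILD's HEAD points at reachable
# from MIRROR, even when HEAD is detached and no branch covers it.
pin_child_head() {
  mirror=$1
  child=$2
  name=$(basename "$child")
  # Absolute path, so the fetch works regardless of the caller's cwd.
  src=$(cd "$child" && pwd)
  git -C "$mirror" fetch --quiet "$src" "+HEAD:refs/child/$name/HEAD"
}
```

This would run before the gc, once for each child repo.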

Another, perhaps simpler, approach is to never expire objects from the mirror repo (the obvious downside being that you might carry objects forever that nobody cares about). You can set gc.pruneExpire to something high, and then look into gc.cruftPacks to store the old objects in a more efficient form.
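
As a rough sketch of that configuration: the helper name below is made up, and "never" is just one possible choice for "something high" (it is a value gc.pruneExpire accepts). gc.cruftPacks needs a reasonably recent Git; older versions will store the setting but ignore it.

```shell
#!/bin/sh
# keep_unreachable_objects REPO
# Hypothetical helper; the two config keys are real Git settings.
keep_unreachable_objects() {
  # Never prune unreachable objects during gc.
  git -C "$1" config gc.pruneExpire never
  # Ask gc to collect unreachable objects into a cruft pack
  # rather than leaving them as loose objects.
  git -C "$1" config gc.cruftPacks true
}
```

For the setup in the report this would be "keep_unreachable_objects mirror".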

Hope that helps.

-Peff



