> Date: Mon, 4 Nov 2024 18:47:05 -0500
> From: Jeff King <peff@xxxxxxxx>
>
> On Sat, Nov 02, 2024 at 02:06:53AM +0000, Taylor R Campbell wrote:
>
> > Whenever I push anything to it, I want the push -- that is, all the
> > objects, and all the ref updates -- to be synchronously replicated to
> > another remote repository, the back end:
>
> This isn't quite how replication works at, say, GitHub. But let me first
> explain some of what you're seeing, and then I'll give some higher level
> comments at the end.

Great, thanks!  I understand GitHub works differently, and I'm not
trying to replicate everything about GitHub's architecture, which I
expect to take substantial novel software engineering effort.  But I
am trying to make sure I understand how the parts fit together well
enough to provide qualitatively similar types of guarantees about
durability when the user's `git push' exits nonzero.

I really have two different goals here, which have similar needs for
relaying pushes but which I'm sure will diverge at some point:

1. provide a synchronous push/pull git frontend to an hg backend with
   git-cinnabar (so to ordinary git clients it looks just like an
   ordinary git remote, without needing git-cinnabar), and

2. provide a git frontend that replicates to one or many git backends
   for better resilience to server loss.

> Instead, you should disable push's attempt to
> update the local tracking refs. There isn't an option to do that, but
> if you don't have a "fetch" config line, then there are no tracking
> refs. I.e., rather than using "clone --mirror", create your frontend
> repo like this:
>
>   git init --bare
>   git config remote.backend.url git@xxxxxxxxxxxxxxxxxxx:/repo.git
>   git fetch backend refs/*:refs/*
>
> And then push won't try to update anything in the frontend repo.

Thanks, that hadn't occurred to me as an option.

>   Side note: there's a small maybe-bug here that I noticed if the
>   backend is on the same local filesystem.
>   In that case
>   GIT_QUARANTINE_PATH remains set for the receive-pack process running
>   on the backend repo, and will refuse to update refs (where it should
>   be safe to do so!). In your example that doesn't happen because
>   GIT_QUARANTINE_PATH does not make it across the ssh connection. But
>   arguably we should be clearing GIT_QUARANTINE_PATH in local_repo_env
>   like we do for GIT_DIR, etc. I don't think you ran into this, but just
>   another hiccup I found while trying to reproduce your situation.

(I did actually run into this, so in my test scripts I have been using

	git {clone,config,...} ext::"env -i PATH=$PATH git %s /path/to/backend.git" ...

instead of just

	git {clone,config,...} /path/to/backend.git ...

in order to nix GIT_QUARANTINE_PATH from the environment -- and
anything else I might not have thought of -- while running
git-receive-pack on the backend.  But it didn't seem germane to the
problem at hand so I didn't want to clutter up my already somewhat
long question with such details unless someone asked me to share my
reproducer!)

> > 3. Same as (1), but the pre-receive hook assembles a command line of
> >
> >        exec git push backend ${new0}:${ref0} ${new1}:${ref1} ...,
> >
> >    with all the ref updates passed on stdin (ignoring the old values).
>
> ...yes, this is the correct approach. You're not _quite_ passing all of
> the relevant info, though, because you're ignoring the old value of each
> ref. And ideally you'd make sure you were moving backend's ref0 from
> "old0" to "new0"; otherwise you risk overwriting something that happened
> independently on the backend. Of course that creates new questions,
> like what happens when the frontend and backend get out of sync.

Right -- there will be some combination of --force-with-lease or
pre-receive tests at the other end to handle this.  But for now my
focus is on making git push work in pre-receive at all.
As long as anything out-of-sync leads to noisy failure, possibly
requiring manual intervention, that's good enough for now (and I'm not
(yet) concerned with .

> > remote: error: update_ref failed for ref 'refs/heads/main': ref updates forbidden inside quarantine environment
> >
> > but somehow the push succeeds in spite of this message, and the
> > primary and replica both get updated.
>
> This is again the quarantine issue updating local tracking branches.
> However, we don't consider that a hard error, as updating them is
> opportunistic (we'd get the new values on the next fetch anyway).
>
> If you drop the refspec as above, you shouldn't see that any more.

Yes, thanks!

> Now back to the main point: is this a good way to do replication? I
> don't think it's _terrible_, but there are two flaws I can see:

These are all good points that I will consider once I get to them,
now that I can make progress past the obstacle of local tracking ref
updates in pre-receive git push.  Thanks.

>   1. You're not kicking off the backend push until the frontend has
>      received and processed the whole pack. So you're doubling the
>      end-to-end latency of the push. In an ideal world you'd actually
>      stream the incoming packfile to the backend, which would be doing its
>      own quarantined index-pack[*] on it in real-time. And then when you
>      get to the pre-receive hook, all that's left is for all of the
>      replicas to agree to commit to the ref update.

Git doesn't currently have any hooks for doing this, right?  So
presumably this will require a custom git-receive-pack replacement
that understands the git wire protocol to stream the packfile to
backends (which is what I assume GitHub's spokes proxies do).

>   2. Using "push" isn't a very atomic way of updating refs. The backends
>      will either accept the push or not, and then the frontend will try
>      to update its refs. What if it fails? What if another push comes in
>      simultaneously? Can they overwrite each other or lose pushed data?
>      Or get the frontend and backends out of sync?

Right -- there's a lot to work out for the three-phase commit part.
One simplification for now is to reject non-fast-forward pushes (and
ref deletion), and to not worry too much about ordering of independent
ref updates or whether I even want serializable isolation or just
repeatable-read or read-committed for that.

That said, regarding push atomicity: Suppose users concurrently do

	alice$ git push frontend X Y
	bob$ git push frontend Y X

That is, there are overlapping ref updates, and suppose Alice and Bob
have incompatible referents for X and Y (non-fast-forward, or they're
using --force-with-lease but not --atomic, or whatever).

When are the locks on X and Y taken relative to pre-receive in the
frontend?  Can the pre-receive hooks for Alice's push and Bob's push
run concurrently, or are they serialized by locks on the common refs X
and Y?  This can't deadlock, can it?  (I assume the locks on refs are
taken in a consistent order.)

It's unclear to me from the githooks(5), git-push(1), and
git-receive-pack(1) man pages what the ordering of hooks and ref
locking is, or what serialization guarantees hooks have -- if any.
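
For concreteness, the relay from (3), extended to carry the old
values along, might look something like the sketch below.  The
function name relay_args is made up for illustration, the remote name
"backend" is the one used earlier in this thread, and using
--force-with-lease for the old-value check is an assumption rather
than anything settled here.  Deletions are simply refused, per the
simplification above.

```shell
#!/bin/sh
# Sketch only: relay_args is a hypothetical helper, not part of git.
# It reads the "<old> <new> <ref>" lines that a pre-receive hook gets
# on stdin and prints one `git push` argument per line.
relay_args() {
    while read -r old new ref; do
        case $new in
        *[!0]*) ;;                  # a real object name
        *)  echo "refusing to delete $ref" >&2
            return 1 ;;             # all-zero <new> means deletion
        esac
        case $old in
        *[!0]*) lease=$old ;;       # backend ref must still be at <old>
        *)      lease= ;;           # all-zero <old>: ref must not exist yet
        esac
        printf '%s\n' "--force-with-lease=${ref}:${lease}" "${new}:${ref}"
    done
}

# Intended use inside the hook (assumes the remote is named "backend"):
#   args=$(relay_args) || exit 1
#   exec git push backend $args
```

Leaving $args unquoted in the last line relies on ref names never
containing whitespace or glob characters, which git-check-ref-format
enforces.  Whether a lease failure on the backend should abort the
whole push or only the affected refs is left open.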