Re: [PATCH v12 3/7] receive-pack: add new proc-receive hook

Jiang Xin <worldhello.net@xxxxxxxxx> · Thu, 30 Apr 2020 23:33:27 +0800

Jeff King <peff@xxxxxxxx> 于2020年4月29日周三 下午3:56写道：
>
> On Tue, Apr 28, 2020 at 01:00:00AM +0800, Jiang Xin wrote:
>
> > For over a week, I have not received more comments on reroll v13 on
> > this topic ("jx/proc-receive-hook"). Therefore, I invite Peff and
> > Couder to review because I think it would be very interesting to add a
> > centralized workflow to Github and Gitlab.
> >
> >  * [PATCH v13 3/8]:
> > https://public-inbox.org/git/20200418160334.15631-4-worldhello.net@xxxxxxxxx/
> >
> > This patch introduces a new hook "proc-receive" on the server side. It
> > won't break anything except GitHub's "spokes" architecture (I guess).
> > Because in Alibaba, we have such issue when we implement our own
> > "spokes" architecture.  In order to make this special push operation
> > (on a special ref such as "refs/for/master/topic") idempotently on
> > multiple replicas, we extended the protocol of "receive-pack" and let
> > "spokes" to send a request with a flag, which specifies one replica as
> > master replica to execute the "proc-receive" hook.
>
> We don't do any magic ref-rewriting like this, so we wouldn't turn such
> a hook on. Presumably without such a hook, the protocol would remain the
> same (i.e., if there was no rewrite, then the report-status response
> wouldn't have the extra field).
>
> And I'm not sure if you're proposing to also use it as the trigger point
> to coordinate the ref update across replicas. That wouldn't be helpful
> for our architecture. All Git client connections to GitHub terminate at
> a proxy layer that handles load balancing, authentication, and routing.
> For pushes, that's the layer that multiplexes the packfile to each
> replica, and it drives the ref update through an out-of-band procedure.
> So there is no master replica at all; they're all talking equally to the
> proxy layer which coordinates things (and the same is true for non-push
> updates; they're coordinated by whichever entity is trying to perform
> the update).
>
> We do run receive-pack on each replica backend. We have a hacky patch
> for a config option that tells receive-pack to just skip the actual
> ref-transaction, leaving it up to the proxy layer to do. I've been
> pushing for us to actually abandon receive-pack entirely, since most of
> its heavy lifting can be done by sub-programs (for-each-ref for the
> advertisement, index-pack to receive the pack, and update-ref to update
> refs). But it's a non-trivial change, and the benefits are only
> moderate, so it hasn't quite been worth the effort yet.
>

Thanks to Peff for providing technical details of the architecture.  I
understand that "receive-pack" of GitHub backend is not involved in
references update (executing the commands), so the "proc-receive" hook
won't be turned on for GitHub's architecture. While in our
architecture (inspired by "spokes" of GitHub), the proxy will deliver
not only packfile, but also commands to all three replicas. The proxy
will execute "receive-pack" on the replica with a special argument, so
the proxy can talk with "receive-pack" with an extended protocol.
After running pre-receive hook and release the packfile from
quarantine, the replica will stop and wait for the proxy to
coordinate. After creating a distributed lock, the proxy will tell all
the replicas continue to update the references.  One problem we met is
the proc-receive and the post-receive hook must be executed once. We
can make the execution of the hooks idempotent, or let only one of the
replica run the hook. We choose the latter.

> So I don't really have much of an opinion on it either way.
>
> Reading the commit message for patch 3, I do have one small comment. I
> see this:
>
>   The reporting function from "receive-pack" to "send-pack" is extended
>   using a backward compatible way by adding key-value pairs after an null
>   character, like:
>
>       # OK, run this command successfully with optional extended-status.
>       ok <reference>\0ref=refs/pull/123/head old-oid=...
>
>       # NO, I reject it.
>       ng <reference> <error message>
>
> but we should probably avoid trickery like stuffing extra data after a
> NUL byte. We could quite easily extend the protocol with a new
> capability here. The server advertises "report-status-v2" or something,
> and the client responds with the appropriate capability to indicate that
> it understands. And then the new format can be something more
> extensible, like:
>
>   pktline(ref=refs/heads/master)
>   pktline(status=ok)
>   pktline(old-oid=...)
>   pktline(rewrite=refs/pull/123/head)
>   delim-pkt(0001)
>   pktline(ref=refs/heads/other)
>   pktline(status=error)
>   pktline(message=<error message>
>   flush-pkt(0000)

The above status report is from "receive-pack" to "send-pack", and
another place that we need to extend is in `print_helper_status()` of
`builtin/send-pack.c`, which will send report status in the following
format from "send-pack" to remote-helper (such as remote-curl.c):

    ok <ref>
    error <ref> <error message>

BTW, I don't know why not use "ng", but use "error" here.

I agree that adding new capability (report-status-v2) is a better
solution, but I think the above extension is a bit wordy. We add
additional 18 characters for each "ok <ref>" status, and add
additional 32 characters for each "ng <ref> <msg>" status. Can we
extend it like this:

    ok <ref>
    [optional key-value pairs]
    ng <ref> <error message>
    [optional key-value pairs]

E.g.:

    ok refs/heads/master     # no extensions
    ok refs/for/master/topic  # send 2 commits, and create 2 changes
    rewrite=refs/changes/123/1
    new-oid=...
    # use rewrite key-value pair as delimiter,
    # because plain text is in use from "send-pack" to remote-helper.
    rewrite=refs/changes/124/1
    new-oid=...
    ok refs/heads/next
    ng refs/heads/release non-fast forward

--
Jiang Xin