Re: [RFC/PATCH 0/3] protocol v2

Junio C Hamano <gitster@xxxxxxxxx> · Sun, 01 Mar 2015 15:06:59 -0800

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Sun, Mar 1, 2015 at 3:41 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>  - Because the protocol exchange starts by the server side
>>>    advertising all its refs, even when the fetcher is interested in
>>>    a single ref, the initial overhead is nontrivial, especially when
>>>    you are doing a small incremental update.  The worst case is an
>>>    auto-builder that polls every five minutes, even when there is no
>>>    new commits to be fetched [*3*].
>
> Maybe you can elaborate about how to handle states X, Y... in your
> footnote 3. I just don't see how it's actually implemented.  Or is
> it optional feature that will be provided (via hooks, maybe) by
> admin?

These footnotes are not the important part (I wanted us to agree on
the problem, and the ideas outlined with the footnotes are only an
example that illustrates how a potential solution and the problem to
be solved are described in relation to each other), but I'll give a
shot anyway ;-)

I am actually torn on how the names X, Y, etc. should be defined.

One side of me wants to leave its computation entirely up to the
server side.  The client says "Last time I talked with you asking
for refs/heads/* and successfully updated from you, you told me to
call that state X" without knowing how X is computed, and then the
server will update you and then tell you your state is now Y.  That
way, large hosting sites and server implementations can choose to
implement it any way they like.

On the other hand, we could rigidly define it, perhaps like this:

 - Imagine that you saved the output from ls-remote that is run
   against that server, limited to the refs hierarchy you are
   requesting, the last time you talked with it.

 - Concatenate the above to the list of patterns the client used to
   ask the refs.  This step is optional.

 - E.g. if you are asking it for "refs/heads/*", then we are talking
   something like this (illustrated with optional pattern in front):

        refs/heads/*
        8004647...      refs/heads/maint
        7f4ba4b...      refs/heads/master

 - Run SHA-1 hash over that.  And that is the state name.

I.e. if you as a client are doing

    [remote "origin"]
        fetch = refs/heads/*:refs/remotes/origin/*

and if the only time your refs/remotes/origin/* hierarchy changes is
when you fetch from there (which should be the norm), you can look
into remote.origin.fetch refspec (to learn that "refs/heads*" is
what you are asking) and your refs/remotes/origin/* refs (and
reverse the mapping you make when you fetch to make them talk about
refs/heads/* hierarchy on the server side), you can compute it
locally.

The latter will have one benefit over "opaque thing the client does
not know how to compute".  Because I want us avoid sending unchanged
refs over connection, but I do want to see the protocol has some
validation mechanism built in, even if we go the latter "client can
compute what the state name ought to be" route, I want the servrer
to tell the client what to call that state.  That way, the client
side can tell when it goes out of sync for any reason and attempt to
recover.

> Do we need to worry about load balancers?

Unless you are allowing multiple backend servers to serve a same
repository behind a set of load balancers in an inconsistent way
(e.g. you push to one while I push to two and you fetch from one and
you temporarily see my push but then my push will be rejected as
conflicting and you fetch from one and now you see your push), I do
not think there is anything you need to worry about them more than
what you should be worrying about already.

There would be a point where all backend servers would agree "This
is the set of values of these refs" at some point (e.g. a majority
of surviving servers vote to decide, laggers that later join the
party will update to the concensus value before serving the end-user
traffic), and they would not be showing half-updated values that
haven't ratified by other servers to end users (otherwise they may
end up showing reversion).

>>    - Is band #2 meant for human consumption, or do we expect the
>>      other end to interpret and act on it?  If the former, would it
>>      make sense to send locale information from the client side and
>>      ask the server side to produce its output with _("message")?
>
> No producing _(...) is a bad idea. First the client has to verify
> placeholders and stuff, we can't just feed data from server straight
> to printf(). Producing _() could complicate server code a lot. And I
> don't like the idea of using client .po files to translate server
> strings. There could be custom strings added by admin, which are not
> available in client .po. String translation should happen at server
> side.

What I meant to say was (1) the client says "I want the human
readable message in Vietnamese" and (2) the server uses .po on
_("message") in its code and send the result to sideband #2.  There
is no parsing, interpolation, or anything of that sort necessary on
the server end.

Because potential problem you mention looks to me about a different
design where the server talks "message" and client side applies
_($msg) to it, I suspect that you misread what I meant to say.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html