Re: [RFC/WIP PATCH 00/11] Protocol version 2, again!

On Thu, Jun 4, 2015 at 6:09 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Jun 01, 2015 at 10:49:45AM -0700, Stefan Beller wrote:
>
>> However the client side with builtin/fetch, builtin/fetch-pack, fetch-pack
>> is a bit of a mystery to me, as I cannot fully grasp the difference between
>>  * connect.{h,c}
>>  * remote.{h,c}
>>  * transport.{h,c}
>> there. All of it seems to be doing network related stuff, but I have trouble
>> getting the big picture. I am assuming all of these 3 are rather a low level,
>> used like a library, though there must be even more hierarchy in there,
>> connect is most low level judging from the header file and used by
>> the other two.
>> transport.h seems to provide the most toplevel library stuff as it includes
>> remote.h in its header?
>
> connect.c was originally "the git protocol", and was used by send-pack
> and fetch-pack. Other individual programs implemented other transports.
> Later, as the interface moved towards everybody running "fetch" and
> "push", and those delegating work to the individual transports, we got
> transport.c, which is an abstract interface for all transports. It
> delegates actual git-protocol work to the functions in connect.c (or
> bundle work elsewhere, or handles remote-helpers itself).
>
> And then remote.c contains routines for dealing with the remotes at a
> logical level. E.g., which refs to fetch or push, etc.
>
> So in theory, the flow is something like:
>
>   - fetch.c knows "the user wants to fetch from 'foo'"
>
>   - fetch asks remote.c: "who is remote 'foo'"; we get back a URL
>
>   - fetch asks transport.c: "what are the refs for $URL"
>
>   - it turns out to be a git URL. transport.c calls into connect.c to
>     implement get_refs_via_connect.

Currently the decision about which protocol to speak is made here (in
get_refs_via_connect), which may be a bit late. Though I think updating
only the git protocol first would also be feasible.

So for the next git protocol

 - get_refs_via_connect first asks for the capabilities and gets an
   answer from upload-pack-2. Now what?

 - we could have a callback in struct transport, which must be set
   accordingly by fetch in step 4 (it turns out to be a git URL.
   transport.c ...). This callback is called with each pkt-line, such as

        void parse_capability(char *line,
                              struct transport_capabilities *caps,
                              void *cdata);

The line would contain the pkt-line, while the transport_capabilities
would be a struct similar to the one in
"[RFCv2 06/16] remote.h: add new struct for options", where the fetch
implementation must select the right bits. Looking at fetch-pack.{c,h},
we only expose one do-it-all method there, so we currently don't have
easily accessible file-wide variables; rather, everything lives in a
struct fetch_pack_args, which carries important information for the
selection process such as verbosity or desired options. This is why a
void* comes in handy as well. (It will be easy to adapt that to the
sending side later.)
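
A minimal sketch of what such a per-line callback could look like,
assuming one capability per pkt-line. The struct layouts, the
fetch_args_sketch stand-in for fetch_pack_args, and the advertise_lines
driver are all made up for illustration; none of this is existing git
code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: bits the fetch side selects from the advertisement. */
struct transport_capabilities {
	unsigned multi_ack_detailed : 1;
	unsigned no_done : 1;
	unsigned side_band_64k : 1;
};

/* Hypothetical stand-in for the relevant parts of fetch_pack_args. */
struct fetch_args_sketch {
	int want_side_band;
};

typedef void (*capability_cb)(const char *line,
			      struct transport_capabilities *caps,
			      void *cdata);

/* Callback a fetch implementation might register; runs once per line. */
static void parse_capability(const char *line,
			     struct transport_capabilities *caps,
			     void *cdata)
{
	struct fetch_args_sketch *args = cdata;

	assert(line && caps && args);
	if (!strcmp(line, "multi_ack_detailed"))
		caps->multi_ack_detailed = 1;
	else if (!strcmp(line, "side-band-64k") && args->want_side_band)
		caps->side_band_64k = 1;
	/*
	 * Anything else is dropped immediately. no_done is not set here:
	 * it depends on multi_ack_detailed, which may only show up on a
	 * later line.
	 */
}

/* Transport-side driver: hands each advertised line to the callback. */
static void advertise_lines(const char **lines, size_t nr,
			    capability_cb cb,
			    struct transport_capabilities *caps,
			    void *cdata)
{
	size_t i;

	for (i = 0; i < nr; i++)
		cb(lines[i], caps, cdata);
}
```

The void *cdata is what lets the callback see the user's wishes without
any file-wide state.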

Instead of a full-grown line-by-line callback we could also just collect
all the capabilities first in a string list and then call back only once
into

    void select_capabilities(struct string_list *available,
                             struct string_list *selected);

I think I'd find this second approach handier, as there are subtle
details you'd miss in the first approach. Looking at fetch-pack.c,
do_fetch_pack (line 790), we have one selection (no_done) conditioned on
another (multi_ack_detailed), so having the full list there makes the
code easier.
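
A rough sketch of that dependent selection with the whole list at hand.
To keep it self-contained it uses a tiny cap_list stand-in instead of
the real struct string_list, and the selection policy is only an
example, not what fetch-pack does today.

```c
#include <assert.h>
#include <string.h>

#define MAX_CAPS 16

/* Minimal stand-in for struct string_list (not the real git API). */
struct cap_list {
	const char *items[MAX_CAPS];
	size_t nr;
};

static int cap_list_has(const struct cap_list *list, const char *cap)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		if (!strcmp(list->items[i], cap))
			return 1;
	return 0;
}

static void cap_list_append(struct cap_list *list, const char *cap)
{
	assert(list->nr < MAX_CAPS);
	list->items[list->nr++] = cap;
}

/* Called once, after every advertised capability has been collected. */
static void select_capabilities(const struct cap_list *available,
				struct cap_list *selected)
{
	if (cap_list_has(available, "multi_ack_detailed")) {
		cap_list_append(selected, "multi_ack_detailed");
		/* no-done is only considered on top of multi_ack_detailed */
		if (cap_list_has(available, "no-done"))
			cap_list_append(selected, "no-done");
	}
	if (cap_list_has(available, "side-band-64k"))
		cap_list_append(selected, "side-band-64k");
}
```

The order in which the server advertised the capabilities no longer
matters, which is the main thing the per-line callback makes awkward.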

This second approach, however, might not be as future-proof as the
first, because we store all received capabilities (which may grow large
in the future) and do not throw unknowns away immediately.

I tend to rather implement the second one (easier to read/maintain trumps a
maybe-performance-problem-in-the-future).

This performance-problem-in-the-future could be mitigated easily by
having a preselection in transport.c get_capabilities, which ignores any
capabilities not whitelisted there (harder to maintain though, as we
would then have more than one spot where a list of capabilities lives).
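
Such a preselection could be as small as the following sketch. The
whitelist contents and both function names are hypothetical; the point
is only that unknown entries are dropped before anything is stored.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical whitelist; the real one would sit in the transport code. */
static const char *known_capabilities[] = {
	"multi_ack_detailed", "no-done", "side-band-64k",
};

static int is_known_capability(const char *cap)
{
	size_t nr = sizeof(known_capabilities) / sizeof(known_capabilities[0]);
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(known_capabilities[i], cap))
			return 1;
	return 0;
}

/*
 * Compact the array in place so that only whitelisted capabilities
 * remain, keeping their order. Returns the new number of entries.
 */
static size_t filter_known_capabilities(const char **caps, size_t nr)
{
	size_t i, kept = 0;

	assert(caps || !nr);
	for (i = 0; i < nr; i++)
		if (is_known_capability(caps[i]))
			caps[kept++] = caps[i];
	return kept;
}
```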

By writing this mail I realized another thing. I have had the patch
    "[RFCv2 09/16] remote.h: add get_remote_capabilities, request_capabilities"
which has request_capabilities just translating from a struct containing
some bits into a sequence of pkt-lines containing the actual protocol
capabilities. Maybe we should not have that in the connect file; rather,
as proposed in this email, the high-level command directly selects the
strings to put back on the wire (by having
"struct string_list *selected" as part of the select_capabilities
arguments). Then the request_capabilities in connect.c would be dumbed
down to just:

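    void request_capabilities(int out, struct string_list *list)
    {
        struct string_list_item *item;

        /* one pkt-line per selected capability */
        for_each_string_list_item(item, list) {
             /* "%s" so a '%' in a capability is never a format */
             packet_write(out, "%s", item->string);
        }
        packet_flush(out);
    }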

I think that would be reasonable?
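
For reference, what would end up on the wire is just pkt-line framing:
four hex digits giving the total length (payload plus the four length
bytes), then the payload, with "0000" as the flush. A standalone sketch
that formats into a buffer instead of writing to a file descriptor
(format_pkt_line and format_request are illustrative names, not git
functions):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one pkt-line into buf. Returns the number of bytes written
 * (excluding the trailing NUL) or -1 if it does not fit.
 */
static int format_pkt_line(char *buf, size_t size, const char *payload)
{
	size_t len = strlen(payload) + 4;

	/* 65520 is the largest total pkt-line length git allows */
	if (len > 65520 || len + 1 > size)
		return -1;
	return snprintf(buf, size, "%04x%s", (unsigned)len, payload);
}

/* Format every capability as a pkt-line, then append a flush-pkt. */
static int format_request(char *buf, size_t size,
			  const char **caps, size_t nr)
{
	size_t i, used = 0;

	for (i = 0; i < nr; i++) {
		int n = format_pkt_line(buf + used, size - used, caps[i]);
		if (n < 0)
			return -1;
		used += n;
	}
	if (size - used < 5)
		return -1;
	memcpy(buf + used, "0000", 5);	/* flush-pkt plus NUL */
	return (int)(used + 4);
}
```

Selecting multi_ack_detailed and no-done would then produce the two
lines "0016multi_ack_detailed" and "000bno-done", followed by "0000".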

>
>   - after fetch gets back the list of refs, it uses routines in remote.c
>     to figure out which refs we are interested in, handle refspecs, etc
>
>   - now fetch asks transport.c: "OK, fetch just these refs"
>
>   - transport.c again calls into connect.c to handle the actual fetch
>
> Of course over the years a lot of cruft has grown around all of them. I
> wouldn't be surprised if there are functions which cross these
> abstractions, or other random functions inside each file that do not
> belong.
>
>> and the issue I am concerned about is the select_capabilities as well as
>> the request_capabilities function here. The select_capabilities functionality
>> is currently residing in the high level parts of the code as it both depends on
>> the advertised server capabilities and on the user input (--use-frotz or config
>> options), so the capability selection is done in fetchpack.c
>>
>> So there are 2 routes to go: Either we leave the select_capabilities in the
>> upper layers (near the actual high level command, fetch, fetchpack) or we put
>> it into the transport layer and just passing in a struct what the user desires.
>> And when the users desire doesn't meet the server capabilities we die deep down
>> in the transport layer.
>
> I think you have to leave it in the fetch-pack code. As you note, it's
> the place where we know about what the user is asking for and can
> manipulate the list. And not all transports even support capabilities
> like this.
>
> -Peff

Okay