Re: [PATCH 4/4] Doc: push with --base

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/2/2020 7:26 PM, Jonathan Tan wrote:
> Includes protocol documentation and a design document.

>  Servers that receive any such Extra Parameters MUST ignore all
> -unrecognized keys. Currently, the only Extra Parameter recognized is
> -"version" with a value of '1' or '2'.  See protocol-v2.txt for more
> -information on protocol version 2.
> +unrecognized keys. Currently, the only Extra Parameters recognized are
> +"version" with a value of '1' or '2' and, for push, "base" with an OID.  See
> +protocol-v2.txt for more information on protocol version 2.
>  
>  Git Transport
>  -------------
> @@ -506,6 +506,10 @@ real difference is that the capability listing is different - the only
>  possible values are 'report-status', 'report-status-v2', 'delete-refs',
>  'ofs-delta', 'atomic' and 'push-options'.
>  
> +If a "base=<oid>" Extra Parameter was sent by the client, and the
> +server recognizes that object, the server MAY send "<oid> .have" in
> +lieu of all the reference obj-ids and names.
> +

nit: no comma before "and the server recognizes" as these are both
conditions of the "if". My preference is also to include a "then"
before the result, especially with compound conditions like this.
Perhaps also make it more active voice while we are here?

+ If the client sent a "base=<oid>" Extra Parameter and the server
+ recognizes that object, then the server MAY send "<oid> .have" in
+ lieu of all the reference object ids and names.

> +Push with base design notes
> +===========================

Perhaps start with a brief problem statement?

+ If a client wishes to push data to a server without first running
+ `git fetch`, then they might not have local copies of the objects
+ at the server's ref tips. The client cannot compute an adequate
+ common base and will send more objects than necessary.

Then,

+ The "push with base" feature allows...

> +This feature allows clients, when pushing, to indicate that a
> +certain object is an ancestor of all pushed commits and that they
> +believe that the server knows of this object. This in turn allows
> +servers to send an abbreviated ref advertisement containing only that
> +object.

Why only one base? Could there be value in sending multiple possible
bases and the server can report the subset that they know about?

> +Besides bandwidth savings,

It seems you are underselling the bandwidth savings here, because
we save in both the ref advertisement and the push pack size. It
might be good to call out both metrics so we can see that the ref
improvement works even for cases where the client recently fetched
and can figure out the proper base on its own!

> this also ensures that the ref
> +advertisement contains information relevant to the client. For
> +example, at least one project (Gerrit [1]) have included workarounds
> +to send ancestors of refs that move often, even though the ref
> +advertisement is only meant to contain refs.
> +
> +[1] https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/git/receive/HackPushNegotiateHook.java
> +
> +
> +Design overview
> +---------------
> +
> +The "base" being sent is sent as an Extra Parameter, supported in the
> +git://, ssh://, and http(s):// protocols. By sending it as an Extra
> +Parameter, the server is aware of this parameter before it generates
> +the ref advertisement, thus making it able to tailor the ref
> +advertisement accordingly. Sending it as an Extra Parameter also makes
> +this protocol backwards-compatible, as servers will ignore any Extra
> +Parameters they do not understand. (The push will then proceed as if
> +neither party had this feature.)

I'm not familiar enough with the extra parameter logic to know if we
allow multi-valued extra parameters. It would be good to relax the
"base" parameter to either allow multiple instances of the parameter
or to have something like a comma-separated list of OIDs be allowed.

This might be particularly important for multiple refs being pushed
at the same time.

> +The remote helper protocol has been extended to support the
> +"push-base" capability and an option of the same name. When a remote
> +helper advertises this capability, it thus indicates that it supports
> +this option. Git then will send "option push-base" if the user
> +specifies it when invoking "git push".
> +
> +The remote-curl remote helper bundled with Git has been updated to
> +support this capability and option.> +
> +
> +Future work
> +-----------
> +
> +In the future, we might want a way to automatically determine the base
> +instead of always having the user specify it. However, this does not
> +make obsolete any of the current work - once the base is automatically
> +determined, we still need this protocol to communicate it to the
> +server, and allowing the user to specify the base manually is still
> +useful.

It is appropriate to separate the protocol ability from the automatic
computation on the client. It would be nice to know how you expect that
to work.

For the single-ref push case, my default instinct would be to find a
merge-base between that ref and all of the remote refs matching the
remote we are pushing to. Bonus points if we find _all_ (maximal)
merge-bases!

The equivalent for the multi-ref push might be to find the boundary
commits from something like the following command:

	git rev-list --boundary \
		--not refs/remotes/origin/A \
		--not refs/remotes/origin/B \
		--not refs/remotes/origin/C \
		refs/heads/toPush/1 \
		refs/heads/toPush/2 \
		refs/heads/toPush/3

But you might have something better in mind.

> +
> +
> +Alternatives
> +------------

Odd to have a single bullet in a bulleted list.

> +- Making a more substantial protocol change like "fetch" protocol v2.

This isn't a full sentence.

Perhaps: "One considered approach was to introduce a multi-phase
negotiation step similar to how 'git fetch' operates."

> +  This would eliminate the need for some of the remote helper updates;
> +  as part of the protocol change, the protocol could be made to
> +  support "stateless-connect" and thus no remote helper updates (like
> +  "push-base") would be needed. For "fetch", the protocol change has
> +  enabled features like wanted-refs and packfile-uris, but I do not
> +  have any similar ideas in mind for "push".

I think that you should mention that a negotiation phase here would
use significantly more bandwidth to do the ref advertisements on both
ends with only a rare case that the negotiated bases are better than
the locally-computed bases. Everything is about tradeoffs here!

I'm generally excited about the opportunities here. I'd love to see
some measurements for reduced ref advertisements and reduced object
counts in the pushed pack.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux