Re: [RFC/WIP PATCH 11/11] Document protocol version 2

Stefan Beller <sbeller@xxxxxxxxxx> · Mon, 1 Jun 2015 16:14:15 -0700

On Fri, May 29, 2015 at 3:21 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, May 29, 2015 at 02:52:14PM -0700, Junio C Hamano wrote:
>
>> > Currently we can do a = as part of the line after the first ref, such as
>> >
>> >     symref=HEAD:refs/heads/master agent=git/2:2.4.0
>> >
>> > so I thought we want to keep this.
>>
>> I do not understand that statement.
>>
>> Capability exchange in v2 is one packet per cap, so the above
>> example would be expressed as:
>>
>>       symref=HEAD:refs/heads/master
>>       agent=git/2:2.4.0
>>
>> right?  Your "keyvaluepair" is limited to [a-z0-9-_=]*, and neither
>> of the above two can be expressed with that, which was why I said
>> you need two different sets of characters before and after "=".  Left
>> hand side of "=" is tightly limited and that is OK.  Right hand side
>> may contain characters like ':', '.' and '/', so your alphabet needs
>> to be more lenient, even in v1 (which I would imagine would be "any
>> octet other than SP, LF and NUL").

I think the recent issue with the push certificates shows that allowing
arbitrary data after the = is a bad idea. So we need to be very cautious
about which data we allow after the =, and when.
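
To make the "two alphabets" point concrete, here is a rough sketch. It
is Python only for brevity, not the C we would actually write, and
parse_capability() and the constants are made-up names. It assumes the
tight [a-z0-9-_] alphabet for the left-hand side and, for the v1 value,
the "any octet other than SP, LF and NUL" rule suggested above:

```python
import re

# Assumption: capability names use the tight alphabet [a-z0-9-_].
KEY_RE = re.compile(rb"[a-z0-9_-]+")
# Assumption: a v1 value may be any octet except SP, LF and NUL.
FORBIDDEN_V1_VALUE_OCTETS = b" \n\x00"


def parse_capability(cap: bytes):
    """Split one capability into (key, value); value is None for a bare flag.

    Raises ValueError if either side uses an octet outside its alphabet.
    """
    key, sep, value = cap.partition(b"=")
    if not KEY_RE.fullmatch(key):
        raise ValueError("bad capability name: %r" % key)
    if not sep:
        return key, None
    if any(octet in FORBIDDEN_V1_VALUE_OCTETS for octet in value):
        raise ValueError("bad capability value: %r" % value)
    return key, value
```

With this, both symref=HEAD:refs/heads/master and agent=git/2:2.4.0
pass, while a value containing a space is rejected. Whether a given
capability should get an even narrower value alphabet is a separate,
per-capability decision.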

I'll try to split up the patch.

>
> Yes. See git_user_agent_sanitized(), for example, which allows basically
> any printable ASCII except for SP.
>
> I think the v2 capabilities do not even need to have that restriction.
> It can allow arbitrary binary data, because it has an 8bit-clean framing
> mechanism (pkt-lines). Of course, that means such capabilities cannot be
> represented in a v1 conversation (whose framing mechanism involves SP
> and NUL). But it's probably acceptable to introduce new capabilities
> which are only available in a v2 conversation. Old clients that do not
> understand v2 would not understand the capability either. It does
> require new clients implementing the capability to _also_ implement v2
> if they have not done so, but I do not mind pushing people in that
> direction.
>
> The initial v2 client implementation should probably do a few cautionary
> things, then:
>
>   1. Do _not_ fold the per-pkt capabilities into a v1 string; that loses
>      the robust framing. I suggested string_list earlier, but probably
>      we want a list of ptr/len pair, so that it can remain NUL-clean.
>
>   2. Avoid holding on to unknown packets longer than necessary. Some
>      capability pkt-lines may be arbitrarily large (up to 64K). If we do
>      not understand them during the v2 read of the capabilities, there
>      is no point hanging on to them. It's not _wrong_ to do so, but just
>      inefficient; if we know that clients will just throw away unknown
>      packets, then we can later introduce new packets with large data,
>      without worrying about wasting the client's resources.
>
>      I suspect it's not that big a deal either way, though. I have no
>      plans for sending a bunch of large packets, and anyway network
>      bandwidth is probably more precious than client memory.

Those are very sensible thoughts, now that I have reread this email. The
version I'll be sending out today will not follow those suggestions
though. :(
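
For the record, this is roughly how I read the two suggestions, as a
sketch only (Python for brevity; KNOWN, pkt_line(), read_pkt_line() and
read_capabilities() are made-up names, not anything in the patch). It
assumes the capability list is terminated by a flush-pkt and that the
payloads carry no trailing LF. Capabilities are kept as separate byte
strings so they stay NUL-clean, and unknown packets are dropped as soon
as they have been read:

```python
import io

# Hypothetical set of capability names this client understands.
KNOWN = {b"symref", b"agent"}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits of total length (header included)."""
    return b"%04x" % (len(payload) + 4) + payload


def read_pkt_line(stream):
    """Read one pkt-line; return its payload, or None for a flush-pkt."""
    header = stream.read(4)
    if len(header) != 4 or any(c not in HEX_DIGITS for c in header):
        raise ValueError("bad pkt-line header: %r" % header)
    length = int(header, 16)
    if length == 0:
        return None  # "0000" is the flush-pkt
    if length < 4:
        raise ValueError("invalid pkt-line length: %d" % length)
    payload = stream.read(length - 4)
    if len(payload) != length - 4:
        raise ValueError("truncated pkt-line payload")
    return payload


def read_capabilities(stream):
    """Collect known capabilities as (name, value) byte pairs.

    Nothing is folded into a single SP-separated v1 string, and an
    unknown packet is not retained after its name has been checked.
    """
    caps = []
    while True:
        payload = read_pkt_line(stream)
        if payload is None:
            return caps
        name, _, value = payload.partition(b"=")
        if name in KNOWN:
            caps.append((name, value))
        # Unknown capability: intentionally not stored.


# Usage: two known capabilities around one unknown, binary-valued one.
wire = (pkt_line(b"symref=HEAD:refs/heads/master")
        + pkt_line(b"some-future-cap=\x00\x01 binary")
        + pkt_line(b"agent=git/2:2.4.0")
        + b"0000")
caps = read_capabilities(io.BytesIO(wire))
```

Here caps ends up holding only the symref and agent entries; the
unknown packet never outlives the loop iteration that read it.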

>
> -Peff