On Mon, Jun 1, 2015 at 4:14 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> On Fri, May 29, 2015 at 3:21 PM, Jeff King <peff@xxxxxxxx> wrote:
>> On Fri, May 29, 2015 at 02:52:14PM -0700, Junio C Hamano wrote:
>>
>>> > Currently we can do a = as part of the line after the first ref, such as
>>> >
>>> >     symref=HEAD:refs/heads/master agent=git/2:2.4.0
>>> >
>>> > so I thought we want to keep this.
>>>
>>> I do not understand that statement.
>>>
>>> Capability exchange in v2 is one packet per cap, so the above
>>> example would be expressed as:
>>>
>>>     symref=HEAD:refs/heads/master
>>>     agent=git/2:2.4.0
>>>
>>> right? Your "keyvaluepair" is limited to [a-z0-9-_=]*, and neither
>>> of the above two can be expressed with that, which was why I said
>>> you need two different set of characters before and after "=". Left
>>> hand side of "=" is tightly limited and that is OK. Right hand side
>>> may contain characters like ':', '.' and '/', so your alphabet need
>>> to be more lenient, even in v1 (which I would imagine would be "any
>>> octet other than SP, LF and NUL").
>
> I think the recent issue with the push certificates shows that having arbitrary
> data after the = is a bad idea. So we need to be very cautious when to allow
> which data after the =.
>
> I'll try split up the patch.
>
>>
>> Yes. See git_user_agent_sanitized(), for example, which allows basically
>> any printable ASCII except for SP.
>>
>> I think the v2 capabilities do not even need to have that restriction.
>> It can allow arbitrary binary data, because it has an 8bit-clean framing
>> mechanism (pkt-lines). Of course, that means such capabilities cannot be
>> represented in a v1 conversation (whose framing mechanism involves SP
>> and NUL). But it's probably acceptable to introduce new capabilities
>> which are only available in a v2 conversation. Old clients that do not
>> understand v2 would not understand the capability either.
>> It does
>> require new clients implementing the capability to _also_ implement v2
>> if they have not done so, but I do not mind pushing people in that
>> direction.
>>
>> The initial v2 client implementation should probably do a few cautionary
>> things, then:
>>
>>  1. Do _not_ fold the per-pkt capabilities into a v1 string; that loses
>>     the robust framing. I suggested string_list earlier, but probably
>>     we want a list of ptr/len pair, so that it can remain NUL-clean.
>>
>>  2. Avoid holding on to unknown packets longer than necessary. Some
>>     capability pkt-lines may be arbitrarily large (up to 64K). If we do
>>     not understand them during the v2 read of the capabilities, there
>>     is no point hanging on to them. It's not _wrong_ to do so, but just
>>     inefficient; if we know that clients will just throw away unknown
>>     packets, then we can later introduce new packets with large data,
>>     without worrying about wasting the client's resources.
>>
>>     I suspect it's not that big a deal either way, though. I have no
>>     plans for sending a bunch of large packets, and anyway network
>>     bandwidth is probably more precious than client memory.
>
> That's very sensible thoughts after rereading this email. The version
> I'll be sending out today will not follow those suggestions though. :(

Thinking about this further, maybe it is a good idea to require that
capabilities be advertised in alphabetical order?

The exchange would look like this:

    server:
        for capability in list:
            pkt_write(capability)
        pkt_flush()

    client:
        while (line = recv_pkt()) != flush:
            parse_capability(line)

with parse_capability checking whether we know the capability and, if
so, setting some internal field.

Now assume the number of capabilities grows a lot over time (someone
may "abuse" them for a cool feature, much as happened with refs; nobody
anticipated having so many refs in advance). How does parse_capability
scale w.r.t. the number of capabilities?
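
For concreteness, here is a minimal Python sketch of the exchange described above. It assumes the usual pkt-line framing (four hex digits giving the total length including the header itself, with "0000" as the flush packet). pkt_write, pkt_flush and recv_pkt are the names from the pseudocode; advertise, read_capabilities and the KNOWN set are hypothetical names made up for this illustration and are not taken from any patch.

```python
import io

# Hypothetical set of capability names this client understands.
KNOWN = {b"symref", b"agent"}


def pkt_write(out, payload: bytes) -> None:
    # The 4-digit hex length prefix counts its own four bytes.
    out.write(b"%04x" % (len(payload) + 4))
    out.write(payload)


def pkt_flush(out) -> None:
    out.write(b"0000")


def recv_pkt(inp):
    """Return the next payload, or None when a flush packet is read."""
    header = inp.read(4)
    if len(header) != 4:
        raise EOFError("truncated pkt-line header")
    length = int(header.decode("ascii"), 16)
    if length == 0:
        return None  # flush packet
    if length < 4:
        raise ValueError("invalid pkt-line length")
    payload = inp.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("truncated pkt-line payload")
    return payload


def advertise(out, capabilities) -> None:
    # Server side: one packet per capability, then a flush.
    for capability in capabilities:
        pkt_write(out, capability)
    pkt_flush(out)


def read_capabilities(inp) -> dict:
    # Client side: keep what we understand, drop everything else at once.
    understood = {}
    while True:
        line = recv_pkt(inp)
        if line is None:
            break
        key, _, value = line.partition(b"=")
        if key in KNOWN:
            understood[key] = value
    return understood
```

Because each capability travels in its own length-prefixed packet, the value after "=" may contain SP, ':' or '/' without any escaping, and unknown packets are never stored.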
If parse_capability is just a linear search, then each call is O(n),
and with n capabilities the client faces an O(n^2) computation, which
is bad. If we were to require capabilities in alphabetical order, the
client could keep track of its position in its own sorted list of known
capabilities, and the whole operation would be O(n).

I just wonder whether this is premature optimization or something we
need to think about now.

To prevent this problem from popping up, it must be easier to introduce
a new phase after the capabilities exchange than to abuse the
capabilities phase for whatever you plan on doing.

Thanks,
Stefan

>
>>
>> -Peff
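
A rough Python sketch of the sorted-order idea from the mail above: if the server advertises capabilities in strictly ascending order and the client keeps its own list of known capabilities sorted the same way, one merge-style pass matches them. The function name match_sorted and the example capability names are hypothetical and only serve the illustration.

```python
def match_sorted(advertised, known):
    """Return the advertised capabilities that also appear in `known`.

    Both sequences must be sorted ascending. The index into `known`
    only ever moves forward, so the total work is O(n + m) rather than
    one linear search per advertised capability.
    """
    matched = []
    i = 0
    previous = None
    for capability in advertised:
        # Enforce the ordering rule that the single pass relies on.
        if previous is not None and capability <= previous:
            raise ValueError("capabilities not in strictly ascending order")
        previous = capability
        # Skip known entries that sort before the advertised one.
        while i < len(known) and known[i] < capability:
            i += 1
        if i < len(known) and known[i] == capability:
            matched.append(capability)
    return matched
```

For example, match_sorted([b"agent", b"foo", b"symref"], [b"agent", b"side-band", b"symref"]) walks each list once and skips the unknown b"foo" without rescanning anything. Whether this is worth mandating in the protocol is exactly the open question in the mail; a client could also get constant-time lookups from a hash table without any ordering requirement.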