On Tue, Jun 5, 2012 at 12:28 PM, Stephan Peijnik <stephan@xxxxxxxxxx> wrote:
> On 06/05/2012 08:54 PM, Shawn Pearce wrote:
>>
>> It's actually only one TCP connection... assuming the servers in
>> between the client and the Git endpoint correctly support HTTP
>> keep-alive semantics.
>
> With keep-alive that is true, but a quick check on the actual data
> exchange tells me that multiple HTTP requests are still needed. But I
> guess the overhead caused by a second HTTP request can be ignored.

There is extra overhead from the HTTP request headers, this is true.
Fortunately it's relatively small and bounded.

There isn't that much additional latency in the smart HTTP protocol.
Where the client is waiting on data from the server is where we end an
HTTP request and start a new one. The multi_ack capability on normal
TCP or SSH connections does get to interleave a bit more in the native
protocol to try and hide the RTT latency. I don't know that anyone has
done extensive testing to determine how effective that is vs. the batch
sizes we run in the HTTP POST format, the key part being how quickly
the overall negotiation exchange goes for the end user.

Colby Ranger's recent contribution in contrib/persistent-https provides
a local proxy server for Git over HTTP that tries to reuse HTTP
connections across Git command invocations. This can go further than
even git:// does with TCP connection reuse, cutting latency. Of course
a user can do the same thing with their own local HTTP proxy, but
persistent-https can be easier to install and configure.

>> How does this fare going through crappy proxy servers that perform
>> man-in-the-middle attacks on SSL connections? Just last week I was
>> trying to help someone whose local proxy server was MITM-ing the SSL
>> session behind Git's back, and their IT department forgot to install
>> the proxy server's certificate into the system certificate directory.
>> They only installed it into the browser. That proxy also doesn't
>> correctly grok HTTP 1.1 keep-alive with chunked transfer encodings.
>> Let alone something as new as web sockets.
>
> Proxy servers could be an issue, yes. For proxy servers not acting as
> MITM and which support CONNECT this shouldn't be an issue though.

I am still annoyed by the failure of "Expect: 100-continue". The
original smart HTTP protocol used this during push to try and avoid
sending a 100 MiB POST payload before finding out authentication is
required and failing. It turns out far too many HTTP servers and proxy
servers do not implement 100-continue correctly enough for the protocol
to rely on it, so we had to back that out and use a special POST body
with 4 bytes to "probe" the remote server before sending the full
payload.

100-continue is in RFC 2616, dated June 1999. My calendar says June
2012. So 13 years later and we still cannot rely on 100-continue as
specified by RFC 2616 working correctly on the public Internet.

Chunked Transfer-Encoding is described in RFC 2068, dated January 1997,
the RFC that RFC 2616 made obsolete. This is still not working reliably
everywhere... 15 years after being specified, proxy servers are still
converting chunked transfer encoding to "Connection: close" and
destroying any HTTP keep-alive that might have been possible.
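To give a flavor of what that 100-continue workaround means in
practice, the probe amounts to roughly the following. This is only an
illustrative sketch in Python, not Git's actual C implementation, and
the credential handling and error paths are heavily simplified:

    # Illustrative only: probe the RPC endpoint with a tiny body before
    # sending the real pack data, so an auth challenge costs 4 bytes
    # instead of 100 MiB. Names and URL layout here are hypothetical.
    import http.client

    def post_rpc(host, path, pack_data, headers):
        conn = http.client.HTTPSConnection(host)

        # Probe: a single flush-pkt ("0000") as the request body.
        conn.request("POST", path, body=b"0000", headers=headers)
        probe = conn.getresponse()
        probe.read()
        if probe.status == 401:
            raise RuntimeError("auth required; retry probe with credentials")

        # Probe was accepted; now send the full (possibly huge) payload
        # on the same kept-alive connection.
        conn.request("POST", path, body=pack_data, headers=headers)
        return conn.getresponse()

The only point is that the 4-byte request flushes out a 401 (or a
misbehaving proxy) before the large payload is committed to the wire;
a failed probe is cheap to retry with credentials attached.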
Basically I learned a lot in the past 2 years deploying Git on a rather
broad scale with HTTP. The public Internet doesn't resemble the
standards enough, and you really have to code to the lowest common
denominator, because there is some user out there who matters to you
and who is stuck behind some HTTP proxy that implements the HTTP
standard as it existed in 1995, and whose managers/suppliers refuse to
bring it forward to the current century.

> Also, given the current HTML5 hype things should get better in the
> future, but you are correct about potential current issues with the
> approach.

WebSockets seems pretty full of fail to me. The protocol specification
is really complex. It's reimplementing TCP on top of HTTP to work
around an artificial, browser-imposed limit on the number of HTTP
connections the browser opens to a server.

Meanwhile SPDY goes the other direction and tries to multiplex a larger
number of HTTP requests into a single TCP connection, while reusing a
lot of header data across those requests.

I have higher hopes for SPDY adoption than for WebSockets. SPDY solves
a lot of common problems on the Internet that social networking sites
care about, like time to load assets for a game, and that publishers
care about, like time to load all assets for a site on the initial
visit, increasing the chances the user doesn't immediately jump away
due to perceived high loading time. WebSockets is a large amount of
wanking to make playing a game written in JavaScript easier, using a
very ugly protocol and a much more complex software stack.

I think the WebSockets authors saw the problem of HTTP connections in a
browser and solved it the wrong way. They saw the bidirectional stream
problem... and solved it for a very narrow use case. SPDY also relies
on a bidirectional stream, but it lets the server do more, like suggest
pushing assets down ahead of the browser realizing it needs them.

>>> So in comparison there is possibly a lot less overhead and, in
>>> theory, the performance should be comparable to running the smart
>>> protocol over ssh. Personally I'd say the WebSocket approach is
>>> cleaner than the HTTP-POST approach.
>>
>> This may be true. But it's also a lot more complex to implement. I
>> noticed you reused Python code to help make this work.
>
> The only reason I used Python is that I wanted to quickly come up
> with a prototype. I am also aware of the fact that a proper
> implementation should possibly be done in C.

Any new embedding of the Git protocol into e.g. WebSockets also
requires implementing it in Java for JGit, both client and server, and
probably also in Python for Dulwich, again client and server. Given
that WebSockets is all about cramming TCP into HTTP in SSL in TCP, and
doesn't always work on the public Internet given the current state of
proxy servers, I just don't see the value in this.

We have to provide complete support across the major Git
implementations, otherwise users will come to this mailing list and
complain about how implementation $X cannot talk to the server running
$Y because server owner $Z only configured the newfangled WebSockets
protocol. And I am simply too lazy to write a procmail script to direct
all such inquiries to your address.

>> Let me know when there is a GPLv2 client library that implements
>> sufficient semantics for WebSockets that Git can bundle it out of
>> the box.
>
> As for the WebSocket client library that is GPLv2 compatible: there
> is at least libwebsockets [0], which is licensed under the terms of
> the LGPL v2.1, and as such compatible with GPLv2.

OK, yay.
Someone actually bothered to implement this?

>> And let me know when most corporate IT proxy servers correctly grok
>> WebSockets. I suspect it will be many more years given that they
>> still can't even grok chunked transfer encoding.
>
> As stated above, this could be a problem, yes.
> The question is whether one only wants to provide an alternative
> approach when it is usable for everyone.

I predict WebSockets will be usable by everyone about... never. It's
too complex a standard, and too narrow a corner case. We are talking
about proxy servers that can't do 100-continue correctly because async
network IO was too hard for them to code. Those authors and their
software will never support WebSockets' bidirectional requirement.

WebSockets isn't critical to browsing the web. It never will be.
100-continue might be useful with form-based file uploads, which are at
least 100x more common on the web than a WebSocket-powered thing.
Seriously. Call me when WebSockets is actually working. I'd like to
come back and see what the world looks like in 2152.

> My intention never was to have the current http implementation, be it
> the dumb or http-backend one, replaced. The idea here was to provide
> an additional option that makes use of a fairly new technology, with
> all benefits and drawbacks of using something new.

You are free to develop your own remote helper that does this. But I
don't expect any Git distribution or implementation to be supporting
it.
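For anyone who does want to experiment: a remote helper is just an
executable named git-remote-<scheme> on the PATH that speaks a small
line-oriented protocol on stdin/stdout. A rough sketch of the shape of
such a helper in Python, assuming a hypothetical ws:// scheme and only
the "connect" capability (the transport itself is left as a stub, not a
working WebSocket client):

    #!/usr/bin/env python
    # Sketch of a hypothetical git-remote-ws helper. Only the command
    # loop is shown; open_tunnel() is a placeholder for whatever would
    # actually carry the bytes.
    import sys

    def open_tunnel(url, service):
        # Would return a bidirectional byte stream carrying the native
        # pack protocol for `service` (git-upload-pack or
        # git-receive-pack). Deliberately unimplemented here.
        raise NotImplementedError

    # git passes the remote name and URL as arguments.
    remote_name, url = sys.argv[1], sys.argv[2]
    for line in sys.stdin:
        cmd = line.strip()
        if cmd == "capabilities":
            sys.stdout.write("connect\n\n")
            sys.stdout.flush()
        elif cmd.startswith("connect "):
            stream = open_tunnel(url, cmd.split(" ", 1)[1])
            sys.stdout.write("\n")  # blank line: connection established
            sys.stdout.flush()
            # ... relay bytes between git and `stream` from here on ...
        elif cmd == "":
            break

Once the helper reports the connection with that blank line, git itself
speaks the native pack protocol over the stream; the helper's only job
is moving bytes.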