On Wed, Jun 21, 2017 at 05:04:12PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > In terms of implementation, the HTTP transport could use Server-Sent
> > Events, and the SSH transport can pretty much do whatever so that
> > should be easy.
>
> In case you didn't know, any of the non-trivially sized git hosting
> providers (e.g. github, gitlab) provide you access over ssh, but you
> can't just run any arbitrary command, it's a tiny set of whitelisted
> commands. See the "git-shell" manual page (github doesn't use that exact
> software, but something similar).

These days you don't even hit the actual fileservers with ssh at all. We
terminate all of the protocols (http, git://, and ssh) at a proxy layer
that kicks off git commands in the actual repositories using a separate
protocol. The ssh handshakes were a huge performance bottleneck, so by
doing it that way we can scale out the front-end tier independently of
the repository storage (and of course it also provides a convenient
layer for mapping user visible repository names into sharded paths).

Not to take away from your point. Just a little bit of trivia.

> But overall, it would be nice to have some rationale for this approach
> other than that you think polling is ugly. There's a lot of advantages
> to polling for something you don't need near-instantly, e.g. imagine how
> many active connections a site like GitHub would need to handle if
> something like this became widely used, that's in a lot of ways harder
> to scale and load balance than just having clients that poll something
> that's trivially cached as static content.

Yeah. The naive way to implement this would be to have the client
connect and receive the ref advertisement. And then when it's a noop
(nothing to fetch), instead of saying "I want these objects", say
"Please pause until one or more refs change".

But I don't think we'd want to leave actual upload-pack processes
sitting paused on the server. Their memory usage is too high.
For this kind of "long polling" we have a separate front-end tier with a
daemon that keeps the per-client cost very low. We could possibly wedge
that into our proxy layer, but the system would be a lot simpler and
more flexible if this were done separately from the actual git protocol.
E.g., if an HTTP endpoint were defined that paused and returned data
only when a particular repository's refs were updated.

Another option is to keep polling, but just make noop fetches a lot
cheaper. The ref advertisement on some repositories can get into the
megabytes. I'd love to see protocol extensions for:

  1. The client asking only for bits of the ref namespace they care
     about. I have some preliminary patches for this, but I really need
     to polish them.

  2. Something ETag-ish where the client can say "I already saw state X,
     do you have updates?" Even just handling "no, no updates" (like an
     ETag) would be a big benefit. Bonus points if it can say "since
     state X, these are the changes; you are now at state Y".

The sticking point on both is that the client needs to speak before the
ref advertisement begins, which is why we have to deal with the protocol
v2 headache.

-Peff
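
[Illustration, not part of the original message.] The two extensions
listed above (asking for only part of the ref namespace, and an
ETag-ish "I already saw state X") can be sketched roughly as below.
This is a toy model under stated assumptions, not git's protocol:
RefState, state_token() and poll() are hypothetical names, the state
token is assumed to be a hash over the sorted ref list, and the server
is assumed to remember earlier snapshots so it can report changes.

```python
import hashlib


def state_token(refs):
    """Hash a {refname: object id} mapping into an opaque state token.

    Assumption for this sketch: the token is a SHA-1 over the sorted
    "oid name" lines. Nothing in git defines such a token.
    """
    h = hashlib.sha1()
    for name in sorted(refs):
        h.update(f"{refs[name]} {name}\n".encode())
    return h.hexdigest()


class RefState:
    """Hypothetical server-side helper; not part of git."""

    def __init__(self, refs):
        self.refs = dict(refs)
        # Remember each snapshot by token so that "since state X, these
        # are the changes" can be answered later.
        self.history = {state_token(self.refs): dict(self.refs)}

    def update(self, name, oid):
        """Set a ref, or delete it when oid is None, and record the snapshot."""
        if oid is None:
            self.refs.pop(name, None)
        else:
            self.refs[name] = oid
        self.history[state_token(self.refs)] = dict(self.refs)

    def poll(self, client_token, prefix=""):
        """Answer "I already saw state X, do you have updates?"."""
        current = state_token(self.refs)
        if client_token == current:
            # The cheap "no, no updates" case: no ref list is sent.
            return {"state": current, "changed": None}
        # An unknown token is treated as an empty snapshot, which
        # degrades to a full advertisement of the requested namespace.
        old = self.history.get(client_token, {})
        changed = {
            name: self.refs.get(name)  # None marks a deleted ref
            for name in set(old) | set(self.refs)
            if name.startswith(prefix) and old.get(name) != self.refs.get(name)
        }
        return {"state": current, "changed": changed}
```

A client would keep the "state" value from the last reply and send it
with the next poll. Because the token in this sketch covers every ref,
a change outside the requested prefix yields a new state with an empty
"changed" set; a real design might scope the token to the prefix
instead.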