On Sun, Dec 16 2018, Jeff King wrote:

> On Sat, Dec 15, 2018 at 08:53:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > So I'm a bit worried that the unified endpoint model is going to be a
>> > dead end, at which point carrying around git-serve just makes things
>> > more complicated.
>>
>> This is from wetware memory of something discussed at a previous Git
>> Merge, so I may have (inadvertently) made it up, but wasn't part of the
>> idea of "git-serve" to have an extensible next-gen protocol where we
>> could add new actions & verbs unrelated to sending or receiving packs?
>
> Yes, I think that's a goal (and we already have upload-archive, which is
> a similar thing).
>
>> Of course that's not in itself an argument for having a general "serve"
>> command, actually the opposite for the reasons you mention with locking
>> down things. E.g. maybe I want to support server-side git-grep on my
>> server, but not git-log, and if it's one command it becomes a hassle to
>> do that via SSH config or HTTPD config for the reasons you mention.
>
> Right, exactly. It pushes more of the information down into Git's own
> protocol. Of course we _can_ build mechanisms at that level for
> configuring which verbs are allowed. But if some context is available at
> the higher protocol level, then we can use the mechanisms at that higher
> level.
>
> I think of it as a tradeoff. By including the endpoint in the transport
> protocol (e.g., in ssh the command name, in HTTP the URL), we get to use
> the mechanisms in those transports to make policy decisions on the
> server. But it also means we _have_ to implement those policies twice,
> once per transport.
>
> IMHO having to deal with both transports is not that big a loss,
> considering that there are only two, and really not likely to be more.
> git:// is already unauthenticated, and IMHO is mostly a dead-end for
> future protocol work since it provides no advantage over HTTP, and the
> future is mostly HTTP, with ssh for people who really prefer its
> authentication mode.
>
>> The upside would be that once a host understands "git serve" I'm more
>> likely to be able to get past whatever middle layer there is between my
>> client and the "git" binary on the other side. E.g. if I have a newly
>> compiled "git" client/server binary, but something like GitLab's
>> "gitaly" sitting between the two of us.
>
> But I think that's what makes it dangerous, too. :)
>
> Gitaly (and we have our own equivalent at GitHub) is responsible for
> making those policy decisions about who can run what. Opening a pipe
> between the client and the backend that can issue arbitrary verbs is
> exactly what they _don't_ want to do.
>
> So they have to intercept the conversation at least at the verb level.
> It _is_ nice if conversation for each verb is standardized (so once a
> verb is issued, they can just step out of the way and proxy bytes[1]),
> and v2 helps with that.
>
> That's not too hard for a Git-aware endpoint to implement. But when
> that verb interception can be done at the HTTP/ssh level, then it's easy
> for tools that _aren't_ Git-aware to do handle it (again, like the
> Apache config we recommend in git-http-backend(1)).
>
> -Peff
>
> [1] Actually, we do much more intimate interception than that at GitHub
> already. The upload-pack conversation is mostly vanilla, but for
> receive-pack we handle replication at that layer. So your pack is
> streamed to 3-6 backend receive-packs simultaneously, and that
> endpoint layer handles quorum for updating refs, etc.

Yeah I think overall this makes sense.
I was just thinking we'd have stuff like this needing to be maintained
in all middleware:
https://gitlab.com/gitlab-org/gitlab-shell/blob/v8.4.4/lib/gitlab_shell.rb#L15-28

Which, if and when we have a lot of verbs, would be a pain, but of course
server operators might want to explicitly whitelist them.

Also for things like "git-grep" I can see e.g. "no POSIX regex" (due to
well-known DoS issues) being a configuration we ourselves would want to
carry. At that point server operators would need to maintain two
whitelists anyway, one in their custom code & another in /etc/gitconfig.

But I think that trade-off is worth it, as you note, because when you
want to filter these it's handy to be able to do it in a dumb ssh/web
server.

Another thing to consider is not having a proliferation of things in
git-<TAB> completion again. AFAIK these things can't have spaces in them
for /etc/passwd & inetd tab completion. So perhaps call them all
git-serve-*, or not put them in our bin/ install path as a special case?
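
To make the maintenance cost concrete, here is a rough sketch of the kind
of per-verb whitelist I mean. It is not what gitlab-shell actually does.
The function name parse_ssh_command, the repository path and the
"git-serve-grep" verb are all made up for illustration; the only real
pieces are the three existing server commands and the fact that sshd
hands the requested command to a forced command via SSH_ORIGINAL_COMMAND.

```python
import shlex

# Verbs this hypothetical middleware is willing to run. These three are
# existing Git server-side commands; any new verb (e.g. an invented
# "git-serve-grep") would have to be added here by every operator.
ALLOWED_VERBS = frozenset({
    "git-upload-pack",
    "git-receive-pack",
    "git-upload-archive",
})


def parse_ssh_command(original_command, allowed=ALLOWED_VERBS):
    """Split an SSH_ORIGINAL_COMMAND-style string into (verb, repo).

    Raises PermissionError if the command is malformed or the verb is
    not in the whitelist. Nothing is executed here; the caller decides
    what to do with an accepted (verb, repo) pair.
    """
    try:
        words = shlex.split(original_command)
    except ValueError as err:
        # Unbalanced quotes and similar: refuse rather than guess.
        raise PermissionError(f"malformed command: {err}")

    # Expect exactly "<verb> <repository>", nothing more.
    if len(words) != 2:
        raise PermissionError("expected '<verb> <repository>'")

    verb, repo = words
    if verb not in allowed:
        raise PermissionError(f"verb not allowed: {verb}")
    return verb, repo
```

With per-verb commands a check like this can live entirely in the ssh or
HTTP layer. With a single "git serve" entry point the same decision would
have to be made by something that parses the Git protocol itself.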