Re: [PATCH 0/3] protocol v2 and hidden refs

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Sun, 16 Dec 2018 12:47:59 +0100

On Sun, Dec 16 2018, Jeff King wrote:

> On Sat, Dec 15, 2018 at 08:53:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > So I'm a bit worried that the unified endpoint model is going to be a
>> > dead end, at which point carrying around git-serve just makes things
>> > more complicated.
>>
>> This is from wetware memory of something discussed at a previous Git
>> Merge, so I may have (inadvertently) made it up, but wasn't part of the
>> idea of "git-serve" to have an extensible next-gen protocol where we
>> could add new actions & verbs unrelated to sending or receiving packs?
>
> Yes, I think that's a goal (and we already have upload-archive, which is
> a similar thing).
>
>> Of course that's not in itself an argument for having a general "serve"
>> command, actually the opposite for the reasons you mention with locking
>> down things. E.g. maybe I want to support server-side git-grep on my
>> server, but not git-log, and if it's one command it becomes a hassle to
>> do that via SSH config or HTTPD config for the reasons you mention.
>
> Right, exactly. It pushes more of the information down into Git's own
> protocol. Of course we _can_ build mechanisms at that level for
> configuring which verbs are allowed. But if some context is available at
> the higher protocol level, then we can use the mechanisms at that higher
> level.
>
> I think of it as a tradeoff. By including the endpoint in the transport
> protocol (e.g., in ssh the command name, in HTTP the URL), we get to use
> the mechanisms in those transports to make policy decisions on the
> server. But it also means we _have_ to implement those policies twice,
> once per transport.
>
> IMHO having to deal with both transports is not that big a loss,
> considering that there are only two, and really not likely to be more.
> git:// is already unauthenticated, and IMHO is mostly a dead-end for
> future protocol work since it provides no advantage over HTTP, and the
> future is mostly HTTP, with ssh for people who really prefer its
> authentication mode.
>
>> The upside would be that once a host understands "git serve" I'm more
>> likely to be able to get past whatever middle layer there is between my
>> client and the "git" binary on the other side. E.g. if I have a newly
>> compiled "git" client/server binary, but something like GitLab's
>> "gitaly" sitting between the two of us.
>
> But I think that's what makes it dangerous, too. :)
>
> Gitaly (and we have our own equivalent at GitHub) is responsible for
> making those policy decisions about who can run what. Opening a pipe
> between the client and the backend that can issue arbitrary verbs is
> exactly what they _don't_ want to do.
>
> So they have to intercept the conversation at least at the verb level.
> It _is_ nice if conversation for each verb is standardized (so once a
> verb is issued, they can just step out of the way and proxy bytes[1]),
> and v2 helps with that.
>
> That's not too hard for a Git-aware endpoint to implement.  But when
> that verb interception can be done at the HTTP/ssh level, then it's easy
> for tools that _aren't_ Git-aware to do handle it (again, like the
> Apache config we recommend in git-http-backend(1)).
>
> -Peff
>
> [1] Actually, we do much more intimate interception than that at GitHub
>     already. The upload-pack conversation is mostly vanilla, but for
>     receive-pack we handle replication at that layer. So your pack is
>     streamed to 3-6 backend receive-packs simultaneously, and that
>     endpoint layer handles quorum for updating refs, etc.

Yeah I think overall this makes sense. I was just thinking we'd have
stuff like this needing to be maintained in all middleware:
https://gitlab.com/gitlab-org/gitlab-shell/blob/v8.4.4/lib/gitlab_shell.rb#L15-28

Which, if and when we have a lot of verbs would be a pain, but of course
server operators might want to explicitly whitelist them.

Also for things like "git-grep" I can see e.g. "no POSIX regex" (due to
well known DoS issues0 being a configuration we ourselves would want to
carry, at that point server operators would need to maintain two
whitelists anyway, one in their custom code & another in /etc/gitconfig.

But I think that trade-off is worth it as you note because when you want
to filter these it's handy to be able to do it in a dumb ssh/web server.

Another thing to consider is not having a proliferation of things in
git-<TAB> completion again. AFAIK these things can't have spaces in them
for /etc/passwd & inetd tab completion. So perhaps call them all
git-serve-*, or not put them in our bin/ install path as a special case?