Re: what are the chances of a 'pre-upload' hook?

Jeff King <peff@xxxxxxxx> · Mon, 28 Nov 2011 02:51:07 -0500

On Sat, Nov 26, 2011 at 11:51:29PM -0800, Junio C Hamano wrote:

> I personally have a huge problem with shipping an inherently unsafe
> mechanism that is disabled by default even if it is marked with big red
> letters saying that it is unsafe and should not be enabled casually. It
> makes it up to the system administrator to decide what is casual and what
> is not, but when end users are get harmed by a clueless administrator's
> decision, the repercussion will come back to the Git software, not to the
> clueless administrator who enabled an unsafe mechanism in an environment
> where it was not designed to support.

I agree. If we can avoid adding any feature which is even potentially
dangerous, it's good to explore alternatives. At the same time, the
subject of remote fetch hooks has come up before, and I don't think it's
going away. It would be nice to reach more resolution this time.

I also have mild concerns about the current receive-pack hooks. The
thought so far has been "well, you're less likely push into a bad
person's repository". Which is probably true. But it seems like a flaky
thing to base security on, especially in multi-user environments with
shared repositories.

> If the mechanism to notify the external machinery (i.e. counting accesses,
> learning the true destination of a new fetch request and have the fetcher
> wait while the real fetch goes to the origin site) were not via a hook
> that runs as the fetcher but were via something else that runs as the
> owner of that external service (i.e. counting accesses, maintaining the
> proxy object pool), I wouldn't have trouble with the proposal.
> 
> For example, upload-pack could write the details of the original request
> to a named pipe to ask the "service" process listening on the other end,
> and wait for its response.

Right, that was the "socket" solution I mentioned earlier. The problem
is that it's a pain to set up. If you really want to have a daemon
running and handling git tasks, then you might as well just have the
whole upload-pack/receive-pack side accessible over a socket (e.g., just
run git-daemon if you don't care about authentication, and
git-over-https if you do).

If you really care about ssh authentication and keeping track of
individual unix users, then one possibility is to run git-daemon
listening on a unix socket, use SO_PEERCRED to get the user on the other
end, and then process everything as a single git user.

For example, imagine Bob and Alice are in the "research" group, and use
a shared central repo. Bob wants to push:

  1. Bob runs "git push" on his workstation. Git on the workstation
     connects to "ssh bob@server git-receive-pack $repo".

  2. git-receive-pack on the server runs as Bob. It notices that
     $repo/daemon-socket exists, and runs in "shim" mode, connecting to
     the socket and proxying data over the socket.

  3. git-daemon, running as the "research" user, listens on the other
     end of the socket (and potentially sockets for other repos). It
     sees the connection and serves it. The daemon already knows which
     repo to serve because it knows which socket Bob connected to. And
     it knows it's OK for Bob to access it, because he was able to get
     to the socket via filesystem permissions.

     The daemon can use getsockopt(SO_PEERCRED) to find out that Bob is
     the user, and probably set an environment variable. It can freely
     run any hooks, because the hooks are owned by the "research" user,
     and that's the user the daemon runs as. Hooks can trust the
     environment from the daemon, and can use Bob's identity for further
     authorization checks if they want to. IOW, we are roughly at the
     same place you would get by connecting over git-over-https.

This naturally extends to "Bob fetches" or "Bob runs his push from the
server".  As an added bonus over the usual shared-central-repo model,
malicious users can't just ignore hooks (in the usual shared-repo model,
everybody can write directly to the object db and refs, so hooks are
somewhat advisory).

I suspect people who really care about this use case are already running
git-over-https, possibly with something like gerrit doing some policy
enforcement. I also suspect there are a lot of corporate environments
where they just have a big shared repo, set core.sharedRepository, and
don't worry too much about users hurting each other (in most dev
environments I've seen, people are perfectly willing to fetch and run
make without looking at the diff, so it's sort of a moot point).

People in the latter half would be one of the potential audiences for
the "screw it, just run hooks from the remote repo" flag.

> I actually like the idea of allowing pre-upload-pack hook on git:// and
> possibly http:// only. git-daemon can tell the upload-pack that it is OK
> to run the hook, and the hook can do the things that only the daemon can
> do, never touching what the original requestor would but the repository
> owner would not have an access to.

That's not necessarily safe. Think about a site like kernel.org. The
administrator is the one running the daemon, on behalf of all of the
users. But my impression is that pre-August, developers had shell access
to their own repos and could write their own hook files. So if
git-daemon runs hooks, then any repo owner could run arbitrary code as
the git-daemon, including killing the running daemon and running their
own trojan.

So you really need some way for the admin to say "it's OK, hooks are
locked down and it's safe for them to run". Which again, I was thinking
of as an environment variable.

However, I think for the daemon case, the other alternative I mentioned
earlier is a bit more elegant: an environment variable which doesn't say
"respect the hooks in the repo", but rather "run this _particular_
program as the pre-upload-pack hook".  And then the admin doesn't have
to care what's in the repos. The daemon runs some particular program
every time a fetch happens.

It's tempting to say "can't they just replace git-upload-pack with a
shim that does what they want and then runs upload-pack?"[1]. But I
think the answer is no, for the same reason we have hooks at all: they
might want to operate under a lock, or use some data produced by
upload-pack, or make a decision that blocks the fetch.

[1] Actually, I think Sitaram's case may not be a pre-upload-pack hook,
    but rather the simple "run before upload-pack" case. Because a
    pre-upload-pack would probably feed the set of requested objects to
    the hook. But he will want to fetch from the upstream before even
    advertising any refs to the client.

> One scenario I do not want to see is this. Suppose there is a company that
> runs more than one project, and one of the projects is so secret that only
> engineers working on the project have access to its repository, while all
> people, including the project-A engineers, in the company have access to
> other repositories. Further suppose that people involved in some or many
> of the general repositories want to do something before a fetch from them
> happens. They will use the pre-upload-hook to implement it if we make it
> available, and their new-hire training course will tell their engineers to
> set the GIT_ALLOW_UNTRUSTED_HOOKS. Perhaps the /etc/skel/profile the
> company gives the new-hires already defines it.

Right. This is a failure of the training course and/or the admin who
wrote the default profile, and is exactly the thing that would be
mentioned in the red flags in the docs. I agree there is a risk of
bone-headed admins, though.

This, btw, is more or less the case I am concerned about _now_ with
receive-pack. With current git, if any of those general repositories is
a shared central repo, then project A is similarly vulnerable to anyone
who can write the hook.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html