Re: [RFC] Secure central repositories by UNIX socket authentication

"Shawn O. Pearce" <spearce@xxxxxxxxxxx> · Sun, 27 Jan 2008 19:47:22 -0500

Junio C Hamano <gitster@xxxxxxxxx> wrote:
> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
> 
> > This isn't anywhere near ready for application, but I'm floating
> > it out there to see what people think.  It's a cool new feature that
> > will certainly *not* be in 1.5.4.  :-)
> >
> > In a central repository configuration users may not have access
> > to write new objects into a Git repository, or to edit the refs,
> > especially if the repository is being protected by an update hook
> > (e.g. contrib/hooks/update-paranoid).
> 
> Sorry, but I am puzzled about what this assumption is trying to
> achieve here.
> 
> If the configuration is based on central repository model,
> wouldn't the users who are participating in the project have
> write access to the repository by being in the project's group
> and the repository initialized with core.sharedrepository=true?

Hmm.  core.sharedrepository is sometimes a bad solution.

core.sharedrepository means I need to give write access to both the
refs database and the object database to all members of the project,
some of whom may not be trusted with tools like "rm" but still need
real shell access to that system anyway.  And sometimes management
won't allow users to have two accounts on the same system (one that
is fixed to git-shell, and one that has a real shell) because the
world would implode if a user were given two different accounts for
two different access purposes.  I have no idea why that would happen,
but someone paid 3x what I earn has figured that out.

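Last I checked how UNIX filesystem access controls work, the
directory write permission a user needs to create objects and update
refs also lets that user delete them.  So Git's core.sharedrepository
cannot possibly prevent a user from doing something like this: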

	cd $repo_path
	git log
	... go to lunch ...
	rm -rf *
	git log
	... bitch about how crappy git is ...

Any "real" version control or SQL database system allows the
administrator to restrict access to the database to avoid such
mistakes.  CVS has pserver; SVN can use Apache HTTPd or its own
svnserve; Perforce has its own server.  PostgreSQL, Oracle, Informix,
DB2, even MySQL don't let users directly read or modify the database
files; instead they make them go through authenticated, socket-based
interfaces.
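To make "authenticated socket-based interface" concrete: on Linux, a
server listening on a UNIX domain socket can ask the kernel which uid
is on the other end of a connection (the SO_PEERCRED socket option),
so the client never gets to claim its own identity.  What follows is
a minimal Python sketch of that check only, assuming Linux; it is not
Git code and not the RFC patch, and peer_credentials(), authorize()
and the allowed-uid set are hypothetical names for illustration.  The
idea is that the server process runs as the repository's owner, so
only it ever needs filesystem write access.

```python
from __future__ import annotations

import os
import socket
import struct

# Linux struct ucred: pid_t pid (signed), uid_t uid, gid_t gid (unsigned).
_UCRED = struct.Struct("iII")


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the process on the other end of a
    connected AF_UNIX socket, as reported by the kernel (Linux-only)."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def authorize(conn: socket.socket, allowed_uids: set[int]) -> int:
    """Hypothetical policy check: accept the request only if the
    kernel-reported uid is on the project's list.  The client never
    supplies its own identity, so it cannot pretend to be someone else."""
    _pid, uid, _gid = peer_credentials(conn)
    if uid not in allowed_uids:
        raise PermissionError(f"uid {uid} may not write to this repository")
    return uid


if __name__ == "__main__":
    # Demo: a socketpair's peer credentials are those of this process.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with server_end, client_end:
        print("accepted uid", authorize(server_end, {os.geteuid()}))
```

The client side needs nothing special: it just connects, and the
kernel records its credentials.  All of the policy lives in the
server, which is the point; users talk to a process, not to the
files.  BSDs and macOS expose the same information differently (for
example getpeereid()), so a portable version would need per-platform
code.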

With a DSCM one could argue that the central model is insane, and
that every user should have their own repository, with write access
limited to only that user.  This is obviously the model that
kernel.org uses for kernel development, and that git itself uses for
git development.

The problem is that the purely distributed model falls apart when
you have 50 "Aunt Tillies" making changes to the same 30 files at
around the same time.  It's bad enough that they have to have a
local clone and push and fetch to share their changes.  Trying to
explain that you need to fetch+merge from Jane right now and Bob 3
minutes later, then back to Jane to get the changes you are working
on in parallel, is sheer chaos.  Eyes glaze over and management
declares "Git is crap; it cannot possibly be used in the enterprise".

How do you set up 50 URLs in all 50 users' .git/config files?  When
a new user joins the project, how do you get their URL into all the
existing users' trees?  It's total chaos in the cube farm as they
shout back and forth "Did you get Bob's changes?  Jane's?  Oh, maybe
you didn't get Sally's too and that's why you aren't seeing X in
there".

At my day job I manage two completely different workflows, but both
are based upon Git.

Real developers who hack out program source code use a model much
like kernel.org's.  Code is developed on topic branches, reviewed on
topic branches at the individual change level, and merged by a
maintainer from a developer's topic branch into a master branch.
For convenience's sake we store all topic branches in a single
central repository, so everyone just has to have the "origin" URL in
their local repositories.  We could deal with individual developer
repos the way kernel.org does.  We chose not to, simply because we
also have to handle the next case, and it's easier to not have
individual developer repos at all.

Aunt Tillies (who far outnumber real developers) edit small text
files through a fancy GUI tool.  These folks don't really care about
versions or topic branches, and really don't want to know about them.
All they know is that they have to edit "Foo.data_file", but to do so
they need the changes just made to "Bar.data_file" 5 minutes ago.
Usually they don't even know whether Bob, Sally, Jane or Nick made
that change; they just know they need it.  And their collective
changes (from all 50 Aunt Tillies) all have to somehow wind up in the
same Git branch at the end of the day so a real developer/maintainer
can pull it into a product build.

I've lived through the daily fires of these workflows over the past
year and a half.  Things have mostly settled into something that
works very well for us, but it's heavily based upon this concept of
a central, shared repository.  And to keep the auditors and
management happy I cannot allow "rm -rf *" to be executed by a user
who happens to have push access to that same repository.

We're not the only Git user that has a shared repository.  Doesn't
X.org use a shared repository model?  I'm guessing that since they
are an open source project they have fewer concerns about the
"rm -rf *" case.  Wasn't the receive-pack service added to git-daemon
to allow users to push into a repository without actually having
write access to its filesystem?  Obviously someone other than just
me wants to safeguard the repository.

-- 
Shawn.