Re: Dropping gitolite and breaking stg

Pierre-Yves Chibon <pingou@xxxxxxxxxxxx> · Fri, 19 Feb 2016 12:14:30 +0100

On Wed, Feb 17, 2016 at 11:51:51AM -0500, Ralph Bean wrote:
> On Wed, Feb 17, 2016 at 12:11:09PM +0100, Pierre-Yves Chibon wrote:
> Just as a point of clarification -- there are two systems in play for
> pkgs.fp.o currently:
> 
> - There is 'cgit' which provides the web-based view of the repos.
> - There is 'gitolite' which provides the backend ACL controls over who
>   is allowed to push to what.

And then there is pkgdb, which holds the info about these ACLs. The ACLs are
then propagated to gitolite, which applies them.

> Using pagure as the read-only view of the repos is definitely a good idea.
> What's being discussed here is the "if we should" and "how we should"
> of replacing the backend acl controls with pagure.  Generally, I think
> it's a good direction to move in.  Maybe we're already on the same
> page about that.

I have two comments about this paragraph:
* Pagure isn't read-only, it's read and write. We need to write for
  online-editing as well as to allow the pull-request mechanism.
  So pagure isn't just a replacement for cgit, it is a little more than this :)
* Then there is the ACLs question. I don't think we want to drop pkgdb, and
  pkgdb is where ACLs are stored.
  So the current workflow looks like:
     pkgdb -> script -> gitolite
  With pagure (imho) it will look like:
     pkgdb -> script -> gitolite
                     \_ pagure
  or eventually:
     pkgdb -> script  -> gitolite
           \_ script2 -> pagure
  or with the proposal made here:
     pkgdb -> service
           \_ script -> pagure
  What I was proposing here is that we drop the script and gitolite in favor of
  our own service/REST server that would grant/deny access based on the info in
  pkgdb (much like what gitolite does atm).
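
To make the grant/deny idea concrete, here is a minimal sketch of the decision
such a service would take. The data layout, the function name and the
user/package names are all hypothetical; the real ACL data would come from
pkgdb, not from a hard-coded dict.

```python
from typing import Dict, Set, Tuple

# Hypothetical snapshot of ACLs: (package, branch) -> users allowed to push.
# In the proposal this information would come from pkgdb.
AclTable = Dict[Tuple[str, str], Set[str]]


def can_push(acls: AclTable, user: str, package: str, branch: str) -> bool:
    """Grant the push only if the user is listed for this package/branch.

    Unknown packages or branches fall through to an empty set, so the
    default answer is "deny".
    """
    return user in acls.get((package, branch), set())


# Illustrative data only.
example_acls: AclTable = {
    ("example-pkg", "master"): {"alice", "bob"},
    ("example-pkg", "f23"): {"alice"},
}
```

The deny-by-default lookup is a design choice of this sketch: a package or
branch missing from the data rejects the push rather than letting it through.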

> > We could have the async service directly linked to pkgdb's DB. This means:
> > - Changes to pkgdb are directly propagated to pkgs.fp.o
> > - We can rely on the collections information directly retrieved from the DB
> > - We can use our current set-up (one shell account / packager) and not tweak
> >   gitolite until it behaves as we want (which is only supported by gitolite to
> >   please us).
> > - No need for the alias warning for namespacing, we can check if a namespace was
> >   specified and use ``rpms`` if not
> > - May be easier to hack/maintain in the long term (may be not, hard to say in a
> >   way :))
> 
> I'm not sure about querying pkgdb DB directly from the git hook -- or
> even querying pkgdb's JSON API directly.
> 
> - If we connect directly to the DB, it exposes the internals of pkgdb
>   in a way that could make it much harder to upgrade that schema in
>   the future.  Better to connect over the REST API.

+1 there

> - If we connect over the REST API, we could run into system
>   interdependence problems in the future.  If pkgdb goes down
>   accidentally, or if it needs to go down for maintenance, then the
>   dist-git repos will be dead in the water.  No one will be able to
>   push anything.

So my idea here was to not rely on pkgdb itself (also because of the number of
requests we may have to deal with), but rather on a small service (much like
mdapi) that would connect directly to pkgdb's DB.
The service could also run on pkgs01 directly (but would likely be py3, meaning
some porting work required on pkgdb, which is a good thing anyway).

This would mean:
- git remains independent from pkgdb (good)
- git is blocked if the DB server goes down (bad), but if our DB server goes
  down, most of our infra will have trouble anyway.
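
The "git is blocked" case can be sketched as a check that fails closed. This is
only an illustration: `lookup` stands for whatever hypothetical function the
service would use to ask the DB who may push to a package, and using
`ConnectionError` to signal an unreachable DB is an assumption of the sketch.

```python
from typing import Callable, Set


def check_access(lookup: Callable[[str], Set[str]], user: str, package: str) -> bool:
    """Return True if `user` may push to `package`.

    `lookup` is a hypothetical callable that queries the DB. If the DB is
    unreachable the push is refused rather than allowed, which is exactly
    why pushes stop while the DB server is down.
    """
    try:
        allowed = lookup(package)
    except ConnectionError:
        return False
    return user in allowed
```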

> - Right now, we basically cache *all* of the pkgdb acls on disk as
>   gitolite perms.  This has the advantage of decoupling the systems at
>   request time.  It has the disadvantage of synchronization lag.  When
>   ACLs get updated in pkgdb, we have to wait for those to sync to
>   gitolite to be meaningful in practice.  We used to have a cronjob on
>   which we waited forever.. we now have that fedmsg-genacls updater
>   that makes it much quicker, but not instant.  Can we keep this
>   same arrangement for a pagure replacement of gitolite?

We could make the service rely on a small local copy of pkgdb's DB. It would be
a little more work and would make the process a little more fragile (cf mdapi's
error when the sqlite DB is corrupted), but it would bring the advantage of
still allowing people to commit/build packages when we reboot our server (iirc,
koji doesn't need FAS, does it?).
There are pros and cons to this approach. The fact that so many of our apps
would be impacted anyway when the DB server goes down (including pagure itself
for login, though not for users already logged in) weighs a little more in favor
of no on-disk caching for me, but I can be convinced otherwise :)
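
For comparison, here is a rough sketch of what the on-disk caching option
could look like. It deliberately swaps the sqlite copy mentioned above for a
plain JSON snapshot, written atomically so that a reader never sees a
half-written file. The function names and the snapshot layout are
hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def write_snapshot(path: str, acls: Dict[str, List[str]]) -> None:
    """Write the ACL snapshot to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(acls, handle)
        # The rename is atomic when source and target are on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_snapshot(path: str) -> Optional[Dict[str, List[str]]]:
    """Return the cached ACLs, or None if the file is missing or unreadable."""
    try:
        with open(path) as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None
    return data if isinstance(data, dict) else None
```

A caller would still have to decide what to do when `read_snapshot` returns
None (deny everything, or fall back to the DB), which is the fragility
trade-off discussed above.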

> > We would still need to have a service to create the git repo and eventually the
> > branches.
> > And for pkgs.fp.o we will definitely need a git hook (but there is already one)
> > to prevent branch from being deleted and do branch-based ACL control.
> 
> Yeah, since we still need a service to create the git repo and the
> branches, we might as well sync pagure ACLs at the same time, no?

We will need to sync to pagure, not from pagure I think, but yes, we'll need a
sync script anyway.
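
As an illustration of a one-way sync (to pagure, never from it), the core of
such a script could be a simple diff of the two ACL sets. This is a sketch
under assumptions: the data layout and the "grant"/"revoke" action names are
made up, and applying the steps to pagure is left out entirely.

```python
from typing import Dict, List, Set, Tuple

Acls = Dict[str, Set[str]]  # hypothetical layout: project -> committers


def plan_sync(source: Acls, target: Acls) -> List[Tuple[str, str, str]]:
    """List the (action, project, user) steps that make `target` match `source`.

    `source` plays the role of pkgdb (canonical) and `target` the role of
    pagure. Nothing is ever copied from target back to source.
    """
    steps: List[Tuple[str, str, str]] = []
    for project in sorted(set(source) | set(target)):
        wanted = source.get(project, set())
        current = target.get(project, set())
        for user in sorted(wanted - current):
            steps.append(("grant", project, user))
        for user in sorted(current - wanted):
            steps.append(("revoke", project, user))
    return steps
```

Running it twice in a row is harmless: once the target matches the source, the
plan is empty.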

> > Note that I might still pursue this for pagure, not entirely sure yet though.
> > Just a thought while writing this, using this approach might actually make it
> > easier to deploy pagure for pkgs.fp.o since then we could indeed just use pkgdb
> > as data source and have a fedmsg-based updater to sync ACLs from pkgdb to
> > pagure.
> 
> I guess I don't understand what's easier about using pkgdb as a data
> source for pagure.  It seems harder to me (both to write up front as
> well as to maintain long term).

Well, imho, pkgdb should remain the canonical place to store and manage ACLs for
packages. Otherwise we'll duplicate things, managing branches on one side and
ACLs on the other, and we'd need to update the script syncing info to bugzilla
as well. I'd prefer either to adjust pagure to get its info from pkgdb or to
drop pkgdb entirely; managing the ACLs in pagure itself and the rest in pkgdb
doesn't quite appeal to me.

Note: In a way dropping pkgdb is tempting but it brings a number of new
questions:
  - What do we do about the watch* ACLs?
  - How do we determine the PoC? How do we change it? (Required for bugzilla)
    -> Or we drop bugzilla as well, but then the ticketing system of pagure will
    need much much more work (including a good search feature).
  - How are branches managed? (Request, creation, deletion)
  - How do we manage status (Maintain/Orphan/Retire)?
Of course we could build this in pagure as well, but I am afraid this would make
it too Fedora-specific and much less a self-hostable forge.
Anyway, food for thought, maybe we could find a way, micro-services?

I have quickly written the service for pagure itself; it led me to make
pagure work with py3. I'll need to test it some more in real conditions, but
it's promising.

Pierre
_______________________________________________
infrastructure mailing list
infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
http://lists.fedoraproject.org/admin/lists/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx