On 5 January 2018 at 05:36, Pierre-Yves Chibon <pingou@xxxxxxxxxxxx> wrote:
> Good Morning Everyone,
>
> There has been work on fedora-hubs for a while now and there is an
> objective to make it live in staging early this year.
> However, there is a question about how we want to deploy such an
> application.
>
> So far we have asked that all our applications be packaged in RPMs. The
> main application may not be in the official Fedora repositories, but (for
> most) we have asked that all of its dependencies are.
> For example, pkgdb and pagure aren't in Fedora's repositories themselves,
> but we still build them in koji, pulling the dependencies from the
> official repos.
>
> Hubs is the second of our apps where this model is almost not workable,
> because it is written in nodejs, where every file is/can be a separate
> package and semantic versioning is sometimes not very well respected.

My short reading on semantic versioning is that it is some sort of Romantic
ideal of how the universe should work if we knew everything. Of course
nothing actually works that way, because of course my breaking change is
your minor upgrade. So I can understand it not being respected.

> The other application we have that is in nodejs is the flock registration
> application which, iirc, we run in our cloud.
>
> However, hubs is not meant to be run in our cloud.
>
> So how do we want to deploy hubs?
> Do we allow npm install? Do we want to use containers? Should it target
> openshift?
> How do we want to handle updates? (especially considering the semantic
> versioning aspect mentioned above)

I am going to back this up a bit and cover things we all understand, but
may want to revisit step by step to see if we have different
understandings. [AKA get rid of assumptions, as Dennis Gilmore said on IRC
recently.] Surprise is the opposite of engagement.

A. The software on each node must be the same at the time of running.
   a. This is to make sure that a user does not get one version when DNS
      says proxy13 for half a visit and a different version from proxy14
      for the rest.
B. The software on each node must be able to be replicated through simple
   steps.
   a. This makes sure that someone else outside of Fedora can duplicate
      what we have.
   b. And that we can rebuild a box at 2 am on no sleep and not end up with
      a node which violates A.
C. The software needs to be upgradeable with known steps. The reasons are
   similar to B.
D. The software needs to be buildable with known steps. The reasons are
   similar to B.
E. The software needs to be 'open' in a way that does not require special
   logins or 'secret' repositories.
   a. This is mainly to ensure that if we have a meltdown, the software can
      be set up without needing that special login or repository.
F. The software should not be built on the box it is being run on. This is
   for several reasons:
   a. 'Compiled' software has a tendency to diverge at build time. Anything
      from clock times to 'junk' left over from a different build can cause
      the version on system A to not act like the one on system B. This
      breaks A.
   b. The old security reasoning was that every system should have only
      enough 'tools' to run what is installed, and not to build new stuff.
      This stopped attackers from being able to 'compile' rootkits locally,
      which they would need to do due to architecture differences. Current
      technologies have either gotten too uniform to need local
      compilation, or they are built around the idea that everything is
      available via scripting.
   c. However, there is still a need to keep a system simple and auditable.
      Trying to figure out whether a 'build tool' (I include cpan/npm/pypi
      in that) got the same version and didn't leave around junk which
      makes another tool not work as expected is hard. It is also hard to
      know whether some pickle, nugget, or gem is a leftover build turd or
      a needed unit, and why it differs on each system but works the same.
      (A rough sketch of auditing this across hosts follows below.)
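To make that auditability point concrete, here is a rough sketch of the
kind of cross-host check I mean: hash everything under the deployed tree on
each node, then diff the manifests. The path and manifest format are made
up for illustration; this is not something we run today.

#!/usr/bin/env python3
"""Build a sha256 manifest of a deployed tree so two nodes can be diffed.

Hypothetical usage (the path is made up for illustration):
    python3 manifest.py /srv/hubs > proxy13.manifest
    # run the same on the other node, then:
    diff proxy13.manifest proxy14.manifest
"""
import hashlib
import os
import sys

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            h.update(chunk)
    return h.hexdigest()

def main(root):
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic walk order so manifests diff cleanly
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # record where a symlink points instead of hashing through it
                print('link   %s -> %s' % (path, os.readlink(path)))
            else:
                print('sha256 %s  %s' % (sha256_of(path), path))

if __name__ == '__main__':
    main(sys.argv[1])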
[There may be other items we are trying to meet.. but these are the ones I
can think of with a headache.]

In the past we used rpms as the method to make sure that we achieve as much
of this as possible with one tool. Via rpm we have archived an immutable
version (which meets A, B, and F) that was built from source with known
steps (which meets C, D, and E). In looking at other tools we should look
at how we can best make them fit this mold.

So, options:

1. Use rpm as the container. Just bundle it all together into one RPM and
   plop that onto the servers. This is basically what I used to have to do
   with commercial Java software long ago.. it's ugly on the build side and
   ugly on the running side, but it makes auditing easy.
2. Use dockah as the container. The plus side is that we can just deploy it
   like we do the mirrorlist everywhere.. The downside is that we are
   running the F25 mirrorlist 1 month after F25's EOL, with only 1-2 people
   knowing how to build the next version, and they have everything else on
   their 80 hour week.
3. Build it, tar it up, plop the tarball on all the boxes that need it.
   [AKA it was good enough for Enterprise software from 1965->1995, it's
   good enough for us now.] Auditability is lower, but you can still make
   checksums of every file to see if someone messed with a server.
4. Set up our own npm repositories that we use for getting the software we
   want installed. This means the software gets built using the tools it
   wants, and we control the versions of the software it can get.
5. Screw auditability.. every box makes its own node when it is built, via
   npm and other tools... if box A doesn't match box B.. just keep
   rebuilding them all until they get close enough. Most of the time this
   will work without a problem, because we have designed most software to
   be mostly reproducible, and as quickly as toolkits update, they rarely
   do so in the middle of a build.
6. Don't deploy software like this. I am putting this in for completeness.
   This was the default answer for many years, but it has kept a lot of
   software from being usable by us. There are times where this is still
   the right answer, because we already have a lot of software which we are
   barely maintaining.
7. Some combination of parts of the above.

I think we are going to need to look at 7. That said, there are a lot of
things that need to be detailed. What resources does hubs tie into? What
servers does it need to be near? What is its data backing store? Who is
writing and fixing bugs in this? That will help figure out the numbers I
forgot and the options.

I don't think we want to have any box in production doing npm installs any
more than we want them doing pypi installs. How possible is it that we can
set up our own node repositories, build a container using them, and then
deploy that via docker on some systems? (A rough sketch of checking that a
build would pull only from such a repository is below.)
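To put some flesh on option 4: a sketch of the kind of check I would want,
assuming the app ships a standard npm package-lock.json, and assuming a
hypothetical internal registry name (npm.example.fedoraproject.org is made
up, not a real service). Walk the lockfile and flag anything that would be
fetched from outside our own repository.

#!/usr/bin/env python3
"""Flag lockfile entries that resolve outside our own npm registry.

Assumes npm's package-lock.json format; the registry URL below is a
placeholder, not a real Fedora Infrastructure service.
"""
import json
import sys

OUR_REGISTRY = 'https://npm.example.fedoraproject.org/'  # hypothetical

def resolved_urls(lock):
    # v1 lockfiles nest everything under "dependencies";
    # v2/v3 lockfiles flatten entries under "packages".
    def walk_deps(deps):
        for name, info in deps.items():
            if 'resolved' in info:
                yield name, info['resolved']
            yield from walk_deps(info.get('dependencies', {}))
    yield from walk_deps(lock.get('dependencies', {}))
    for name, info in lock.get('packages', {}).items():
        if 'resolved' in info:
            yield name or '(root)', info['resolved']

def main(path):
    with open(path) as f:
        lock = json.load(f)
    bad = [(n, u) for n, u in resolved_urls(lock)
           if isinstance(u, str) and not u.startswith(OUR_REGISTRY)]
    for name, url in sorted(set(bad)):
        print('outside our registry: %s <- %s' % (name, url))
    return 1 if bad else 0

if __name__ == '__main__':
    sys.exit(main(sys.argv[1]))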
> What do people think?
>
> Thanks,
> Pierre

--
Stephen J Smoogen.