Re: What is our technical debt?

Kevin Fenzi <kevin@xxxxxxxxx> · Fri, 26 Jun 2020 09:43:54 -0700

On Fri, Jun 26, 2020 at 10:32:14AM +0100, David Kirwan wrote:
> Hi all,
> 
> If we are moving towards openshift/kubernetes backed services, we should
> probably be sticking with containers rather than Vagrant. We can use CRC
> [1] (Code Ready Containers) or minikube [2] for most local dev work.
> 
> I'd be very much in favour of having an Infra-managed Prometheus instance
> (+ Grafana and Alertmanager on OpenShift); it's something I had hoped to work on
> within CPE sustaining, in fact.

You know, I'm not in love with that stack. It could well be that I just
haven't used it enough or don't know enough about it, but it just seems
needlessly complex. ;(

I'd prefer we start out at a lower level: what are our requirements?
Then, see how we can set up something to meet them.

Off the top of my head (I'm sure I can think of more): 

* Ability to collect/gather rsyslog output from all our machines.
* Ability to generate reports of 'variances' from all that (i.e., which odd
messages should a human look at?).
* Ability to handle all the logs from OpenShift (possibly multiple clusters?).
* Ability to easily drill down and look at specific historical logs
(i.e., show me the logs for the bodhi-web pods from last week when there
was an issue).

Perhaps Prometheus/Grafana/Alertmanager is the solution, but there are
also tons of other open source projects out there that we might look
into.

kevin
--
> 
> 
> - [1] https://github.com/code-ready/crc
> - [2] https://minikube.sigs.k8s.io/docs/
> 
> 
> 
> On Fri, 26 Jun 2020 at 10:23, Luca BRUNO <lucab@xxxxxxxxxx> wrote:
> 
> > On Thu, 25 Jun 2020 15:59:44 -0700
> > Kevin Fenzi <kevin@xxxxxxxxx> wrote:
> >
> > > > What else would we want in there?
> > >
> > > Monitoring - we will likely get our nagios setup again soon just
> > > because it's mostly easy, but it's also not ideal.
> >
> > On this one (or more broadly "observability") I'd still like to see an
> > infra-managed Prometheus to internally cover and sanity-check the
> > "openshift-apps" services.
> > I remember this was on the "backlog" dashboard at Flock'19 but I don't
> > know if it got translated to an actual action item/ticket in the end.
> >
> > Ciao, Luca
> > _______________________________________________
> > infrastructure mailing list -- infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> > To unsubscribe send an email to
> > infrastructure-leave@xxxxxxxxxxxxxxxxxxxxxxx
> > Fedora Code of Conduct:
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives:
> > https://lists.fedoraproject.org/archives/list/infrastructure@xxxxxxxxxxxxxxxxxxxxxxx
> >
> 
> 
> -- 
> David Kirwan
> Software Engineer
> 
> Community Platform Engineering @ Red Hat
> 
> T: +(353) 86-8624108     IM: @dkirwan
