On Wed, 4 Dec 2019, Paul Cuzner wrote:
> Interesting discussion - but I don't want to lose sight of the original
> questions.
>
> ceph-daemon makes several deployment decisions at the moment that differ
> from existing deployment patterns. This is the first point that I wanted
> to raise.
> - it assumes that from Octopus onwards, the only deployment pattern we
> provide is container only.
> - it places all of Ceph's files (config and data) within /var/lib. In the
> past, even with containers, we've still used /etc for config to align with
> the FHS and, since the OS is package based, config from other packages
> adheres to the FHS anyway - which makes Ceph different.
> - it uses the fsid in path names and container names, just in case users
> want to run multiple ceph clusters on the same machine. IMO this adds
> complication to 100% of deployments but may benefit only 5% of the user
> base (numbers plucked out of the air on that one!)
>
> Perhaps all of these design points trace back to a single idea - support
> multiple ceph clusters on the same set of machine(s). Is this the goal? Is
> this what Ceph users want?

It was one of my goals.  A few reasons:

- It's easy and clean.
- These users do exist.
- When we deprecated this behaviour before, our justification was "you
  should be using containers".  Well, here we are.
- Rook allows this (with the (current) caveat that you can't put mons from
  multiple clusters on the same host/IP if they're using default ports).
- The paths for Rook are also convoluted like this, nested under the
  Kubernetes namespace name.
- The pain of weird paths is mitigated when you enter a container (or use
  the shell container).
- Tab-completion works for both path names and systemd service names.

Also, ceph-daemon shell and similar commands will figure out the fsid
themselves when there is a single cluster on the host.
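To make that concrete, the layout and naming come out roughly like this
(sketch only -- the fsid and hostname are made up, and the exact unit-name
pattern may still change):

    # one data dir and one systemd unit per daemon, keyed by fsid
    /var/lib/ceph/d6a3b6e0-.../mon.host1/
    /var/lib/ceph/d6a3b6e0-.../osd.3/
    systemctl status ceph-d6a3b6e0-...@mon.host1.service

    # with a single cluster on the host the fsid is inferred, so this
    # just works and drops you into the shell container:
    ceph-daemon shell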
> Now picking up on the scope issue for the orchestrator - apologies if
> this sounds like a manifesto... I'm a "usability" addict!
>
> IMO, our collective goal should be to drive ease of use and Ceph adoption
> beyond Linux geeks. If that's a view that resonates, I think the
> orchestrator has a critical role to play to enable that strategy.
> Personally I'd like to see the orchestrator evolve over time to become
> the automation engine that enables an open source ecosystem around Ceph:
> - provide a default implementation for monitoring/alerting/metrics - this
> can be simple and doesn't need HA - as Sage has already mentioned
> - samba/ganesha deployment, load balancers to improve radosgw, etc.
> - integration with platform management (why not show in the ceph dashboard
> whether you have patches outstanding against your host, or the host has a
> bad PSU) - enable the sysadmin to work more efficiently on Ceph, and maybe
> they'd prefer it over other platforms.
>
> We absolutely still need to support DIY configurations - but having a
> strategy that delivers a better out-of-the-box Ceph experience is surely
> our goal.
>
> </soapbox>

+1

sage

> On Fri, Nov 29, 2019 at 8:22 PM Jan Fajerski <jfajerski@xxxxxxxx> wrote:
>
> > On Thu, Nov 28, 2019 at 02:26:36PM +0000, Sage Weil wrote:
> > --snip--
> > >> >I think it makes sense to focus on the out-of-the-box opinionated
> > >> >easy scenario vs the DIY case, in general at least.  But I have a
> > >> >few questions...
> > >>
> > >> I think this focus will leave some users in the dust.  Monitoring
> > >> with prometheus can get complex, especially if it is to be fault
> > >> tolerant (which imho is important for confidence in such a system).
> > >> Also, typically users don't want several monitoring systems in their
> > >> environment.  So let's keep the case of existing prometheus systems
> > >> in mind please.
> > >
> > >That's what I meant by 'vs' above... perhaps I should have said 'or'.
> > >Either we deploy something simple and opinionated, or the user
> > >attaches to their existing or self-configured setup.  We probably
> > >don't need to worry about the various points in the middle ground
> > >where we manage only part of the metrics solution.
> >
> > I'm not sure we'll get off this easy.  At the very least the prometheus
> > mgr module is deployed by us.  There is also an argument to be made for
> > monitoring the things that we take control over, i.e. the containers we
> > deploy (one node_exporter per container is a common setup) and maybe
> > even the hosts that the orchestrator provisions.
> >
> > >(Also, I'm trying to use 'metrics' to mean prometheus etc, vs
> > >'monitoring', which in my mind is nagios or pagerduty or whatever and
> > >presumably has a level of HA required, and/or needs to be external
> > >instead of baked-in.)
> >
> > Not sure I understand that distinction.  You mean metrics for the
> > prometheus setup the orchestrator intends to install?  (prometheus can
> > certainly be a fully fledged monitoring stack.)
> >
> > Jan
> >
> > >sage
> > >
> > >> >- In the DIY case, does it make sense to leave the node-exporter to
> > >> >the reader too?  Or might it make sense for us to help deploy the
> > >> >node-exporter, but they run the external/existing prometheus
> > >> >instance?
> > >> >
> > >> >- Likewise, the alertmanager is going to have a bunch of
> > >> >ceph-specific alerts configured, right?  Might they want their own
> > >> >prom but we deploy our alerts?  (Is there any dependency in the
> > >> >dashboard on a particular set of alerts in prometheus?)
> > >> >
> > >> >I'm guessing you think no in both these cases...
> > >>
> > >> What I'm missing from proposals I've seen so far is an interface to
> > >> query the orchestrator for various prometheus bits.  First and
> > >> foremost, the orchestrator should have a command that returns a
> > >> prometheus file_sd_config of exporters that an external prometheus
> > >> stack should scrape.  Whether this is just the mgr exporter or also
> > >> node_exporters (or others) depends on how far the orchestrator will
> > >> take control.  Alerts are currently handled as an rpm but could
> > >> certainly be provided through a similar interface.
> > >>
> > >> At the very least, if the consensus is that the orchestrator
> > >> absolutely has to deploy everything itself, please at least provide
> > >> an interface so that a federated setup is easily possible (an
> > >> external prometheus scraping the orch-deployed prometheus), so that
> > >> users don't have to care what the orchestrator does with monitoring
> > >> (other than duplicating recorded metrics).  See
> > >> https://prometheus.io/docs/prometheus/latest/federation/#hierarchical-federation
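For illustration, the kind of interface being asked for here could look
something like this (purely a sketch -- the label names and the match
selector are made up; the ports are just the usual mgr/node_exporter/
prometheus defaults):

    # file_sd_config output (from whatever orchestrator command ends up
    # emitting it -- name TBD), in prometheus' standard targets+labels form:
    [
      {"targets": ["host1:9283"],
       "labels": {"ceph_cluster": "<fsid>", "service": "mgr-prometheus"}},
      {"targets": ["host1:9100", "host2:9100", "host3:9100"],
       "labels": {"ceph_cluster": "<fsid>", "service": "node-exporter"}}
    ]

    # and for the federation case, the external prometheus scrapes the
    # orch-deployed one via the /federate endpoint:
    scrape_configs:
      - job_name: 'ceph-federate'
        honor_labels: true
        metrics_path: '/federate'
        params:
          'match[]': ['{job=~".+"}']
        static_configs:
          - targets: ['ceph-prometheus-host:9090']

Either way an external prometheus can pick up the ceph targets without
caring how the orchestrator manages them internally.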
> > >> I'd really like to encourage the orchestrator team to carefully
> > >> think this through.  Monitoring is (at least for some users) a
> > >> critical infrastructure component with its own inherent complexity.
> > >> I'm worried that just doing this in a best-effort fashion and not
> > >> offering an alternative path is going to weaken the ceph ecosystem.
> > >>
> > >> >> > - Let's teach ceph-daemon how to do this, so that you do
> > >> >> > 'ceph-daemon deploy --fsid ... --name prometheus.foo -i
> > >> >> > input.json'.  ceph-daemon has the framework for opening
> > >> >> > firewall ports etc now... just add ports based on the daemon
> > >> >> > type.
> > >> >>
> > >> >> TBH, I'd keep the monitoring containers away from the ceph
> > >> >> daemons.  They require different parameters, config files, etc.,
> > >> >> so why not keep them separate and keep the ceph logic clean?
> > >> >> This also allows us to change monitoring without concerns over
> > >> >> logic changes to normal ceph daemon management.
> > >> >
> > >> >Okay, but mgr/ssh is still going to be wired up to deploy these.
> > >> >And to do so on a per-cluster, containerized basis... which means
> > >> >all of the infra in ceph-daemon will still be useful.  It seems
> > >> >easiest to just add it there.
> > >> >
> > >> >Your points above seem to point toward simplifying the containers
> > >> >we deploy to just two containers, one that's one-per-cluster for
> > >> >prom+alertmanager+grafana, and one that's per-host for the
> > >> >node-exporter.  But I think making it fit in nicely with the other
> > >> >ceph containers (e.g., /var/lib/ceph/$fsid/$thing) makes sense.
> > >> >Esp since we can just deploy these during bootstrap by default
> > >> >(unless some --external-prometheus is passed) and this all happens
> > >> >without the admin having to think about it.
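Concretely, however the prom/alertmanager/grafana pieces end up being
grouped, the default flow boils down to something like this (a sketch --
the daemon names and per-daemon json files are placeholders, not a
settled interface):

    # per cluster: the monitoring stack deployed like any other ceph daemon
    ceph-daemon deploy --fsid $fsid --name prometheus.host1    -i prometheus.json

    # per host: one node-exporter
    ceph-daemon deploy --fsid $fsid --name node-exporter.host1 -i node-exporter.json

    # with everything landing next to the ceph daemons:
    #   /var/lib/ceph/$fsid/prometheus.host1/
    #   /var/lib/ceph/$fsid/node-exporter.host1/

Bootstrap would run all of this by default unless something like an
--external-prometheus flag (name TBD) is passed.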
> > >> >> > WDYT?
> > >> >>
> > >> >> I'm sure a lot of the above has already been discussed at length
> > >> >> with the SuSE folks, so apologies for going over ground that
> > >> >> you've already covered.
> > >> >
> > >> >Not yet! :)
> > >> >
> > >> >sage
> >
> > --
> > Jan Fajerski
> > Senior Software Engineer Enterprise Storage
> > SUSE Software Solutions Germany GmbH
> > Maxfeldstr. 5, 90409 Nürnberg, Germany
> > (HRB 36809, AG Nürnberg)
> > Geschäftsführer: Felix Imendörffer

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx