Re: monitoring

On 2019-12-04T00:12:27, Sage Weil <sage@xxxxxxxxxxxx> wrote:

> > Perhaps, all of these design points trace back to a single idea - support
> > multiple ceph clusters on the same set of machine(s). Is this the goal? Is
> > this what Ceph users want?
> It was one of my goals.  A few reasons:

So I agree this is useful, and I'm glad to have it on the map again. I'm not
sure how large that percentage of users is - the environments I'm aware of
that would require segregation into multiple clusters would also frown on
running them on the same CPU/OS instance, so split-out disks alone wouldn't
be enough - but I can see it being helpful.


On the soapbox, though:

> > Now picking up on the scope issue for the orchestrator - apologies if this
> > sounds like a manifesto...I'm a "usability" addict!
> > 
> > IMO, our collective goal should be to drive ease of use and Ceph adoption
> > beyond Linux geeks. If that's a view that resonates, I think the
> > orchestrator has critical role to play to enable that strategy
> > Personally I'd would like to see the orchestrator evolve over time to
> > become the automation engine that enables an open source ecosystem around
> > Ceph;
> > - provide a default implementation for monitoring/alerting/metrics - this
> > can be simple and doesn't need HA - as Sage has already mentioned

Doesn't it, though? Not being alerted when your cluster enters a degraded
state is pretty dangerous from an operational point of view. And as soon
as users start relying on those metrics, losing them is not an option
either.

We can't not deploy monitoring/alerting/metrics, since they're essential
to operating a Ceph cluster. But I think we're deluding ourselves if we
pretend it'll remain that simple - if we do, the design decisions we make
now won't take the end state into account.
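To make that concrete, the bare minimum I would expect is an out-of-band
probe that does not depend on the cluster it watches - a rough sketch only,
assuming 'ceph health --format json' reports a top-level "status" field
(the command exists; the script around it is purely illustrative):

    #!/usr/bin/env python3
    # Minimal out-of-band health probe: run from cron or a monitoring agent
    # on a machine that is not part of the cluster, so an alert still fires
    # when the cluster's own monitoring stack is down.
    # Assumption: 'ceph health --format json' returns a top-level "status".
    import json
    import subprocess
    import sys

    def cluster_health():
        out = subprocess.check_output(
            ["ceph", "health", "--format", "json"], timeout=30)
        return json.loads(out).get("status", "UNKNOWN")

    if __name__ == "__main__":
        status = cluster_health()
        if status != "HEALTH_OK":
            print("ceph cluster degraded: %s" % status, file=sys.stderr)
            sys.exit(1)  # non-zero exit lets cron/nagios/etc. raise the alert
        sys.exit(0)

Everything beyond that (HA for the metrics store, retention, alert routing)
is exactly the part that won't stay simple.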

> > - samba/ganesha deployment, load balancers to improve radosgw etc etc

Managing the access protocols/gateways is something we'll have to do.
And again, that means we'll have to take HA into account.

I'm a wee bit on the fence about the LB part. We surely need hooks to
inform the LB layer about which endpoints we deployed, but I'd hope that
managing/deploying/configuring the LBs themselves is out of scope.
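To illustrate the kind of hook I mean - all names here are hypothetical,
nothing like this exists today - the orchestrator would only publish where
the gateways ended up, and whatever drives the LB (salt, ansible, the
user's own tooling) consumes that:

    # Hypothetical sketch: the orchestrator publishes deployed gateway
    # endpoints; an external tool reconfigures the load balancer from it.
    import json

    def export_endpoints(path, endpoints):
        # endpoints: list of dicts such as
        #   {"service": "rgw.zone1", "host": "node1",
        #    "addr": "10.0.0.1", "port": 8080}
        with open(path, "w") as f:
            json.dump({"endpoints": endpoints}, f, indent=2)

    # e.g. called after (re)deploying radosgw daemons:
    # export_endpoints("/var/run/ceph/lb-endpoints.json", deployed_rgw)

That keeps the orchestrator responsible for what it actually knows (the
endpoints) without turning it into yet another LB manager.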

> > - integration with platform management (why not show in the ceph dashboard
> > whether you have patches outstanding against your host, or the host has a
> > bad PSU) - enable the sysadmin to work more efficiently on Ceph, and maybe
> > they'd prefer it over other platforms.

Why not? Because those are systems management tasks that sit outside
Ceph's scope.

Please don't build an inferior salt/ansible/puppet/chef or systems
management console. Solutions for these problems exist. Let's not
duplicate everything.

> > </soapbox>
> +1

-1



Regards,
    Lars

-- 
SUSE Software Solutions Germany GmbH, MD: Felix Imendörffer, HRB 36809 (AG Nürnberg)
"Architects should open possibilities and not determine everything." (Ueli Zbinden)