Re: dashboard in mimic

Lenz Grimmer <lgrimmer@xxxxxxxx> · Sat, 23 Dec 2017 00:56:24 +0100

Hi John,

On 12/20/2017 11:55 AM, John Spray wrote:

> There have been some discussions about making big additions to the 
> dashboard module (the web gui that runs in ceph-mgr), so as a couple 
> of people have suggested, let's have a mailing list thread about it!

Thanks a lot for kicking this off! Please find some comments inline; I'd
be glad to discuss this in more depth after the holiday break.

> This is a bit wordy so I've written it more like a document than an 
> email, see below.  It's a very broad topic, so what I've written
> here is far from complete.  We're still at the point of discussion,
> there's no UI code being written so far for any of the stuff that I
> mention below.

For additional context, I think it makes sense to mention that
this topic was also discussed in the last CDM call:

https://youtu.be/YNfp_4S7mYE?t=28m37s

The collection of ideas that Sage mentioned during the call has been
noted down here:

http://pad.ceph.com/p/mimic-dashboard

Looking at that list, I think we've implemented several of those items in
openATTIC/DeepSea already (https://www.openattic.org/features.html) and
many of the other topics are on our TODO as well.

As Jan already mentioned during the call, a good first step for us could
be to contribute the Grafana dashboards that we developed and embed in
openATTIC. They are currently maintained in the DeepSea git (somewhat
hidden at
https://github.com/SUSE/DeepSea/tree/master/srv/salt/ceph/monitoring/grafana/files),
but I think it would make sense to incorporate them upstream instead, or
maintain them as a separate project.

They have been developed to display data collected by the DigitalOcean
Ceph Exporter for Prometheus
(https://github.com/digitalocean/ceph_exporter). We also created an RGW
metrics exporter for the RGW dashboard parts:
https://github.com/SUSE/DeepSea/tree/master/srv/salt/ceph/monitoring/prometheus/exporters

Embedding Grafana dashboards into another web app is actually not that
trivial (a simple iframe is way too inflexible) - we ended up writing a
small proxy for the oA backend that talks to Grafana and then forwards
the filtered output to the oA web UI. You can see some examples at
https://www.openattic.org/galleries/oa-3.x-screenshots/

It should be relatively straightforward to port that to the manager
dashboard.
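
To illustrate the kind of filtering such a proxy has to do, here is a
minimal sketch. It is not the actual openATTIC code: the function name,
the dropped-header list and the /api/grafana prefix are all assumptions
made up for this example, and the link rewriting is deliberately naive.

```python
# Hypothetical sketch; none of these names come from openATTIC or ceph-mgr.
from typing import Dict, Tuple

# Headers the proxy does not pass on to the browser. Dropping
# X-Frame-Options and Set-Cookie is an assumed policy, not a known one.
DROPPED_HEADERS = {"connection", "keep-alive", "transfer-encoding",
                   "x-frame-options", "set-cookie"}


def filter_grafana_response(headers: Dict[str, str], body: str,
                            proxy_prefix: str = "/api/grafana"
                            ) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) to forward from Grafana to the web UI."""
    kept = {name: value for name, value in headers.items()
            if name.lower() not in DROPPED_HEADERS}
    # Naive rewrite: point root-relative src/href links at the proxy, so
    # the browser fetches Grafana assets through the backend as well.
    for attr in ("src", "href"):
        body = body.replace(f'{attr}="/', f'{attr}="{proxy_prefix}/')
    return kept, body
```

A real proxy would also have to deal with authentication against
Grafana and with links that this simple string replacement misses.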

> What?
> =====
> 
> Extend the dashboard module to provide management of the cluster, in 
> addition to monitoring.  This would potentially include anything you 
> can currently do with the Ceph CLI, plus additional functionality
> like calling out to a container framework to spawn additional
> daemons.
> 
> The idea is to wrap things up into friendlier higher-level
> operations, rather than just having buttons for the existing CLI
> operations. Example workflows of interest:
> - a CephFS page where you can click "New Filesystem", and the pools
>   and MDS daemons will all be created for you.
> - similarly for RGW: ability to enable RGW and control the number of
>   gateway daemons
> - driving OSD addition/retirement, and also format conversions (e.g.
>   filestore->bluestore)

OSD lifecycle management is definitely a frequently occurring task that
would benefit from an easy UI.

I'd focus on addressing the most popular and regular admin chores first
before diving into adding one-off management/deployment features.

> Some of the functionality would depend on how Ceph is being run: 
> especially, anything that detects devices and starts/stops physical 
> services would depend on an environment that provides that (such as 
> Kubernetes).

Right, this part could become quite complex, as there are multiple
methods for deploying and orchestrating Ceph: bare-metal vs. Kubernetes,
using tools like ceph-ansible vs. DeepSea/Salt...

It may make sense to start with adding management functionality that is
based on existing/built-in Ceph APIs, e.g. Pools/RBDs/RGW and CephFS,
starting with read-only methods for obtaining information about these
and then extending that code path incrementally by adding functionality
to modify these objects. This evolutionary approach served us well for
many oA features that we created.

But at some point you will have to reach out to external services and
orchestration tools.

> Why build it in?
> ============
> 
> Historically, Ceph management UIs were usually doing lots of
> non-Ceph work too, configuring the underlying OS and hardware as well
> as the Ceph cluster itself.  Consequently, it often made sense to build
> the user interface into an external tool/framework that already knew
> how to do all that labour-intensive infrastructure stuff, rather than
> trying to reinvent it for a Ceph-specific management tool.

We came to the same conclusion and initially started from the
assumption that the Ceph cluster is already deployed and up and running,
and that our tool can then take it from there.

Of course, everybody wants that GUI-based "one click" install, but it's
the most complicated part to get right, and a lot of effort. Considering
you only use it once in the life cycle of your cluster, we have so far
focused on the more frequently occurring tasks...

> As some of us are moving towards running Ceph in container 
> environments like Kubernetes, the hardware/OS piece is increasingly 
> taken care of for us.  The container platform provides a simpler way 
> to discover and use hosts and block devices, which we can use
> directly from Ceph (or from the ceph dashboard).

The key is to make the dashboard usable in as many environments as
possible, even if only with limited functionality.

One thought however: the current UI framework is likely not well suited
for developing functionality that requires user interaction and some
more sophisticated widgets and other UI elements.

While I think that CherryPy is a great choice for the backend
functionality, AngularJS might be a better choice than Rivets.js for the
frontend in the long run. We've had very good experiences with it and
are currently in the process of migrating our UI to Angular2. But this
of course complicates the build and testing process.

> What about external UIs?
> ====================
> 
> Building more UI functionality into Ceph should not get in the way
> of integrating with any external tools/projects.  It should actually 
> benefit those projects: as we connect up functionality into the 
> dashboard module, those same ceph-mgr/python code paths can easily
> be connected to REST endpoints in the restful module.

That would be really useful indeed.

> The work to actually expose the REST bits will probably still fall
> on the people who really want/need that functionality, but it should
> be a very lightweight task for things where the functionality
> already exists in the dashboard.

So the Dashboard won't use the REST API itself by default? Wouldn't it
be better to have a clear separation between the UI and backend here,
and use one common API?

> Currently modules are somewhat isolated from one another, but I've 
> recently added an inter-module RPC interface so that we can have 
> better sharing of state -- the idea is to have some common things
> like a table of long-running-jobs that would be shared between the 
> dashboard and restful modules.

Have you already started working on this part? We created a TaskQueue
implementation for oA that might be worthwhile using here.
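
Just to sketch what I understand by such a shared table of long-running
jobs: the following is a minimal in-memory example. It is neither the oA
TaskQueue nor anything that exists in ceph-mgr; the class, its methods
and the job fields are hypothetical.

```python
# Hypothetical sketch of a shared job table; illustration only.
import itertools
import threading
from typing import Dict


class JobTable:
    """Thread-safe, in-memory table of long-running jobs."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._jobs: Dict[int, dict] = {}

    def start(self, description: str) -> int:
        """Register a new job and return its id."""
        with self._lock:
            job_id = next(self._ids)
            self._jobs[job_id] = {"description": description,
                                  "state": "running", "progress": 0}
            return job_id

    def update(self, job_id: int, progress: int) -> None:
        """Record progress in percent; 100 or more marks the job done."""
        with self._lock:
            job = self._jobs[job_id]
            job["progress"] = progress
            if progress >= 100:
                job["state"] = "done"

    def snapshot(self) -> Dict[int, dict]:
        """Return a copy that callers can render without holding the lock."""
        with self._lock:
            return {job_id: dict(job) for job_id, job in self._jobs.items()}
```

Both the dashboard and the restful module could then render the same
snapshot(), provided the table lives somewhere both can reach.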

> Security
> ======
> 
> The dashboard is currently completely read-only: that's convenient 
> because it makes it less scary to run it over unencrypted http
> and/or without login (or in practice, leaving https/login as an
> exercise to the sysadmin).  When administrative functionality is
> added, we'll need some sort of login, and https too.

Agreed, access control will be required as soon as you are able to
actually modify things.

> The https part can probably be done in the same way as the restful 
> module: require a user-generated certificate (i.e. for their proper 
> domain) by default, but also provide a helper for the adventurous
> user to run with a self-signed cert if they want to.

Sounds good.

> The login part could be as simple as creating users/passwords using
> a CLI and just prompting for them in the GUI, or we could also have
> some GUI functionality for managing users.  I wouldn't want to go too
> far with the latter: if someone has complex requirements then it's 
> generally better to be plugging into some external user database.

Agreed - external auth using SAML/OAuth/LDAP/AD is usually high on the
wishlist for "enterprise" users. But it seems like CherryPy does not
provide any support for these methods yet?

> It would still be very nice to retain the read only mode as an option
> of course.

Being able to flag a user as "read-only" might be good enough to begin
with, instead of devising a full-fledged role/privilege system.
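
As a rough sketch of how little code such a flag needs (plain Python, not
tied to CherryPy or to the existing dashboard module; all names and the
pool data are made up for illustration):

```python
# Hypothetical sketch of a per-user "read-only" flag; illustration only.
import functools
from dataclasses import dataclass


@dataclass
class User:
    name: str
    read_only: bool = True  # assumed default: new users may only look


class ReadOnlyError(PermissionError):
    """Raised when a read-only user calls a modifying operation."""


def modifies_cluster(func):
    """Mark an operation as modifying, and reject read-only users."""
    @functools.wraps(func)
    def wrapper(user, *args, **kwargs):
        if user.read_only:
            raise ReadOnlyError(f"user {user.name!r} is read-only")
        return func(user, *args, **kwargs)
    return wrapper


def list_pools(user):
    # Read-only operation, open to every logged-in user (placeholder data).
    return ["rbd", "cephfs_data"]


@modifies_cluster
def create_pool(user, pool_name):
    # Placeholder for the real call into the cluster.
    return f"created {pool_name}"
```

Calling create_pool() as a user with read_only=True raises ReadOnlyError,
while list_pools() keeps working for everybody.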

Thanks again for kicking this off! I think our work on openATTIC and the
experiences that we've gathered while doing so might be useful here, so
we should continue this conversation.

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)
