Re: Ceph and management tools interaction proposal

On Mon, Feb 19, 2018 at 8:35 AM, Ricardo Dias <rdias@xxxxxxxx> wrote:
> Hi,
>
> I would like to start a discussion about the interaction between Ceph
> and the various management/deployment tools (MTs from here on) such as
> Kubernetes, Salt, or Ansible. (I couldn't find such a discussion in the
> ceph-devel mailing list archives, so I apologize if this has already
> been discussed.)
>
> Since this project started looking at containerized environments,
> using a Kubernetes/Rook-based solution for deploying and managing a
> Ceph cluster, I believe one of the main ideas is to make ceph-mgr
> able to "talk" to k8s/rook, giving Ceph the power to control its own
> services: adding/removing OSDs and other similar operations.
>
>                         +----------------------+
>                         |  Cluster             |
> +----------+            |                      |
> |   k8s    |            |  +----------+        |
> |          |<-----------+--|   Ceph   |        |
> |          |            |  +----------+        |
> |   rook   |----------->|                      |
> +----------+            |  +----------------+  |
>                         |  | other services |  |
>                         |  +----------------+  |
>                         +----------------------+
>
> When I first saw this architecture proposal (in a presentation by
> Blaine Gardner), the first thing I noticed was that it could be
> generalized beyond k8s/rook to other MTs. For instance, instead of
> the pair k8s/rook, we could use salt/deepsea or ansible/ceph-ansible
> to perform the same deployment/management tasks.
>
>                                  +----------------------+
>                                  |  Cluster             |
> +-------------------+            |                      |
> |  k8s/salt/ansible |            |  +----------+        |
> |                   |<-----------+--| Ceph     |        |
> |                   |            |  +----------+        |
> | rook/deepsea/     |----------->|                      |
> |    ceph-ansible   |            |                      |
> +-------------------+            |  +----------------+  |
>                                  |  | other services |  |
>                                  |  +----------------+  |
>                                  +----------------------+
>
> Until now, we have been using MTs in one direction only: the user uses
> an MT to deploy/manage Ceph. But since the introduction of ceph-mgr,
> and more recently with the development of a WebUI dashboard, performing
> and controlling deployment/management operations from ceph-mgr, or from
> any module running inside ceph-mgr, requires that Ceph be able to
> communicate with these MTs.
>
> This communication between Ceph and MTs can be achieved using two
> approaches:
>
> a) each MT provides its own API (different for each kind of MT) and
> Ceph implements support for talking to each of those APIs;
>
> b) Ceph designs/controls an API schema (let's call it the Ceph-Mgt API)
> that allows Ceph to perform deployment/management tasks, and each MT
> provides an implementation of that API schema.

This is a very relevant topic; thank you for starting the discussion.

> IMO approach b) would be the way to go for several reasons:
>
> * Ceph would not depend on any particular MT kind/version/functionality
> (either the MT provides a Ceph-Mgt API implementation or it doesn't)
> * Development of the high-level deployment/management operations would
> be faster and more reliable because only one API exists
> * Deployment heuristics (already implemented in MTs, such as automatic
> allocation of OSDs to disks) could be implemented inside the Ceph code
> base and become centralized.
>
> The cons I see for approach b) are:
> * it's hard to design such an API schema up front (but that is why I'm
> starting this discussion)
> * if no MT provides an implementation of the API, ceph-mgr will not be
> able to do anything (this is highly unlikely, since we can always
> contribute to the MT projects ourselves, as we already do)
>
> Now some concrete aspects of approach b):
>
> MTs usually already provide a (non-pure, for purists) REST interface
> that can be extended. I know that Salt has one (it is what is used for
> communication between openATTIC and DeepSea), k8s has one too, and while
> I'm not aware of such a thing for Ansible, a simple webserver could be
> placed in front of Ansible to provide the same kind of REST interface.

We definitely do need to define ways to plug new functionality into
diverse backends (i.e. things other than k8s), but I'm not sure the
(b) remote API is the right place to do it, vs. the (a) piece of local
code in Ceph.

Having a common REST API is a neat idea *if* you can get all the
platforms of interest to implement it identically.  However, I'm
really doubtful about that assumption:
 - Even if e.g. Rook and DeepSea both had the same logical API, the
details would be different (Salt/k8s API frameworks presumably have
different auth, different conventions on pagination, etc)
 - When a tool like Ansible Tower has a REST API, that does not mean
that playbooks can expose an arbitrary API: whatever drives the Tower
API has to understand that it is operating in terms of playbooks etc.
 - If we had a "fallback" backend that just SSH'd out to baremetal
hosts to do the basics, it would need a whole API service built around
it.

If I'm right about that point, and getting identical APIs across tools
is unrealistic, then you end up with intermediary services everywhere.
I don't think we can reasonably suggest to a user that they should
have Rook, Ceph, *and* a third service that connects Rook and Ceph.
The reasonable thing to do in that situation is rather to have code
inside Ceph that knows how to talk to Rook -- that's the (a) option
rather than the (b) option.

The situation today is that the tools already do have their own APIs
(we're halfway to the (a) solution already), so the question is
whether the code for driving those MT APIs in a uniform way should
live inside Ceph, or whether it should be in another service with a
uniform remote API layer on top of it.  In that context, I strongly
prefer not to add another remote API layer.

To be clear, I am very much in favour of having an abstraction layer:
I just think the abstraction layer would be best implemented on the
Ceph side, as some python code, rather than as a collection of
externally run proxy/translator services.  If Rook and DeepSea end up
with extremely similar APIs to the point that they can be converged,
then that would be awesome, and I'm happy to be proven wrong.
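
For illustration, a minimal sketch of the shape I have in mind, as
in-Ceph python code (all names here are hypothetical, just to show
the structure, not an actual proposal):

    # One abstract interface inside Ceph; one concrete class per MT.
    class MgmtBackend(object):
        """Operations every backend must implement."""

        def get_inventory(self):
            """Return the hosts/disks the MT knows about."""
            raise NotImplementedError

        def create_osds(self, drive_specs):
            """Ask the MT to create OSDs on the given drives."""
            raise NotImplementedError

    class RookBackend(MgmtBackend):
        def get_inventory(self):
            # would call the k8s/Rook API via the kubernetes client
            pass

    class DeepSeaBackend(MgmtBackend):
        def get_inventory(self):
            # would call the Salt REST API that openATTIC already uses
            pass

A mgr module would then pick whichever backend matches the user's
environment and program against MgmtBackend only.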

> MTs could register in the ceph-mgr service map with the URL of their
> Ceph-Mgt API implementation. Then any ceph-mgr module could check the
> availability of such an implementation and use it.
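
As an aside, a mgr module can already read the service map, so the
availability check could look roughly like this (the 'mgmt_api'
service name and the 'url' metadata key are assumptions for the sake
of the example):

    # Inside a ceph-mgr module (a MgrModule subclass).
    def find_mgt_api_url(module):
        service_map = module.get('service_map')
        svc = service_map.get('services', {}).get('mgmt_api')
        if not svc:
            return None  # no MT has registered an implementation
        for daemon in svc.get('daemons', {}).values():
            if not isinstance(daemon, dict):
                continue  # skip non-daemon entries
            url = daemon.get('metadata', {}).get('url')
            if url:
                return url
        return None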
>
> The Ceph-Mgt API should enforce a high-level abstract representation of
> the cluster, in order to work in both containerized and bare-metal
> environments.

I see two possible approaches here:
 - build an interface designed for containerized environments, and
enable baremetal environments to do their best to implement it
 - or, have an interface with some container-only bits and some
baremetal-only bits.

My default is for the first approach, but I'm sympathetic that some
people will have functionality in non-container backends that they
want to expose, so that would lead to the second.
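
In code terms, the second approach implies some kind of capability
probing, along these lines (a rough sketch with invented names):

    class Backend(object):
        # feature groups a backend may or may not implement
        FEATURES = set()

        def supports(self, feature):
            return feature in self.FEATURES

    class RookBackend(Backend):
        FEATURES = {'containers'}

    class BaremetalBackend(Backend):
        FEATURES = {'blink_disk_led'}  # hardware-only functionality

Callers would have to check supports() before touching any of the
container-only or baremetal-only bits, which is part of why my
default is the first approach.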

> For instance, the Ceph-Mgt API should specify methods for listing the
> available resources of a cluster, where by resources I mean
> computational devices (hosts), disks, and networks. Each resource may
> have several properties, such as the amount of RAM a computational
> device has, or the size of a disk. Resources could also be related to
> each other; for example, a set of disk resources can be related to a
> single computational device.
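
To make that concrete, the resource model described above could look
roughly like this (field names invented for illustration):

    class Disk(object):
        def __init__(self, path, size_bytes, rotational):
            self.path = path              # e.g. '/dev/sdb'
            self.size_bytes = size_bytes
            self.rotational = rotational

    class Host(object):
        def __init__(self, name, ram_bytes, disks, networks):
            self.name = name
            self.ram_bytes = ram_bytes    # property of the host
            self.disks = disks            # list of Disk related to this host
            self.networks = networks      # e.g. ['10.0.0.0/24']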
>
> The Ceph-Mgt API could also specify the notion of a service, where a
> service could be an OSD/MON/MGR daemon, an NFS-Ganesha daemon, or any
> other software component that runs inside the cluster.
> The API should provide methods to start/configure/stop services, and
> these services should be associated with the cluster resources
> described in the previous paragraph.

The individual service level may not be the right abstraction for
driving the stateless daemons: perhaps something more like a
kubernetes Deployment object that requests a certain number of
services (pods) running with the same configuration.
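
Something like this, in spirit (hypothetical names again):

    # Declare a desired count of identically-configured daemons and
    # let the backend reconcile towards it, like a k8s Deployment.
    class StatelessServiceSpec(object):
        def __init__(self, service_type, count, config=None):
            self.service_type = service_type  # e.g. 'mds', 'rgw'
            self.count = count                # desired instance count
            self.config = config or {}        # shared configuration

    # e.g. "run three RGW daemons with this zone setting":
    spec = StatelessServiceSpec('rgw', count=3,
                                config={'rgw_zone': 'default'})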

> Another important feature of the Ceph-Mgt API would be a stream-based
> connection between the MT and Ceph, so that Ceph could listen for
> events from the MT and support asynchronous operations more
> efficiently.

We do need this stream of notifications from the MT to Ceph (my
favourite simple example is knowing when a k8s MDS pod fails, so that
I can call "ceph mds fail" on the rank).  This would be an area where
using existing remote APIs is nice, because k8s already has its
"watch" API.

> I think with this approach the techniques used for the deployment and
> management of Ceph would live mostly under the Ceph ecosystem/project,
> avoiding the deviations that currently exist in each MT project.
>
> What do you all think about this?

The most important part currently is defining what this set of common
operations actually is.  I hope to circulate some prototype
Rook-driving code soon, which would be a good opportunity to discuss
which bits would map to other backends and how.  Ultimately, if we
have some in-Ceph python code that we later decide to kick out into
remote services with remote APIs, that wouldn't be too bad of a
situation.

Cheers,
John

> Thanks,
> --
> Ricardo Dias
> Senior Software Engineer - Storage Team
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284
> (AG Nürnberg)