Dear Ceph team,
I have been watching Sage's five themes for Octopus, and I love the themes
and all of Sage's talk.
Sage's talk mentioned cluster usability.
On 'the orchestrate API' slide, Sage's slides mention a "Partial
consensus to focus efforts on":
(Option) Rook (which I don't know, but which depends on Kubernetes)
(Option) ssh (or maybe some RPC mechanism).
I was sad not to see the option
(Option) Support a common module for the most popular declarative
tools (puppet/chef/cfengine).
I think option (ssh) exists only because work has already been invested
in complex salt and ansible implementations, which never seem to get any
simpler. I propose we chalk that up to lessons learned, gain some wisdom
about why option (ssh) took much more effort than expected, and learn
from option (Rook).
I think option (Rook) is a very good idea, as it is built on sound ideas
that I have seen work before.
I understand that Ceph should not depend *only* on something as complex
as Kubernetes for deployment, even if it is the best solution. I may not
want to run something as complex as Kubernetes just to run Ceph.
I would have liked to see on the slides:
(Option) Look at how to get Rook's benefits without Kubernetes
The rest of this email explains how I think Ceph could best be
configured without Kubernetes, in a Ceph-like way.
I believe Rook's dependency, Kubernetes, provides an architecture based
on declarative configuration and shared service state, which makes
managing clusters easier. In other words, Kubernetes is like a
service-level version of Ceph's crushmap, which describes how data is
distributed in Ceph.
To implement this (names can be changed and are purely for illustration):
"orchestratemapfile" -> the desired deployment config file
"orchestratemap"     -> the orchestratemapfile compiled together with local state
"liborchestrate"     -> shares and executes the orchestratemap
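To make those three names concrete, here is a minimal sketch in Python of
what an orchestratemapfile might contain. Every field name below is
hypothetical and purely illustrative; it is not an existing Ceph or Rook
structure, only the rough shape of a declarative desired state.

    # Hypothetical orchestratemapfile: the desired deployment, purely
    # declarative, with no information about what is currently running.
    orchestratemapfile = {
        "ceph_release": "octopus",
        "mon": {"count": 3, "hosts": ["node1", "node2", "node3"]},
        "mgr": {"count": 2},
        "osd": {"hosts": ["node1", "node2", "node3"],
                "devices": "all-unused-disks"},
    }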
So that any Ceph developer can understand it: just as the crushmap is
declarative and drives data placement, the "orchestratemap" should be
declarative and drive the deployment. The crushmap is shared state
across the cluster; the orchestratemap would likewise be shared state
across the cluster. A crushmap is compiled from a crushmapfile plus
state about the cluster; an orchestratemap would be compiled from an
orchestratemapfile plus state about the cluster.
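Continuing the sketch, and again with purely hypothetical names, the
compile step could be as small as combining the desired state with
whatever is known about the cluster at compile time:

    # Hypothetical compile step: orchestratemapfile + known cluster state
    # -> orchestratemap, analogous to crushmapfile + cluster state -> crushmap.
    def compile_orchestratemap(orchestratemapfile, cluster_state):
        return {
            "desired": orchestratemapfile,   # what the operator asked for
            "observed": cluster_state,       # e.g. which daemons each host runs now
        }

The sketches below assume an orchestratemap of that {"desired",
"observed"} shape.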
Just as librados can read a crushmap and speak to a mon to get cluster
status, and drive data flow, liborchestrate could read an orchestratemap
and drive the stages of Ceph deployment. An MVP* would function with
only minor degradation even without shared cluster state (i.e. no
orchestratemap).
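As an illustration of that behaviour (all names hypothetical, with
run_stage() standing in for the code that would actually start, stop or
upgrade daemons), liborchestrate could simply walk the deployment stages
in order, reading the orchestratemap the same way librados reads the
crushmap:

    # Hypothetical liborchestrate driver: read an orchestratemap (shared or
    # locally compiled) and walk the deployment stages in a fixed order.
    STAGES = ["bootstrap-mons", "deploy-mgrs", "deploy-osds", "deploy-gateways"]

    def run_stage(stage, orchestratemap):
        # Placeholder for the code that reconciles this stage of the deployment.
        print("stage %s: reconcile towards %s" % (stage, orchestratemap["desired"]))

    def orchestrate(orchestratemap):
        for stage in STAGES:
            run_stage(stage, orchestratemap)

    # Example: drive the stages from a trivially compiled orchestratemap.
    orchestrate({"desired": {"mon": {"count": 3}}, "observed": {}})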
A good starting point for the orchestratemapfile would be the Kubernetes
config for Rook, as this is essentially a desired state for the cluster.
If you add the current local state into the orchestratemap when
compiling the orchestratemapfile, every desired operation can be
calculated by each node independently, using just the orchestratemap and
its current local state. The operations that must be delayed because
they depend on other operations can also be calculated for each node.
This avoids retries and timeouts, immediately reduces the error handling
needed, and potentially allows Ceph to:
 - save the user from having to know that more than one daemon is
   running to provide Ceph,
 - do staged upgrades,
 - practice self-healing at the service level,
 - guide the user's deployment with more helpful error messages,
 - and many other potential enhancements.
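To make that calculation concrete, here is a sketch of a per-node
planner, using the same hypothetical {"desired", "observed"}
orchestratemap shape as above. The dependency rule (no OSD creation
before a mon quorum) is only an example of the kind of rule I mean, not
a complete list:

    # Hypothetical per-node planner: each node computes, independently, the
    # operations it can run now and the operations blocked on a dependency
    # elsewhere in the cluster.
    def plan_node(hostname, orchestratemap, local_state):
        desired = orchestratemap["desired"]
        observed = orchestratemap["observed"]
        runnable, blocked = [], []

        # This host should run a mon but is not running one yet.
        if hostname in desired["mon"]["hosts"] and local_state.get("mon", 0) == 0:
            runnable.append(("start", "mon"))

        # This host should provide OSDs but has none yet; creating them must
        # wait until enough mons exist to form a quorum.
        if hostname in desired["osd"]["hosts"] and local_state.get("osd", 0) == 0:
            mons_up = sum(host.get("mon", 0) for host in observed.values())
            op = ("create", "osd", desired["osd"]["devices"])
            if mons_up > desired["mon"]["count"] // 2:
                runnable.append(op)
            else:
                blocked.append(op)

        return runnable, blocked

    # Example: node2 runs nothing yet, and only one of the desired three mons
    # is up elsewhere, so starting a mon is runnable but creating OSDs is not.
    runnable, blocked = plan_node(
        "node2",
        {"desired": {"mon": {"count": 3, "hosts": ["node1", "node2", "node3"]},
                     "osd": {"hosts": ["node2"], "devices": "all-unused-disks"}},
         "observed": {"node1": {"mon": 1}}},
        {},
    )
    # runnable == [("start", "mon")]
    # blocked  == [("create", "osd", "all-unused-disks")]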
It may be argued that option (ssh) is simpler than implementing an
"orchestratemap" and a liborchestrate that reads it. I agree that option
(ssh) is simpler for a test-grade MVP, but for a production-grade
solution I suspect that implementing an "orchestratemap" and
liborchestrate is simpler, because it gives simpler synchronization,
planning and error handling for the management of Ceph, just as the
crushmap simplifies synchronization, planning and error handling for
data in Ceph.
Good luck and have fun,
Owen Synge
* I once nearly finished an orchestratemapfile-to-Ceph-configuration
implementation (with no shared cluster state), and the bulk of the work
was understanding how each Ceph daemon interacts with the cluster during
boot, and the commands to manage each daemon. Only the state
serialization, comparison and propagation were never completed.