Keynote: What's Planned for Ceph Octopus - Sage Weil -> Feedback on Ceph's Usability

Dear Ceph team,

I have been watching Sage's five themes for Octopus, and I love the themes and all of Sage's talk.

Sage's talk mentioned cluster usability.

On 'the orchestrate API' slide, Sage's slides mention a "Partial consensus to focus efforts on":

(Option) Rook (which I don't know well, but which depends on Kubernetes)

(Option) ssh (or maybe some RPC mechanism).

I was sad not to see the option:

(Option) Support a common declarative puppet/chef/cfengine module.

I think option (ssh) exists only because work has already been invested in complex Salt and Ansible implementations, which never seem to reduce in complexity. I propose we chalk this up to mistakes we made, gain some wisdom about why option (ssh) took much more effort than expected, and learn from option (Rook).

I think option (Rook) is a very good idea, as it is built on sound ideas I have seen work before.

I understand that Ceph should not depend *only* on something as complex as Kubernetes for deployment, even if it is the best solution. I may not want to run something as complex as Kubernetes just to run Ceph.

I would have liked to see on the slides:

(Option) Look at how to get Rook's benefits without Kubernetes

The rest of this email explains how I think Ceph could best be configured without Kubernetes, in a Ceph-like way.

I believe Rook's dependency, Kubernetes, provides an architecture based on declarative configuration and shared service state, which makes managing clusters easier. In other words, Kubernetes is like a service-level version of Ceph's crushmap, which describes how data is distributed in Ceph.

To implement (names can be changed and are purely for illustration):

    'orchestratemapfile' -> the desired deployment config file
    'orchestratemap'     -> the orchestratemapfile compiled with local cluster state
    'liborchestrate'     -> shares and executes the orchestratemap
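
To make this concrete, here is a minimal sketch in Python of what an orchestratemapfile and its compilation into an orchestratemap might look like. The structure, field names, and the compile_orchestratemap() helper are hypothetical and purely for illustration; they are not part of any existing Ceph interface.

    # Hypothetical sketch only: the format, field names, and helper below are
    # illustrative and not part of any existing Ceph interface.

    # 'orchestratemapfile': the operator's declarative, desired deployment.
    ORCHESTRATEMAPFILE = {
        "version": "14.2.0",
        "mons": ["node1", "node2", "node3"],
        "mgrs": ["node1", "node2"],
        "osds": {"node1": ["/dev/sdb"], "node2": ["/dev/sdb"], "node3": ["/dev/sdb"]},
    }

    def compile_orchestratemap(mapfile, local_state):
        """Combine the desired deployment with observed local state."""
        return {"desired": mapfile, "observed": local_state}

    # Example: node1 observes which daemons it already runs and at what
    # version, then compiles its orchestratemap.
    local_state = {
        "hostname": "node1",
        "running": {"mon": "14.1.1"},
        "devices": ["/dev/sdb"],
    }
    orchestratemap = compile_orchestratemap(ORCHESTRATEMAPFILE, local_state)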

So that any Ceph developer can understand it: just as the crushmap is declarative and drives data placement, the "orchestratemap" should be declarative and drive deployment. The crushmap is shared state across the cluster; the orchestratemap would likewise be shared state across the cluster. A crushmap is a crushmapfile compiled with state about the cluster; an orchestratemap is an orchestratemapfile compiled with state about the cluster.

Just as librados can read a crushmap, speak to a mon to get cluster status, and drive data flow, liborchestrate could read an orchestratemap and drive the stages of Ceph deployment. An MVP* would function with only minor degradation even without shared cluster state (i.e. no orchestratemap).
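
As a sketch of the liborchestrate side, again hypothetical and building on the orchestratemap structure from the previous sketch, a node could derive its own deployment actions simply by comparing desired state with observed state:

    # Hypothetical liborchestrate-style planning step; it operates on the
    # orchestratemap structure from the previous sketch.
    def plan_local_actions(orchestratemap):
        """Derive this node's actions from desired vs observed state."""
        desired = orchestratemap["desired"]
        observed = orchestratemap["observed"]
        host = observed["hostname"]
        actions = []

        # Deploy a mon/mgr here if desired on this host but not yet running.
        for daemon in ("mon", "mgr"):
            if host in desired[daemon + "s"] and daemon not in observed["running"]:
                actions.append(("deploy", daemon))

        # Create an OSD for each desired device present on this host.
        for device in desired["osds"].get(host, []):
            if device in observed["devices"]:
                actions.append(("create-osd", device))

        # Upgrade any running daemon that is not at the desired version.
        for daemon, version in observed["running"].items():
            if version != desired["version"]:
                actions.append(("upgrade", daemon))

        return actions

    # For the node1 example above this yields:
    #   [('deploy', 'mgr'), ('create-osd', '/dev/sdb'), ('upgrade', 'mon')]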

A good starting point for the orchestratemapfile would be the Kubernetes config for Rook, as this is essentially a desired state for the cluster.

If you compile the current local state into the orchestratemap along with the orchestratemapfile, each node can independently calculate all desired operations using just the orchestratemap and its current local state. Each node can also calculate which operations must be delayed because they depend on other operations. This avoids retries and timeouts, immediately reduces error handling, and would potentially allow Ceph to hide from the user the fact that more than one daemon is running to provide Ceph, to perform staged upgrades, to practice self-healing at the service level, to guide the user's deployment with more helpful error messages, and to offer many other potential enhancements.
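
A minimal sketch of that dependency handling, with made-up ordering rules purely for illustration: each node splits its planned actions into those it can run now and those it must postpone, instead of attempting them, timing out, and retrying.

    # Hypothetical ordering rules: conditions an action type must wait for.
    # 'mon-quorum' stands for a cluster-wide condition; the rules are invented
    # here for illustration only.
    DEPENDS_ON = {
        "create-osd": {"mon-quorum"},
        "upgrade": {"mon-quorum"},
    }

    def split_ready_and_delayed(actions, satisfied):
        """Split planned actions into runnable-now and postponed.

        'satisfied' is the set of conditions this node knows to hold, e.g.
        {'mon-quorum'} once the monitors report quorum.
        """
        ready, delayed = [], []
        for action in actions:
            unmet = DEPENDS_ON.get(action[0], set()) - satisfied
            if unmet:
                delayed.append((action, unmet))
            else:
                ready.append(action)
        return ready, delayed

    # Before a mon quorum exists, OSD creation is known to be delayed rather
    # than attempted, timed out, and retried:
    ready, delayed = split_ready_and_delayed(
        [("deploy", "mon"), ("create-osd", "/dev/sdb")], satisfied=set()
    )
    # ready   == [('deploy', 'mon')]
    # delayed == [(('create-osd', '/dev/sdb'), {'mon-quorum'})]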

It may be argued that option (ssh) is simpler than implementing an "orchestratemap" and a liborchestrate that reads it. I agree option (ssh) is simpler for a test-grade MVP, but for a production-grade solution I suspect implementing an "orchestratemap" and liborchestrate is simpler, because it simplifies synchronization, planning, and error handling for the management of Ceph, just as the crushmap simplifies synchronization, planning, and error handling for data in Ceph.

Good luck and have fun,

Owen Synge


* I once nearly finished a tool that turned an orchestratemapfile into a Ceph configuration (with no shared cluster state), and the bulk of the work was understanding how each Ceph daemon interacts with the cluster during boot, and the commands to manage each daemon. Only the state serialization, comparison, and propagation were never completed.



