Re: Rook orchestrator module

Adding in dev@xxxxxxx. ceph-devel is now for kernel development but
I'm keeping it in the cc list because a lot of discussion already
happened there.

Also for those interested, there's a recording of a meeting we had on
this topic here: https://www.youtube.com/watch?v=1OSQySElojg

On Tue, Sep 29, 2020 at 12:32 PM Travis Nielsen <tnielsen@xxxxxxxxxx> wrote:
>
> Sebastian and fellow orchestrators,
>
> Some questions have come up recently about issues in the Rook
> orchestrator module and its state of disrepair. Patrick, Varsha, and I
> have been discussing these as Varsha has been working on the module.
> Before we fix all the issues that are being found, I want to start a
> higher-level conversation. I’ll join the leads meeting tomorrow to
> discuss, and it would be good to include this in the Monday
> orchestrator agenda as well, which unfortunately I haven’t been able
> to attend recently...
>
> First, Rook is driven by the K8s APIs, including CRDs, an operator,
> the CSI driver, etc. When the admin needs to configure the Ceph
> cluster, they create the CRDs and other resources directly with the
> K8s tools such as kubectl. Rook does everything with K8s patterns so
> that the admin doesn’t need to leave their standard administration
> sandbox in order to configure Rook or Ceph. If any Ceph-specific
> command needs to be run, the rook toolbox can be used. However, we
> prefer to avoid the toolbox for common scenarios that should have CRDs
> for declaring desired state.
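
For readers less familiar with Rook, the declarative flow described
above looks roughly like the sketch below; the resource name and field
values are illustrative, though the overall shape follows Rook's
CephFilesystem CRD. The admin writes a manifest and applies it with
kubectl, and the operator reconciles it into pools and MDS daemons:

    # filesystem.yaml -- illustrative values only
    apiVersion: ceph.rook.io/v1
    kind: CephFilesystem
    metadata:
      name: myfs
      namespace: rook-ceph
    spec:
      metadataPool:
        replicated:
          size: 3
      dataPools:
        - replicated:
            size: 3
      metadataServer:
        activeCount: 1
        activeStandby: true

    kubectl apply -f filesystem.yaml

No ceph CLI is involved; anything imperative would have to run from
the toolbox, which is exactly what the paragraph above says Rook tries
to avoid for common scenarios.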

We're at a crossroads here. Ceph is increasingly learning to manage
itself, with the primary goal of improving user friendliness, and
awareness of the deployment technology is key to that.

> The fundamental question then is, **what scenarios require the Rook
> orchestrator mgr module**? The module is not enabled by default in
> Rook clusters and I am not aware of upstream users consuming it.
>
> The purpose of the orchestrator module was originally to provide a
> common entry point for either the Ceph CLI tools or the dashboard.
> This would provide a consistent interface for working with both Rook
> and cephadm clusters. Patrick pointed out that the dashboard isn’t
> really a scenario anymore for the orchestrator module.

As Lenz pointed out in another reply, my understanding was wrong here:
the dashboard has been using the orchestrator to display the
information it provides.

> If so, the only
> remaining usage is for CLI tools. And if we only have the CLI
> scenario, this means that the CLI commands would be run from the
> toolbox. But we are trying to avoid the toolbox. We should be putting
> our effort into the CRDs, CSI driver, etc.

I think we need to be careful about looking at the CLI as the sole
entry point for the orchestrator. The mgr modules (including the
dashboard) are increasingly using the orchestrator to do tasks. As we
discussed in the orchestrator meeting (YouTube recording linked earlier
in this mail), CephFS is planning these scenarios for Pacific:

- mds_autoscaler plugin deploys MDS in response to file system
degradation (increased max_mds, insufficient standby). Future work [1]
will look at deploying MDS with more memory in response to load on the
file system. (Think lots of small file systems with small MDS to
start.)

- volumes plugin deploys NFS clusters configured via the `ceph nfs
...` command suite (see the sketch after this list).

- cephfs-mirror daemons deployed to geo-replicate CephFS file systems.

- (Still TBD:) volumes plugin to use an rsync container to copy data
between two CephFS subvolumes (encrypted or not). This will probably
grow to include RBD-mounted images as a source or destination at some
point.
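
To make the NFS item concrete: the admin-facing side of that command
suite looks roughly like the line below. This is a sketch of the
Octopus-era syntax and the cluster name is made up; exact arguments
may differ by release.

    # create an NFS-Ganesha cluster named "mynfs" backed by CephFS;
    # the volumes plugin turns this into an orchestrator request, which
    # cephadm or Rook then realizes as running ganesha daemons
    ceph nfs cluster create cephfs mynfs

The point is that the mgr module itself is the orchestrator client
here; none of this needs to be typed from inside the Rook toolbox.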

> If the orchestrator module is creating CRs, we are likely doing
> something wrong. We expect the cluster admin to create CRs.
>
> Thus, I’d like to understand the scenarios where the Rook orchestrator
> module is needed. If there isn’t a need anymore since the dashboard
> requirements have changed, I’d propose that the module be removed.

Outside of this thread I think we already decided not to do this, but
I'm still interested to hear everyone's thoughts. Hopefully broader
exposure on dev@xxxxxxx will get us more voices.

[1] https://tracker.ceph.com/issues/46680

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D