Adding reply-all this time...

On Tue, Sep 29, 2020 at 2:53 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Tue, Sep 29, 2020 at 4:47 PM Travis Nielsen <tnielsen@xxxxxxxxxx> wrote:
> >
> > On Tue, Sep 29, 2020 at 1:50 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Sep 29, 2020 at 3:33 PM Travis Nielsen <tnielsen@xxxxxxxxxx> wrote:
> > > >
> > > > Sebastian and fellow orchestrators,
> > > >
> > > > Some questions have come up recently about issues in the Rook
> > > > orchestrator module and its state of disrepair. Patrick, Varsha,
> > > > and I have been discussing these recently as Varsha has been
> > > > working on the module. Before we fix all the issues that are
> > > > being found, I want to start a higher-level conversation. I’ll
> > > > join the leads meeting tomorrow to discuss, and it would be good
> > > > to include this in the Monday orchestrator agenda as well, which
> > > > unfortunately I haven’t been able to attend recently...
> > > >
> > > > First, Rook is driven by the K8s APIs, including CRDs, an
> > > > operator, the CSI driver, etc. When admins need to configure the
> > > > Ceph cluster, they create the CRDs and other resources directly
> > > > with K8s tools such as kubectl. Rook does everything with K8s
> > > > patterns so that admins don’t need to leave their standard
> > > > administration sandbox in order to configure Rook or Ceph. If
> > > > any Ceph-specific command needs to be run, the Rook toolbox can
> > > > be used. However, we prefer to avoid the toolbox for common
> > > > scenarios that should have CRDs for declaring desired state.
> > > >
> > > > The fundamental question then is, **what scenarios require the
> > > > Rook orchestrator mgr module**? The module is not enabled by
> > > > default in Rook clusters, and I am not aware of upstream users
> > > > consuming it.
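[Editor's note: the CRD-driven flow described above looks roughly like this in practice. A minimal sketch using the standard Rook `CephBlockPool` resource; the pool name is illustrative, and `rook-ceph` is the conventional namespace:]

```yaml
# pool.yaml -- declare desired state; the Rook operator reconciles it,
# so no Ceph CLI or toolbox is needed for this common scenario.
# Applied with: kubectl apply -f pool.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool        # illustrative name
  namespace: rook-ceph     # conventional Rook namespace
spec:
  failureDomain: host
  replicated:
    size: 3                # three replicas across hosts
```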
> > > >
> > > > The purpose of the orchestrator module was originally to
> > > > provide a common entry point for either the Ceph CLI tools or
> > > > the dashboard. This would provide a consistent interface for
> > > > working with both Rook and cephadm clusters. Patrick pointed
> > > > out that the dashboard isn’t really a scenario anymore for the
> > > > orchestrator module.
> > >
> > > Is that true? [1]
> >
> > Perhaps I misunderstood. If the dashboard is still a requirement,
> > the bar for maintaining support will certainly be much higher.
> >
> > > > If so, the only remaining usage is for the CLI tools. And if we
> > > > only have the CLI scenario, this means that the CLI commands
> > > > would be run from the toolbox. But we are trying to avoid the
> > > > toolbox. We should be putting our effort into the CRDs, the CSI
> > > > driver, etc.
> > > >
> > > > If the orchestrator module is creating CRs, we are likely doing
> > > > something wrong. We expect the cluster admin to create CRs.
> > > >
> > > > Thus, I’d like to understand the scenarios where the Rook
> > > > orchestrator module is needed. If there isn’t a need anymore
> > > > since the dashboard requirements have changed, I’d propose that
> > > > the module be removed.
> > >
> > > I don't have a current stake in the outcome, but I could foresee
> > > a future need/desire to let the Ceph cluster itself spin up
> > > resources on-demand in k8s via Rook. Let's say that I want to
> > > convert an XFS-on-RBD image to CephFS: the MGR could instruct the
> > > orchestrator to kick off a job to translate between the two
> > > formats. I'd imagine the same could be argued for on-demand
> > > NFS/SMB gateways, or anywhere else there is a delta between a
> > > storage administrator setting up the basic Ceph system and Ceph
> > > attempting to self-regulate/optimize.
> >
> > If Ceph needs to self-regulate, I could certainly see the module
> > being useful, for example to auto-scale the daemons when load is
> > high.
> > But at the same time, the operator could watch for Ceph events,
> > metrics, or other indicators and perform the self-regulation
> > according to the CR settings, instead of it happening inside the
> > mgr module.
>
> But then wouldn't you be embedding low-level business logic about
> Ceph inside Rook? Or are you saying Rook would wait for a special
> event/alert hook from Ceph to perform some action? If that's the
> case, it sounds a lot like what the orchestrator purports to do (at
> least to me, and at least as an end-state goal).

Agreed, we don’t want to embed Ceph logic in Rook. But yes, if Rook
can have a hook into Ceph to perform the action, the operator could
handle it. Then if cephadm needed to handle the same scenario, it
might use a mgr module to implement it. But there would be no need
for a Rook module in that case.

> > At the end of the day, I want to make sure we actually need an
> > orchestrator interface. K8s and cephadm are very different
> > environments, and their features probably won't ever be at parity
> > with each other. It may be more appropriate to define the Rook and
> > cephadm modules separately. Or at least we need to be very clear
> > about why we need the common interface, and ensure that it's
> > tested and supported.
>
> Not going to disagree with that last point.
>
> > > > Thanks,
> > > > Travis
> > > > Rook
> > >
> > > [1] https://tracker.ceph.com/issues/46756
> > >
> > > --
> > > Jason
>
> --
> Jason
>
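[Editor's note: the operator-side self-regulation discussed above (watch load indicators, reconcile daemon counts against limits declared in the CR) could be sketched roughly as follows. This is purely illustrative -- the function name, metric, and thresholds are hypothetical, not real Rook or Ceph APIs:]

```python
# Hypothetical sketch: the operator (rather than a mgr module) reads a
# load indicator and decides what daemon count to reconcile toward,
# bounded by limits the admin declared in the CR. All names are
# illustrative.

def desired_rgw_count(current: int, avg_ops_per_daemon: float,
                      scale_up_threshold: float, max_count: int) -> int:
    """Return the RGW daemon count the operator should reconcile toward."""
    if avg_ops_per_daemon > scale_up_threshold and current < max_count:
        return current + 1   # load is high and the CR allows more: add one
    if avg_ops_per_daemon < scale_up_threshold / 2 and current > 1:
        return current - 1   # load is low: shrink back toward one daemon
    return current           # within bounds: leave as-is

# Example: two gateways averaging 950 ops each, threshold 800, cap 4.
print(desired_rgw_count(current=2, avg_ops_per_daemon=950.0,
                        scale_up_threshold=800.0, max_count=4))  # → 3
```

The point of the sketch is that the scaling policy lives in the operator's reconcile loop and is driven by CR settings, so Ceph only needs to expose the indicator, not the orchestration logic.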