Re: blinking lights

John Spray <jspray@xxxxxxxxxx> · Wed, 24 Oct 2018 19:03:45 +0100

On Wed, Oct 24, 2018 at 6:46 PM Lars Marowsky-Bree <lmb@xxxxxxxx> wrote:
>
> On 2018-10-24T12:37:50, Sebastian Wagner <swagner@xxxxxxx> wrote:
>
> > >> 3) Delegate this to the new orchestrator.  Kube can just run this command
> > >> wherever we want.  Ansible presumably can too.
> > >
> > > I agree with Kai and Jan that this is the way to go.
> > >
> > > Using the libstoragemgmt network service is still possible, it would
> > > just be an implementation detail for the orchestrator itself.  I can
> > > imagine that in some future container environments, deploying
> > > something like the libstoragemgmt network service becomes quite
> > > cheap/easy, and saves the effort of tools like Rook implementing their
> > > own agent hooks -- but Ceph won't care.
> > >
> > > This would be an area where we need to get the orchestrator's device
> > > names in line with Ceph's internal device naming -- that would be
> > > useful anyway for other orchestrator functionality.
> >
> > Yes. As the OSD may no longer be running, and thus out of reach for
> > Ceph, we should use the orchestrator for that.
>
> I feel stupid for agreeing with this yet again, but also exactly because
> in a containerized world (where maybe only the specific LV is exposed to
> the pod) ceph-osd might not even be able to.
>
> And finally, we may want to be able to blink which disk we are *about to
> provision*, or which slot to plug the disk into, i.e. during the
> ceph-volume setup stages. Anything that relies on the ceph-osd process
> is a bit flawed.
>
> And the orchestrator has all the access privileges because it needs them
> anyway.
>
>
> > > I'd be inclined to just make the command synchronous, and return an
> > > error if the host is unreachable (perhaps with a special force flag to
> > > clear out Ceph's state if the host is gone and never coming back).
>
> Actually this should probably be built with the extension to the whole
> node in mind! In case of a total node failure in a 500 node DC, the
> node itself might need to be blinked just as well.

Do you mean blinking the node when it's running but none of its OSDs
are, or blinking a fully offline node using lights-off management
layers?
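
To make the synchronous behaviour quoted above concrete, here is a
minimal sketch, not a proposal for the actual interface. All names
(BlinkManager, run_on_host, HostUnreachable, set_light) are hypothetical;
run_on_host stands in for whatever the orchestrator uses to run the
light command on a host. The assumption is that force only matters when
turning a light off: it drops Ceph's record of the light even though the
host could not be reached.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple


class HostUnreachable(Exception):
    """Raised by the (hypothetical) orchestrator hook when a host is down."""


@dataclass
class BlinkManager:
    # Hypothetical hook: runs the light on/off action for (host, device)
    # and raises HostUnreachable if the host cannot be contacted.
    run_on_host: Callable[[str, str, bool], None]
    # Ceph-side record of the lights we believe are currently on.
    lit: Set[Tuple[str, str]] = field(default_factory=set)

    def set_light(self, host: str, device: str, on: bool,
                  force: bool = False) -> None:
        """Synchronously switch a light; fail if the host is unreachable."""
        try:
            self.run_on_host(host, device, on)
        except HostUnreachable:
            # Only "force off" may proceed: the host is assumed gone for
            # good, so we just clear our own state below.
            if on or not force:
                raise
        if on:
            self.lit.add((host, device))
        else:
            self.lit.discard((host, device))
```

With this shape, a normal "off" against a dead host returns an error and
leaves the recorded state untouched, while "off" with force=True clears
the record without contacting the host.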

John

>
> Regards,
>     Lars
>
> --
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> "Architects should open possibilities and not determine everything." (Ueli Zbinden)