Re: blinking lights via rook

Travis Nielsen <tnielsen@xxxxxxxxxx> · Wed, 27 Feb 2019 15:50:15 -0700

On Wed, Feb 27, 2019 at 3:42 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Wed, 27 Feb 2019, Travis Nielsen wrote:
> > Some questions and comments:
> > - What is the user interaction? Is the user specifying an OSD ID for
> > which to blink the light, or what is $PATH? If $PATH is a device
> > name such as /dev/sdb we would need to translate the OSD ID to the
> > device.
>
> Right now the module implements
>
>   ceph device {ident,fault}-light-{on,off} <devid>
>
> although once this is all working we can also add commands that operate on
> osd IDs.
>
> > - This feels like a "desired state" way of doing things since you want
> > a light on until you decide to turn it off. In this case, we could
> > create a CRD for desired state of device lights. CRDs are the way the
> > rook module should interact with the rook operator.
> >     - Whenever the CRD changes, rook would update the lights. When
> > rook starts, it would also ensure the lights are set appropriately.
> >     - If a CRD is created it could mean the light should turn on for
> > that device. If the CRD is deleted, the light should turn off. If
> > there were different blinking modes, there could be a setting in the
> > CRD to indicate such.
>
> That works.  I was just thinking that since the mgr is already maintaining
> this set of desired-on lights we could keep the rook side of it simple.
>

Ah, I missed that the mgr already stores this state. So if we can't
detect the actual state of the lights, the mgr is only keeping track
of the desire to turn a light on or off? And that would translate to a
health warning whenever a light should be on.
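
To make the health warning idea concrete, here is a rough sketch of how
a set of desired-on lights could be turned into a warning. It is only an
illustration: the function name and the DEVICE_*_ON check names are made
up, and I'm assuming the result has the severity/summary/detail shape
that a mgr module passes to set_health_checks().

```python
from typing import Dict, Iterable


def light_health_checks(desired_on: Dict[str, Iterable[str]]) -> dict:
    """Build a health-check dict from the desired-on device ids.

    desired_on maps a light type ("ident" or "fault") to the device ids
    whose light the admin asked to have turned on.
    """
    checks = {}
    for light, devids in sorted(desired_on.items()):
        devids = sorted(devids)
        if not devids:
            # Nothing requested for this light type, so no warning.
            continue
        # Hypothetical check name, e.g. DEVICE_FAULT_ON.
        checks["DEVICE_%s_ON" % light.upper()] = {
            "severity": "warning",
            "summary": "%d device(s) have the %s light turned on"
                       % (len(devids), light),
            "detail": ["%s %s light is on" % (d, light) for d in devids],
        }
    return checks
```

Note this only reports what was requested, not what the hardware is
actually doing.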

> > - What does it take to detect the current state of the lights? Do we
> > run lsmcli on each node? If so, the discovery daemonset would make
> > sense to do this.
>
> If rook took the additional step of detecting lights that are on (due to
> external actors) that would make the whole thing a bit more robust, and be
> a good reason to bother with the complexity of a CRD.  I don't see
> anything to get current status from the version I have on fedora 29,
> though.
>
> > If we didn't use a CRD, the rook module could store the settings in a
> > configmap, then run a k8s job itself to turn the lights on or off.
> > However, I'd say the CRDs are the more natural approach.
>
> If we can't detect the current state with current tools, I wonder if just
> having the mgr module schedule a one-off command to run lsmcli is
> simpler... does having rook store the state in a configmap or crd buy us
> anything?
>

Right, if we can't detect the current state of the lights, rook can't
really manage the desired state, and it may not make sense for rook to
get involved here. The mgr module could easily run a k8s job directly
to turn the light on or off, and we wouldn't need to worry about
managing desired state.
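
As a rough sketch of the one-off job approach, the mgr module could
build a Job manifest along these lines and submit it through whatever
k8s client it already uses. The lsmcli subcommand follows the pattern
from Sage's original mail; everything else is an assumption on my part:
the function names, the job naming scheme, the image name (a
placeholder for whatever image ends up carrying libstoragemgmt), and
the idea that the ceph host name matches the k8s node name.

```python
LIGHTS = ("ident", "fault")


def lsmcli_args(light, on, path):
    """Return the lsmcli argv for turning a disk light on or off."""
    if light not in LIGHTS:
        raise ValueError("unknown light type: %r" % (light,))
    state = "on" if on else "off"
    return ["lsmcli", "local-disk-%s-led-%s" % (light, state),
            "--path", path]


def light_job(host, light, on, path, image="example/lsm:latest"):
    """Build a one-off Job manifest that runs lsmcli on a given host.

    image is a hypothetical placeholder, not a real published image.
    """
    state = "on" if on else "off"
    dev = path.rsplit("/", 1)[-1]          # e.g. /dev/sdb -> sdb
    name = "%s-light-%s-%s-%s" % (light, state, host, dev)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name.lower()},
        "spec": {
            "backoffLimit": 0,             # one attempt, no retries
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Assumption: the host name is also the node name.
                    "nodeName": host,
                    "containers": [{
                        "name": "lsmcli",
                        "image": image,
                        "command": lsmcli_args(light, on, path),
                        # Assumption: privileged is enough to reach the
                        # disk; a hostPath mount of /dev may be needed.
                        "securityContext": {"privileged": True},
                    }],
                },
            },
        },
    }
```

For example, light_job("node1", "fault", True, "/dev/sdb") gives a job
named fault-light-on-node1-sdb that runs
"lsmcli local-disk-fault-led-on --path /dev/sdb" on node1.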

> sage
>
>
> >
> > Travis
> >
> > On Wed, Feb 27, 2019 at 3:25 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Feb 27, 2019 at 1:16 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> > > >
> > > > See
> > > >
> > > >         https://github.com/ceph/ceph/pull/26684
> > > >         https://pad.ceph.com/p/blinky-lights
> > > >
> > > > I think the hurdles are:
> > > >
> > > > - Add the appropriate hook to orchestrator_cli to turn a light on or off.
> > > > Right now the code to remote() to the orchestrator is commented out in my
> > > > PR.  The call sites have the device id (vendor/model/serial), host, and
> > > > device name (e.g., sda).
> > > >
> > > > - Get a recentish libstoragemgmt into the rook container image, or some
> > > > other container image we can schedule.
> > > >
> > > > - Either teach rook how to do a one-off "run this command on this host" to
> > > > turn a light on or off, or teach the mgr rook module to schedule that
> > > > command itself.  I'm not sure whether we want/need rook in the loop
> > > > for turning these lights on... thoughts?  It seems like if rook
> > > > does it, it needs a configmap (or something) to store the state of lights
> > > > it wants on or off so it can reset them when it restarts.  The mgr module
> > > > can (should?) do the exact same thing when the mgr restarts.
> > >
> > > This sounds like you need an interface for querying the state of
> > > lights as well then? I presume the dashboard wants to show what lights
> > > are on or off, not merely let admins push a button to change them...
> > >
> > > >
> > > > For the record, the lsmcli command we ultimately need to run is
> > > >
> > > >  lsmcli local-disk-fault-led-on --path $PATH
> > > >
> > > > modulo s/fault/ident/ or s/on/off/.
> > > >
> > > > sage
> > > >
> >
> >