Re: blinking lights via rook

Sage Weil <sweil@xxxxxxxxxx> · Thu, 28 Feb 2019 14:10:59 +0000 (UTC)

On Thu, 28 Feb 2019, Tim Serong wrote:
> On 02/28/2019 09:50 AM, Travis Nielsen wrote:
> > On Wed, Feb 27, 2019 at 3:42 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> >>
> >> On Wed, 27 Feb 2019, Travis Nielsen wrote:
> >>> Some questions and comments:
> >>> - What is the user interaction? Is the user specifying an OSD ID for
> >>> which they want to blink the light, or what is $PATH? If $PATH is a device
> >>> name such as /dev/sdb we would need to translate the OSD ID to the
> >>> device.
> >>
> >> Right now the module implements
> >>
> >>   ceph device {ident,fault}-light-{on,off} <devid>
> >>
> >> although once this is all working we can also add commands that operate on
> >> osd IDs.
> 
> Presumably the OSD commands will just be implemented directly inside
> ceph-mgr (which can get OSD metadata to map IDs back to the relevant
> hostnames and device paths)?  Or is there anything special an individual
> orchestrator might need to do for this case?

Right, it'll just be a slightly more complicated command in the blinky 
module (or wherever we move this code to later).
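For illustration, here is a minimal Python sketch of that extra OSD-to-device step. It assumes the OSD metadata dict carries a 'hostname' field and a 'device_ids' field formatted like 'sda=DEVID1,sdb=DEVID2'. The exact field format and the helper name osd_light_targets are assumptions for this sketch, not the module's actual code.

```python
from typing import Dict, List, Tuple


def osd_light_targets(metadata: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (hostname, devid) pairs for one OSD's metadata.

    Hypothetical helper; assumes 'device_ids' looks like
    'sda=DEVID1,sdb=DEVID2' (format is an assumption).
    """
    host = metadata.get("hostname", "")
    pairs: List[Tuple[str, str]] = []
    for entry in metadata.get("device_ids", "").split(","):
        if "=" not in entry:
            # Skip empty or malformed entries.
            continue
        _dev, devid = entry.split("=", 1)
        pairs.append((host, devid))
    return pairs
```

Each resulting devid could then go through the existing `ceph device ident-light-on <devid>` path, so the OSD variant stays a thin wrapper.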

> >>> - This feels like a "desired state" way of doing things since you want
> >>> a light on until you decide to turn it off. In this case, we could
> >>> create a CRD for desired state of device lights. CRDs are the way the
> >>> rook module should interact with the rook operator.
> >>>     - Whenever the CRD changes, rook would update the lights. When
> >>> rook starts, it would also ensure the lights are set appropriately.
> >>>     - If a CRD is created it could mean the light should turn on for
> >>> that device. If the CRD is deleted, the light should turn off. If
> >>> there were different blinking modes, there could be a setting in the
> >>> CRD to indicate such.
> >>
> >> That works.  I was just thinking that since the mgr is already maintaining
> >> this set of desired-on lights we could keep the rook side of it simple.
> >>
> > 
> > Ah, I missed that the mgr already stores this state. So if we can't
> > detect the actual state of the lights, this means the mgr is only
> > keeping track of the desire to turn the light on or off? And this
> > would translate to a health warning if a light should be on.
> > 
> >>> - What does it take to detect the current state of the lights? Do we
> >>> run lsmcli on each node? If so, the discovery daemonset would make
> >>> sense to do this.
> >>
> >> If rook took the additional step of detecting lights that are on (due to
> >> external actors) that would make the whole thing a bit more robust, and be
> >> a good reason to bother with the complexity of a CRD.  I don't see
> >> anything to get current status from the version I have on fedora 29,
> >> though.
> >>
> >>> If we didn't use a CRD, the rook module could store the settings in a
> >>> configmap, then run a k8s job itself to turn the lights on or off.
> >>> However, I'd say the CRDs are the more natural approach.
> >>
> >> If we can't detect the current state with current tools, I wonder if just
> >> having the mgr module schedule a one-off command to run lsmcli is
> >> simpler... does having rook store the state in a configmap or crd buy us
> >> anything?
> >>
> > 
> > Right, if we can't detect the current state of the lights, rook can't
> > really manage the desired state, and it may not make sense for rook to
> > get involved here. The mgr module could easily run a k8s job directly to
> > turn the light on or off and we wouldn't worry about managing desired
> > state.
> I'd suggest the same is true for other orchestrators
> (ansible/deepsea/ssh).  If we can't detect the state, we shouldn't do
> anything at the individual orchestrator level.  (If we could detect
> state, we'd just want to pass it up to ceph-mgr, rather than having each
> individual module implement its own record of LED state.)

Right.
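Here is a rough Python sketch of what keeping that record only in ceph-mgr could look like, with the orchestrator just running a one-off command. DesiredLights and lsmcli_argv are hypothetical names, and the health text is made up. The lsmcli subcommands assume libstoragemgmt's local-disk-{ident,fault}-led-{on,off} commands taking --path.

```python
from typing import List, Set, Tuple

LIGHT_TYPES = ("ident", "fault")


def lsmcli_argv(kind: str, on: bool, path: str) -> List[str]:
    """Build a one-off lsmcli command line (assumes libstoragemgmt's
    local-disk-{ident,fault}-led-{on,off} --path subcommands)."""
    if kind not in LIGHT_TYPES:
        raise ValueError(f"unknown light type: {kind}")
    state = "on" if on else "off"
    return ["lsmcli", f"local-disk-{kind}-led-{state}", "--path", path]


class DesiredLights:
    """Hypothetical mgr-side record of lights that should be on.

    Orchestrators keep no LED state; they only run the one-off command.
    """

    def __init__(self) -> None:
        self.on: Set[Tuple[str, str]] = set()  # (light type, devid)

    def set_light(self, kind: str, devid: str, on: bool) -> None:
        if kind not in LIGHT_TYPES:
            raise ValueError(f"unknown light type: {kind}")
        if on:
            self.on.add((kind, devid))
        else:
            self.on.discard((kind, devid))

    def health_details(self) -> List[str]:
        """One line per light that should currently be on (made-up text)."""
        return [f"{kind} light on for device {devid}"
                for kind, devid in sorted(self.on)]
```

The orchestrator would still need to map the devid to a host and device path before running the command; it just wouldn't remember anything afterwards.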

sage