Re: blinking lights via rook

Tim Serong <tserong@xxxxxxxx> · Thu, 28 Feb 2019 18:59:58 +1100

On 02/28/2019 09:50 AM, Travis Nielsen wrote:
> On Wed, Feb 27, 2019 at 3:42 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
>>
>> On Wed, 27 Feb 2019, Travis Nielsen wrote:
>>> Some questions and comments:
>>> - What is the user interaction? Is he specifying an OSD ID for which
>>> he wants to blink the light or what is $PATH? If $PATH is a device
>>> name such as /dev/sdb we would need to translate the OSD ID to the
>>> device.
>>
>> Right now the module implements
>>
>>   ceph device {ident,fault}-light-{on,off} <devid>
>>
>> although once this is all working we can also add commands that operate on
>> osd IDs.

Presumably the OSD commands will just be implemented directly inside
ceph-mgr (which can get OSD metadata to map IDs back to the relevant
hostnames and device paths)?  Or is there anything special an individual
orchestrator might need to do for this case?
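
As a rough sketch of the mapping I have in mind (not actual ceph-mgr
code): assuming the OSD metadata is available as one dict per OSD with
"hostname" and comma-separated "devices" fields, the lookup could be
as small as the following.  The function name resolve_osd_devices and
the sample data are made up purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical shape of per-OSD metadata; the field names are an assumption.
OsdMetadata = Dict[int, Dict[str, str]]


def resolve_osd_devices(osd_metadata: OsdMetadata, osd_id: int) -> Tuple[str, List[str]]:
    """Map an OSD ID to (hostname, [device paths]) using OSD metadata."""
    meta = osd_metadata.get(osd_id)
    if meta is None:
        raise KeyError(f"no metadata for osd.{osd_id}")
    hostname = meta["hostname"]
    # "devices" is assumed to be a comma-separated list of kernel names.
    devices = [f"/dev/{name}" for name in meta.get("devices", "").split(",") if name]
    return hostname, devices


# Made-up sample data, only to show the intended usage.
SAMPLE = {
    0: {"hostname": "node1", "devices": "sdb"},
    1: {"hostname": "node2", "devices": "sdc,nvme0n1"},
}

if __name__ == "__main__":
    print(resolve_osd_devices(SAMPLE, 1))
```

With something like that in the mgr, the orchestrator would only ever
see a hostname and a device path, never an OSD ID.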

>>
>>> - This feels like a "desired state" way of doing things since you want
>>> a light on until you decide to turn it off. In this case, we could
>>> create a CRD for desired state of device lights. CRDs are the way the
>>> rook module should interact with the rook operator.
>>>     - Whenever the CRD changes, rook would update the lights. When
>>> rook starts, it would also ensure the lights are set appropriately.
>>>     - If a CRD is created it could mean the light should turn on for
>>> that device. If the CRD is deleted, the light should turn off. If
>>> there were different blinking modes, there could be a setting in the
>>> CRD to indicate such.
>>
>> That works.  I was just thinking that since the mgr is already maintaining
>> this set of desired-on lights we could keep the rook side of it simple.
>>
> 
> Ah, I missed that the mgr already stored this state. So if we can't
> detect the actual state of the lights, this means the mgr is only
> keeping track of the desire to turn the light on or off? And this
> would translate to a health warning if a light should be on.
> 
>>> - What does it take to detect the current state of the lights? Do we
>>> run lsmcli on each node? If so, the discovery daemonset would make
>>> sense to do this.
>>
>> If rook took the additional step of detecting lights that are on (due to
>> external actors) that would make the whole thing a bit more robust, and be
>> a good reason to bother with the complexity of a CRD.  I don't see
>> anything to get current status from the version I have on fedora 29,
>> though.
>>
>>> If we didn't use a CRD, the rook module could store the settings in a
>>> configmap, then run a k8s job itself to turn the lights on or off.
>>> However, I'd say the CRDs are the more natural approach.
>>
>> If we can't detect the current state with current tools, I wonder if just
>> having the mgr module schedule a one-off command to run lsmcli is
>> simpler... does having rook store the state in a configmap or crd buy us
>> anything?
>>
> 
> Right, if we can't detect the current state of the lights, rook can't
> really manage the desired state and it may not make sense for rook to get
> involved here. The mgr module could easily run a k8s job directly to
> turn the light on or off and we wouldn't worry about managing desired
> state.

I'd suggest the same is true for other orchestrators
(ansible/deepsea/ssh).  If we can't detect the state, we shouldn't do
anything at the individual orchestrator level.  (If we could detect
state, we'd just want to pass it up to ceph-mgr, rather than having each
individual module implement its own record of LED state.)
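
To make that concrete, here is a minimal sketch of a single central
record of desired light state, assuming the mgr keeps a set of
(devid, kind) pairs and derives health warnings from it.  The class
and method names are hypothetical, not the existing module's API; the
orchestrator's only job would be to run a one-off command whenever
set_light() reports a change.

```python
from typing import List, Set, Tuple


class DesiredLights:
    """Hypothetical central record of which device lights should be on."""

    KINDS = ("ident", "fault")

    def __init__(self) -> None:
        self._on: Set[Tuple[str, str]] = set()

    def set_light(self, devid: str, kind: str, on: bool) -> bool:
        """Record the desired state; return True if it actually changed."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown light kind: {kind}")
        key = (devid, kind)
        if on == (key in self._on):
            return False  # nothing to do, no command needs to be scheduled
        if on:
            self._on.add(key)
        else:
            self._on.discard(key)
        return True

    def health_warnings(self) -> List[str]:
        """One warning per light that is supposed to be on."""
        return [
            f"{kind} light requested on for device {devid}"
            for devid, kind in sorted(self._on)
        ]
```

Since the actual LED state can't be read back, this only tracks
intent, which is all any per-orchestrator copy could track anyway.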

Regards,

Tim
-- 
Tim Serong
Senior Clustering Engineer
SUSE
tserong@xxxxxxxx