On Wed, 27 Feb 2019, Travis Nielsen wrote:
> Some questions and comments:
> - What is the user interaction? Is he specifying an OSD ID for which
> he wants to blink the light or what is $PATH? If $PATH is a device
> name such as /dev/sdb we would need to translate the OSD ID to the
> device.

Right now the module implements

  ceph device {ident,fault}-light-{on,off} <devid>

although once this is all working we can also add commands that operate
on osd IDs.

> - This feels like a "desired state" way of doing things since you want
> a light on until you decide to turn it off. In this case, we could
> create a CRD for desired state of device lights. CRDs are the way the
> rook module should interact with the rook operator.
>   - Whenever the CRD changes, rook would update the lights. When
> rook starts, it would also ensure the lights are set appropriately.
>   - If a CRD is created it could mean the light should turn on for
> that device. If the CRD is deleted, the light should turn off. If
> there were different blinking modes, there could be a setting in the
> CRD to indicate such.

That works.  I was just thinking that since the mgr is already
maintaining this set of desired-on lights we could keep the rook side of
it simple.

> - What does it take to detect the current state of the lights? Do we
> run lsmcli on each node? If so, the discovery daemonset would make
> sense to do this.

If rook took the additional step of detecting lights that are on (due to
external actors) that would make the whole thing a bit more robust, and
be a good reason to bother with the complexity of a CRD.  I don't see
anything to get current status from the version I have on fedora 29,
though.

> If we didn't use a CRD, the rook module could store the settings in a
> configmap, then run a k8s job itself to turn the lights on or off.
> However, I'd say the CRDs are the more natural approach.

If we can't detect the current state with current tools, I wonder if just
having the mgr module schedule a one-off command to run lsmcli is
simpler... does having rook store the state in a configmap or crd buy us
anything?

sage

>
> Travis
>
> On Wed, Feb 27, 2019 at 3:25 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> >
> > On Wed, Feb 27, 2019 at 1:16 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> > >
> > > See
> > >
> > >   https://github.com/ceph/ceph/pull/26684
> > >   https://pad.ceph.com/p/blinky-lights
> > >
> > > I think the hurdles are:
> > >
> > > - Add the appropriate hook to orchestrator_cli to turn a light on or off.
> > > Right now the code to remote() to the orchestrator is commented out in my
> > > PR. The call sites have the device id (vendor/model/serial), host, and
> > > device name (e.g., sda).
> > >
> > > - Get a recentish libstoragemgmt into the rook container image, or some
> > > other container image we can schedule.
> > >
> > > - Either teach rook how to do a one-off "run this command on this host" to
> > > turn a light on or off, or teach the mgr rook module to schedule that
> > > command itself. I'm not sure whether or not we want/need rook in the loop
> > > for turning these lights on or not... thoughts? It seems like if rook
> > > does it, it needs a configmap (or something) to store the state of lights
> > > it wants on or off so it can reset them when it restarts. The mgr module
> > > can (should?) do the exact same thing when the mgr restarts.
> >
> > This sounds like you need an interface for querying the state of
> > lights as well then? I presume the dashboard wants to show what lights
> > are on or off, not merely let admins push a button to change them...
> >
> > >
> > > For the record, the lsmcli command we ultimately need to run is
> > >
> > >   lsmcli local-disk-fault-led-on --path $PATH
> > >
> > > modulo s/fault/ident/ or s/on/off/.
> > >
> > > sage
> > >
> >
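
For reference, a minimal sketch of how the lsmcli invocations quoted above could be assembled, and how a persisted set of desired-on lights could be replayed after a restart. This is illustration only, not code from the PR: the helper names (led_command, reapply_commands) and the (kind, path) representation of the desired state are hypothetical. Only the subcommand pattern (local-disk-{fault,ident}-led-{on,off} --path) comes from the thread. The sketch builds argument lists and does not execute anything.

```python
from typing import Iterable, List, Tuple

# Light kinds named in the thread (s/fault/ident/).
KINDS = ("ident", "fault")


def led_command(kind: str, on: bool, path: str) -> List[str]:
    """Build the lsmcli argv for one light, following the quoted pattern."""
    if kind not in KINDS:
        raise ValueError(f"unknown light kind: {kind!r}")
    state = "on" if on else "off"
    return ["lsmcli", f"local-disk-{kind}-led-{state}", "--path", path]


def reapply_commands(desired_on: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Commands that re-assert every light in the desired-on set.

    desired_on holds hypothetical (kind, device path) pairs, standing in
    for whatever state the mgr (or rook) would persist across restarts.
    """
    return [led_command(kind, True, path) for kind, path in sorted(desired_on)]


if __name__ == "__main__":
    # Example: replay two lights after a restart (paths are made up).
    for argv in reapply_commands({("fault", "/dev/sdb"), ("ident", "/dev/sda")}):
        print(" ".join(argv))
```

Whichever component ends up owning the desired-on set, the replay step would look about the same. The open question in the thread is only where that set lives and who schedules the command on the right host.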