Re: blinking lights via rook

That seems different from reading the state of an LED; it is rather
tracking whether LEDs have been turned on or not.  I.e. internal state -
it doesn't have to match the actual diode state, it just needs to be
controlled centrally - one point of truth.  Physically reading an LED
isn't always reliable.
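
A minimal sketch of that kind of internal state, in Python.  The class
and method names are hypothetical, made up for illustration; this is not
the blinky module's actual code.  It records only which lights we want
on and never touches the hardware:

```python
# Hypothetical sketch: none of these names come from ceph-mgr or rook.
class LightRegistry:
    """Single point of truth for which lights we *want* to be on."""

    KINDS = ("ident", "fault")

    def __init__(self):
        # Set of (devid, kind) pairs that should currently be lit.
        self._desired_on = set()

    def set_light(self, devid, kind, on):
        """Record the desired state; no hardware is queried or changed."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown light kind: {kind!r}")
        if on:
            self._desired_on.add((devid, kind))
        else:
            self._desired_on.discard((devid, kind))

    def is_on(self, devid, kind):
        return (devid, kind) in self._desired_on

    def health_warnings(self):
        # One warning per light that is supposed to be on.
        return [
            f"{kind} light on for {devid}"
            for devid, kind in sorted(self._desired_on)
        ]


registry = LightRegistry()
registry.set_light("dev-a", "ident", True)
registry.set_light("dev-b", "fault", True)
registry.set_light("dev-a", "ident", False)
print(registry.health_warnings())
```

Whether the diode is really lit is a separate question; the registry
only answers "what did we ask for".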

On Thu, Feb 28, 2019 at 11:59 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
>
> On Thu, 28 Feb 2019, Brett Niver wrote:
> > Why do we care about state?  At some level the code has reasons to
> > want the LED to be either on or off...
>
> Mostly we don't need to care.  I can think of a couple problem
> scenarios, though:
>
> - Someone out of band turns a light on.  Then ceph turns on another light,
> a human sees the first light, and pulls the wrong drive.
>
> - What if the host is down, but you want the health warning to go away?
> There needs to be some 'force' option that will proceed to forget the
> light was ever on when we can't reach the host, but that relies on a human
> operator promising that the host really is off and thus the light won't
> come back on.
>
> - We have some bug/race in our code that means we fail to turn off the
> light before removing our notion that the light is on.  Maybe an aborted
> attempt to turn the light on has some slow request wandering through the
> orchestrator queue of stuff to do and finally executes sometime after we
> tell the system to turn the light back off?
>
> sage
>
>
>
> >
> > On Thu, Feb 28, 2019 at 9:11 AM Sage Weil <sweil@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 28 Feb 2019, Tim Serong wrote:
> > > > On 02/28/2019 09:50 AM, Travis Nielsen wrote:
> > > > > On Wed, Feb 27, 2019 at 3:42 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> > > > >>
> > > > >> On Wed, 27 Feb 2019, Travis Nielsen wrote:
> > > > >>> Some questions and comments:
> > > > >>> - What is the user interaction? Is he specifying an OSD ID for which
> > > > >>> he wants to blink the light or what is $PATH? If $PATH is a device
> > > > >>> name such as /dev/sdb we would need to translate the OSD ID to the
> > > > >>> device.
> > > > >>
> > > > >> Right now the module implements
> > > > >>
> > > > >>   ceph device {ident,fault}-light-{on,off} <devid>
> > > > >>
> > > > >> although once this is all working we can also add commands that operate on
> > > > >> osd IDs.
> > > >
> > > > Presumably the OSD commands will just be implemented directly inside
> > > > ceph-mgr (which can get OSD metadata to map IDs back to the relevant
> > > > hostnames and device paths)?  Or is there anything special an individual
> > > > orchestrator might need to do for this case?
> > >
> > > Right, it'll just be a slightly more complicated command in the blinky
> > > module (or wherever we move this code to later).
> > >
> > > > >>> - This feels like a "desired state" way of doing things since you want
> > > > >>> a light on until you decide to turn it off. In this case, we could
> > > > >>> create a CRD for desired state of device lights. CRDs are the way the
> > > > >>> rook module should interact with the rook operator.
> > > > >>>     - Whenever the CRD changes, rook would update the lights. When
> > > > >>> rook starts, it would also ensure the lights are set appropriately.
> > > > >>>     - If a CRD is created it could mean the light should turn on for
> > > > >>> that device. If the CRD is deleted, the light should turn off. If
> > > > >>> there were different blinking modes, there could be a setting in the
> > > > >>> CRD to indicate such.
> > > > >>
> > > > >> That works.  I was just thinking that since the mgr is already maintaining
> > > > >> this set of desired-on lights we could keep the rook side of it simple.
> > > > >>
> > > > >
> > > > > Ah, I missed that the mgr already stored this state. So if we can't
> > > > > detect the actual state of the lights, this means the mgr is only
> > > > > keeping track of the desire to turn the light on or off? And this
> > > > > would translate to a health warning if a light should be on.
> > > > >
> > > > >>> - What does it take to detect the current state of the lights? Do we
> > > > >>> run lsmcli on each node? If so, the discovery daemonset would make
> > > > >>> sense to do this.
> > > > >>
> > > > >> If rook took the additional step of detecting lights that are on (due to
> > > > >> external actors) that would make the whole thing a bit more robust, and be
> > > > >> a good reason to bother with the complexity of a CRD.  I don't see
> > > > >> anything to get current status from the version I have on fedora 29,
> > > > >> though.
> > > > >>
> > > > >>> If we didn't use a CRD, the rook module could store the settings in a
> > > > >>> configmap, then run a k8s job itself to turn the lights on or off.
> > > > >>> However, I'd say the CRDs are the more natural approach.
> > > > >>
> > > > >> If we can't detect the current state with current tools, I wonder if just
> > > > >> having the mgr module schedule a one-off command to run lsmcli is
> > > > >> simpler... does having rook store the state in a configmap or crd buy us
> > > > >> anything?
> > > > >>
> > > > >
> > > > > Right, if we can't detect the current state of the lights, rook can't
> > > > > really manage the desired state and it may not make sense for rook to get
> > > > > involved here. The mgr module could easily run a k8s job directly to
> > > > > turn the light on or off and we wouldn't worry about managing desired
> > > > > state.
> > > > I'd suggest the same is true for other orchestrators
> > > > (ansible/deepsea/ssh).  If we can't detect the state, we shouldn't do
> > > > anything at the individual orchestrator level.  (If we could detect
> > > > state, we'd just want to pass it up to ceph-mgr, rather than having each
> > > > individual module implement its own record of LED state)
> > >
> > > Right.
> > >
> > > sage
> >
> >
> >


