Re: blinking lights via rook


 



On Wed, 27 Feb 2019, Gregory Farnum wrote:
> On Wed, Feb 27, 2019 at 1:16 PM Sage Weil <sweil@xxxxxxxxxx> wrote:
> >
> > See
> >
> >         https://github.com/ceph/ceph/pull/26684
> >         https://pad.ceph.com/p/blinky-lights
> >
> > I think the hurdles are:
> >
> > - Add the appropriate hook to orchestrator_cli to turn a light on or off.
> > Right now the code to remote() to the orchestrator is commented out in my
> > PR.  The call sites have the device id (vendor/model/serial), host, and
> > device name (e.g., sda).
> >
> > - Get a recentish libstoragemgmt into the rook container image, or some
> > other container image we can schedule.
> >
> > - Either teach rook how to do a one-off "run this command on this host" to
> > turn a light on or off, or teach the mgr rook module to schedule that
> > command itself.  I'm not sure whether we want/need rook in the loop
> > for turning these lights on or not... thoughts?  It seems like if rook
> > does it, it needs a configmap (or something) to store the state of lights
> > it wants on or off so it can reset them when it restarts.  The mgr module
> > can (should?) do the exact same thing when the mgr restarts.
> 
> This sounds like you need an interface for querying the state of
> lights as well then? I presume the dashboard wants to show what lights
> are on or off, not merely let admins push a button to change them...

The blinky module in that PR persists the set of devices that should be 
lit up before calling out to the orchestrator to turn on lights, and it 
calls out to turn them off before removing them from the persisted set.  
Health warnings are raised for any lights in the set.

This way the failure mode is always that ceph thinks a light is on when 
it isn't, but never that a light is actually on when we think it is off.  
There's some additional paranoia we can add to the module to ensure this 
is the case (e.g., a mutex around turning lights on/off to avoid 
an on-vs-off race), but as a first pass it should be pretty safe.
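A minimal Python sketch of the ordering described above, assuming a plain dict in place of the module's persistent store and a caller-supplied `set_light(host, dev, on)` in place of the call out to the orchestrator (both are hypothetical placeholders, not real mgr APIs). It persists before turning a light on, turns it off before forgetting it, holds a lock around both so they can't race, and can re-apply the persisted set after a restart:

```python
import threading
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical key: (host, device name), e.g. ("node1", "sda").
LightKey = Tuple[str, str]


class LightTracker:
    """Sketch of persist-before-on / off-before-forget.

    `store` stands in for persistent module state and `set_light` for the
    orchestrator call; both are illustrative placeholders.
    """

    def __init__(self, store: Dict[str, Set[LightKey]],
                 set_light: Callable[[str, str, bool], None]) -> None:
        self._store = store
        self._set_light = set_light
        # Serialize on/off requests so an "on" and an "off" cannot interleave.
        self._lock = threading.Lock()

    def _lit(self) -> Set[LightKey]:
        return self._store.setdefault("lit", set())

    def light_on(self, host: str, dev: str) -> None:
        with self._lock:
            # Persist first: if the orchestrator call fails we only
            # over-report the light as on.
            self._lit().add((host, dev))
            self._set_light(host, dev, True)

    def light_off(self, host: str, dev: str) -> None:
        with self._lock:
            # Turn it off first: if this raises, the entry stays in the
            # set and we keep warning about it.
            self._set_light(host, dev, False)
            self._lit().discard((host, dev))

    def health_warnings(self) -> List[str]:
        # One warning per light we believe is (or may be) on.
        return [f"light on: {h}:{d}" for h, d in sorted(self._lit())]

    def restore(self) -> None:
        """Re-apply the persisted set, e.g. after a mgr restart."""
        with self._lock:
            for host, dev in sorted(self._lit()):
                self._set_light(host, dev, True)
```

If `set_light` raises during `light_off`, the device stays in the set, so the health warning persists rather than a lit drive going unreported.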

sage

> 
> >
> > For the record, the lsmcli command we ultimately need to run is
> >
> >  lsmcli local-disk-fault-led-on --path $PATH
> >
> > modulo s/fault/ident/ or s/on/off/.
> >
> > sage
> >
> 
> 
> 
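For illustration, a small Python sketch that turns the device info available at the call sites (device id, host, device name) into the lsmcli invocation quoted above, covering the fault/ident and on/off variants. The `DeviceRef` type and the `/dev/<name>` path mapping are assumptions, not part of the PR. Passing the command as an argument list also sidesteps shell expansion: typed literally into a shell, `$PATH` would expand to the search path rather than a device, so here it is only a placeholder for the device path.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeviceRef:
    """Hypothetical holder for what the call sites have."""
    devid: str  # device id (vendor/model/serial)
    host: str   # host the device is attached to
    name: str   # kernel device name, e.g. "sda"

    @property
    def path(self) -> str:
        # Assumption: the device is addressed as /dev/<name> on its host.
        return f"/dev/{self.name}"


def lsmcli_argv(dev: DeviceRef, kind: str = "ident", on: bool = True) -> List[str]:
    """Build `lsmcli local-disk-{fault,ident}-led-{on,off} --path <dev>`."""
    if kind not in ("fault", "ident"):
        raise ValueError(f"unknown light type: {kind!r}")
    state = "on" if on else "off"
    return ["lsmcli", f"local-disk-{kind}-led-{state}", "--path", dev.path]


def run_locally(argv: List[str]) -> None:
    # Must run on dev.host, e.g. in a container scheduled there that ships
    # libstoragemgmt; check=True raises CalledProcessError on failure.
    subprocess.run(argv, check=True)
```

For example, `lsmcli_argv(DeviceRef("...", "node1", "sda"), "fault", False)` yields `["lsmcli", "local-disk-fault-led-off", "--path", "/dev/sda"]`. Who calls `run_locally` (rook or the mgr rook module) is exactly the open question in the thread.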


