Re: blinking lights

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 24 Oct 2018 13:15:53 +0000 (UTC)

On Wed, 24 Oct 2018, Jan Fajerski wrote:
> On Tue, Oct 23, 2018 at 11:08:57PM +0000, Sage Weil wrote:
> > I gave the latest lsmcli (libstoragemgmt) another try and it can blink the
> > HDD lights on my generic 2u supermicro boxes!  It was a bit of a hassle
> > because ubuntu has an ancient version packaged, but once I built from
> > source it can do 'ident' (blinky red light) or 'fault' (solid red light).
> > Pretty simple!  And now is the time to harass the ubuntu/debian folks to
> > get this into the next round of releases so we can take advantage of it
> > (Fedora/RHEL/CentOS should already have a good version.)
> > 
> > With the new device tracking that's coming in nautilus, I think we have
> > most of the pieces to surface useful ceph controls to turn lights on and
> > off.  For example,
> > 
> > $ ceph device ls
> > DEVICE                                  HOST:DEV      DAEMONS LIFE EXPECTANCY
> > Crucial_CT1024M550SSD1_14160C164100     stud:sdd      osd.40  >6w
> > Crucial_CT1024M550SSD1_14210C25B79E     eutow:sds     osd.19  >6w
> > 
> > So we could add
> > 
> > $ ceph device ident-on Crucial_CT1024M550SSD1_14160C164100
> > $ ceph device fault-on Crucial_CT1024M550SSD1_14210C25B79E
> > ...
> > $ ceph device ident-off Crucial_CT1024M550SSD1_14160C164100
> > $ ceph device fault-off Crucial_CT1024M550SSD1_14210C25B79E
> > 
> > or perhaps
> > 
> > $ ceph osd ident-on osd.123
> > $ ceph osd fault-on osd.124
> I'd prefer this. Maybe by default only the data device, with a flag to
> optionally blink the shared journal/db device?

Maybe I should have written "and" above instead of "or".  The device 
command is easier to track and explicit, so I wouldn't want to skip it, 
but the osd ones will be much more convenient/friendly.
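A minimal sketch of how 'ceph osd ident-on osd.N' might expand to concrete devices under Jan's suggestion: light the data device by default, and add shared db/journal devices only when explicitly asked. OsdDevices and devices_to_light are hypothetical names, and a real version would fill them in from the OSD's metadata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OsdDevices:
    # Hypothetical summary of one OSD's devices, as "host:dev" strings.
    data: str                                         # primary/data device
    shared: List[str] = field(default_factory=list)   # db/wal/journal devices


def devices_to_light(osd: OsdDevices, include_shared: bool = False) -> List[str]:
    """Return the devices an osd-level ident/fault command would touch.

    Only the data device is lit by default; shared db/journal devices
    usually back other OSDs too, so they are added only on request.
    """
    targets = [osd.data]
    if include_shared:
        for dev in osd.shared:
            if dev not in targets:  # keep order, skip duplicates
                targets.append(dev)
    return targets


if __name__ == "__main__":
    osd = OsdDevices(data="host1:sdb", shared=["host1:nvme0n1"])
    print(devices_to_light(osd))                       # ['host1:sdb']
    print(devices_to_light(osd, include_shared=True))  # ['host1:sdb', 'host1:nvme0n1']
```

Making the shared device opt-in keeps the common case (pull one failed disk) from accidentally lighting a db device that several OSDs depend on.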

Which reminds me, it isn't currently very easy to tell what the primary vs 
secondary (db/journal) device(s) are for an OSD.  Right now you have to 
sift through the 'ceph osd metadata osd.N' output (which has a ton of 
other junk in it), and the fields vary between filestore and bluestore.  I 
wonder if this should be added to the 'ceph osd find' command output, 
which currently shows the host, crush location, and (very soon) will also 
include the container/pod name, pod namespace, and other useful 
identifying location-y info.  Maybe the devices aren't a perfect fit into 
that mold, but we don't have another existing "tell me about this specific 
osd" command right now... unless we want to create one (ceph osd info?).

> > (although note that OSDs may be backed by multiple devices, and you probably
> > don't want to pull the shared db/journal device in most cases).
> > 
> > My current thinking is that Ceph should persistently store which lights
> > are meant to be on, and raise a HEALTH_WARN (or HEALTH_INFO, nudge nudge)
> > alert so that the operator knows that the light(s) are (still) on.
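A minimal sketch of turning that persisted light state into a health alert. The record layout (device id mapped to the list of lights Ceph turned on) and the light_health name are hypothetical, and the severity is a parameter since the level (HEALTH_WARN vs. a HEALTH_INFO) is still open.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical persisted record: device id -> lights Ceph has turned on.
LightState = Dict[str, List[str]]


def light_health(lights: LightState,
                 severity: str = "HEALTH_WARN") -> Optional[Tuple[str, str]]:
    """Return (severity, message) while any light is on, else None."""
    on = {dev: kinds for dev, kinds in lights.items() if kinds}
    if not on:
        return None
    details = ", ".join(
        f"{dev} ({'/'.join(sorted(kinds))})" for dev, kinds in sorted(on.items())
    )
    noun = "device has" if len(on) == 1 else "devices have"
    return severity, f"{len(on)} {noun} lights on: {details}"


if __name__ == "__main__":
    print(light_health({"dev-a": ["ident"], "dev-b": []}))
    # ('HEALTH_WARN', '1 device has lights on: dev-a (ident)')
```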
> > 
> > How to run lsmcli
> > -----------------
> > 
> > We can pretty trivially invoke 'lsmcli local-disk-fault-led-off --path
> > whatever' (or do something more minimal using the python bindings).  The
> > gotcha is that we have to have something running on that host in order to
> > do it.
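A minimal sketch of the "just invoke lsmcli" approach, assuming a recent lsmcli that provides the local-disk-{ident,fault}-led-{on,off} subcommands with a --path option (check 'lsmcli --help' on your version). The helper names are hypothetical. Note that set_led only ever acts on the host it runs on, which is exactly the gotcha above.

```python
import subprocess
from typing import List

_KINDS = {"ident", "fault"}  # blinking vs. solid LED, as in the thread


def lsmcli_led_cmd(kind: str, on: bool, path: str) -> List[str]:
    """Build the lsmcli argv to switch one LED for a local disk path."""
    if kind not in _KINDS:
        raise ValueError(f"unknown LED kind: {kind!r}")
    state = "on" if on else "off"
    return ["lsmcli", f"local-disk-{kind}-led-{state}", "--path", path]


def set_led(kind: str, on: bool, path: str) -> bool:
    """Run lsmcli on *this* host; True only if it exited successfully."""
    try:
        result = subprocess.run(lsmcli_led_cmd(kind, on, path),
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # lsmcli is not installed (or not on PATH) on this host.
        return False
    return result.returncode == 0
```

Returning a plain success flag (rather than raising) makes it easy for a caller to decide whether a requested state change really happened.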
> > 
> > So, it would be pretty easy for an osd to ident its device(s) when it is
> > up, but if it's not up, then... not so much.
> > 
> > A few options:
> > 
> > 1) Only do the ident/fault from a running OSD.  This is pretty limiting,
> > and also runs the danger of not being able to turn the light off (if the
> > OSD then goes down).
> > 
> > 2) Trigger the lights from any OSD (or possibly other daemon) that happens
> > to be running on the same host.  This probably covers most cases, but...
> > it's still a bit limited.  What if no OSDs are up?  What if there is only
> > one OSD on the host and it is down?
> > 
> > 3) Delegate this to the new orchestrator.  Kube can just run this command
> > wherever we want.  Ansible presumably can too.
> Imho this is the way to go. DeepSea was actually about to start working on
> this, so great timing :)
> One other detail: while I'm sure libstoragemgmt is getting better with time, I'm
> equally sure there will always be hardware that does not play along. We were
> going to make the actual command configurable so users can drop in whatever
> they need for this. Going the operator route, this might not be ceph's concern
> anymore, just thought I'd mention it.

Great timing!  And yeah, I agree that deferring to the orchestrator is 
probably the way to go here.  Sadly that means it won't work for my 
ceph-deploy-based (non-ansible, non-rook) home cluster.  But none of the 
other new orchestrator commands will either, so I think this is an 
argument for a minimal 'ssh' orchestrator that simply gives the mgr a root 
ssh key for all nodes in the cluster...
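A minimal sketch of what such an 'ssh' orchestrator could do for lights, combined with Jan's configurable command: format a user-overridable command template and run it on the target host as root over ssh. The template placeholders ($kind, $state, $path), the root@host login, and the function names are all assumptions for illustration; the default template assumes lsmcli's local-disk LED subcommands.

```python
import shlex
import subprocess
from string import Template
from typing import List

# Default LED command; users could swap in whatever works for their hardware.
# $kind is "ident" or "fault", $state is "on" or "off", $path is the disk path.
DEFAULT_TEMPLATE = "lsmcli local-disk-$kind-led-$state --path $path"


def remote_led_argv(host: str, kind: str, on: bool, path: str,
                    template: str = DEFAULT_TEMPLATE) -> List[str]:
    """Build an ssh argv that runs the (possibly user-supplied) LED command."""
    cmd = Template(template).substitute(
        kind=kind, state="on" if on else "off", path=shlex.quote(path))
    # ssh hands the remote command to the remote shell as one string,
    # so pass the already-quoted command as a single argument.
    return ["ssh", f"root@{host}", cmd]


def run_remote(argv: List[str], timeout: float = 30.0) -> bool:
    """True only if the remote command reported success."""
    try:
        res = subprocess.run(argv, capture_output=True, timeout=timeout,
                             check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return res.returncode == 0
```

Usage would be something like run_remote(remote_led_argv("host1", "ident", True, "/dev/sdb")), with host and path coming from the device tracking.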

sage

> > 4) Depend on the libstoragemgmt network service.  lsmcli is just one part
> > of the suite... there's also a REST API that lets you do stuff.  There are
> > presumably certificates to configure and such to make it all work, though.
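A minimal sketch of how options 2 and 3 could be combined when deciding who toggles a light: prefer the orchestrator when one is configured, otherwise fall back to any up daemon on the same host, and give up otherwise. The cluster view (daemon name mapped to host and up flag) and pick_runner are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cluster view: daemon name -> (host, is_up).
Daemons = Dict[str, Tuple[str, bool]]


def pick_runner(host: str, daemons: Daemons,
                have_orchestrator: bool) -> Optional[str]:
    """Decide who should toggle a light on `host`.

    The orchestrator (option 3) can run commands anywhere, so it wins.
    Otherwise use any up daemon on that host (option 2). None means nobody
    can do it, e.g. the host's only OSD is down.
    """
    if have_orchestrator:
        return "orchestrator"
    for name in sorted(daemons):  # sorted for a deterministic choice
        daemon_host, is_up = daemons[name]
        if daemon_host == host and is_up:
            return name
    return None
```

The None case is the limitation of option 2 spelled out: with no orchestrator and no live daemon on the host, the light cannot be changed.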
> > 
> > Also, there are some implementation oddities.  The source of truth for the
> > on/off state is the enclosure itself.  So if you turn the light off in ceph,
> > we need to be certain the device actually turned it off before we clear out
> > our state.  Maybe we have states like off, pending-on, on, pending-off,
> > and we don't transition from pending-foo to foo until we get a success
> > from the command that is supposed to toggle the light state.
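A minimal sketch of the off / pending-on / on / pending-off idea: a request only records intent, and a pending state becomes final only after the toggle command reports success. Led, LightRecord, and apply are hypothetical names, and persisting the record is left out.

```python
from enum import Enum
from typing import Callable


class Led(Enum):
    OFF = "off"
    PENDING_ON = "pending-on"
    ON = "on"
    PENDING_OFF = "pending-off"


class LightRecord:
    """Ceph's stored view of one light; the enclosure stays the source of truth."""

    def __init__(self) -> None:
        self.state = Led.OFF

    def request(self, on: bool) -> None:
        # Record intent only; nothing is known about the hardware yet.
        self.state = Led.PENDING_ON if on else Led.PENDING_OFF

    def apply(self, toggle: Callable[[bool], bool]) -> Led:
        """Push a pending state to the hardware.

        `toggle(on)` runs the real command and returns True on success.
        Only success moves pending-foo to foo; on failure the record stays
        pending so the attempt can be retried later.
        """
        if self.state is Led.PENDING_ON and toggle(True):
            self.state = Led.ON
        elif self.state is Led.PENDING_OFF and toggle(False):
            self.state = Led.OFF
        return self.state
```

Keeping a failed "off" in pending-off (rather than clearing it) is what keeps the health alert raised until the light is really off.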
> > 
> > Thoughts?  I think this is within striking distance (finally) and it would
> > be sweet to land it in nautilus!
> > 
> > sage
> > 
> 
> -- 
> Jan Fajerski
> Engineer Enterprise Storage
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
> 
>