On Wed, 24 Oct 2018, Jan Fajerski wrote:
> On Tue, Oct 23, 2018 at 11:08:57PM +0000, Sage Weil wrote:
> > I gave the latest lsmcli (libstoragemgmt) another try and it can blink the
> > HDD lights on my generic 2u supermicro boxes!  It was a bit of a hassle
> > because ubuntu has an ancient version packaged, but once I built from
> > source it can do 'ident' (blinky red light) or 'fault' (solid red light).
> > Pretty simple!  And now is the time to harass the ubuntu/debian folks to
> > get this into the next round of releases so we can take advantage of it
> > (Fedora/RHEL/CentOS should already have a good version.)
> >
> > With the new device tracking that's coming in nautilus, I think we have
> > most of the pieces to surface useful ceph controls to turn lights on and
> > off.  For example,
> >
> >  $ ceph device ls
> >  DEVICE                               HOST:DEV   DAEMONS  LIFE EXPECTANCY
> >  Crucial_CT1024M550SSD1_14160C164100  stud:sdd   osd.40   >6w
> >  Crucial_CT1024M550SSD1_14210C25B79E  eutow:sds  osd.19   >6w
> >
> > So we could add
> >
> >  $ ceph device ident-on Crucial_CT1024M550SSD1_14160C164100
> >  $ ceph device fault-on Crucial_CT1024M550SSD1_14210C25B79E
> >  ...
> >  $ ceph device ident-off Crucial_CT1024M550SSD1_14160C164100
> >  $ ceph device fault-off Crucial_CT1024M550SSD1_14210C25B79E
> >
> > or perhaps
> >
> >  $ ceph osd ident-on osd.123
> >  $ ceph osd fault-on osd.124
> I'd prefer this. Maybe by default only the data device, with a flag to
> optionally blink the shared journal/db device?

Maybe I should have written "and" above instead of "or".  The device
command is easier to track and explicit, so I wouldn't want to skip it, but
the osd ones will be much more convenient/friendly.

Which reminds me, it isn't currently very easy to tell what the primary vs
secondary (db/journal) device(s) are for an OSD.  Currently you have to sift
through the 'ceph osd metadata osd.N' output (which has a ton of other junk
in it), and the fields vary between filestore and bluestore.
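To make the "sifting" concrete, here is a rough sketch of pulling the data and secondary devices out of an already-parsed 'ceph osd metadata' dictionary. The field names in DEVICE_FIELDS are assumptions about what each backend reports (they may differ between releases), and osd_devices is a hypothetical helper, not an existing ceph command or API.

```python
from typing import Dict, List, Mapping

# ASSUMPTION: these metadata keys are illustrative guesses per backend;
# check the real 'ceph osd metadata osd.N' output before relying on them.
DEVICE_FIELDS: Dict[str, Dict[str, List[str]]] = {
    "bluestore": {
        "data": ["bluestore_bdev_dev_node"],
        "secondary": ["bluefs_db_dev_node", "bluefs_wal_dev_node"],
    },
    "filestore": {
        "data": ["backend_filestore_dev_node"],
        "secondary": ["osd_journal"],
    },
}


def osd_devices(
    metadata: Mapping[str, str],
    fields: Mapping[str, Mapping[str, List[str]]] = DEVICE_FIELDS,
) -> Dict[str, List[str]]:
    """Return {'data': [...], 'secondary': [...]} for one OSD's metadata.

    Unknown backends yield an empty dict; missing or empty fields are skipped.
    """
    backend = metadata.get("osd_objectstore", "")
    result: Dict[str, List[str]] = {}
    for role, keys in fields.get(backend, {}).items():
        result[role] = [metadata[k] for k in keys if metadata.get(k)]
    return result


# Hypothetical metadata for a bluestore OSD with a separate db device.
sample = {
    "osd_objectstore": "bluestore",
    "bluestore_bdev_dev_node": "sdd",
    "bluefs_db_dev_node": "nvme0n1",
    "hostname": "stud",
}
devices = osd_devices(sample)
```

With a mapping like this, a default "blink only the data device" behaviour would use devices["data"], and a flag could add devices["secondary"].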
I wonder if this should be added to the 'ceph osd find' command output,
which currently shows the host, crush location, and (very soon) will also
include the container/pod name, pod namespace, and other useful identifying
location-y info.  Maybe the devices aren't a perfect fit into that mold, but
we don't have another existing "tell me about this specific osd" command
right now... unless we want to create one (ceph osd info?).

> > (although note that osds may be backed by multiple devices, and you probably
> > don't want to pull the shared db/journal device in most cases).
> >
> > My current thinking is that which lights should be on is persistently
> > stored by Ceph, and raises a HEALTH_WARN (or HEALTH_INFO, nudge nudge)
> > alert so that the operator knows that the light(s) are (still) on.
> >
> > How to run lsmcli
> > -----------------
> >
> > We can pretty trivially invoke 'lsmcli local-disk-fault-led-off --path
> > whatever' (or do something more minimal using the python bindings).  The
> > gotcha is that we have to have something running on that host in order to
> > do it.
> >
> > So, it would be pretty easy for an osd to ident its device(s) when it is
> > up, but if it's not up, then... not so much.
> >
> > A few options:
> >
> > 1) Only do the ident/fault from a running OSD.  This is pretty limiting,
> > and also runs the danger of not being able to turn the light off (if the
> > OSD then goes down).
> >
> > 2) Trigger the lights from any OSD (or possibly other daemon) that happens
> > to be running on the same host.  This probably covers most cases, but...
> > it's still a bit limited.  What if no OSDs are up?  What if there is only
> > one OSD on the host and it is down?
> >
> > 3) Delegate this to the new orchestrator.  Kube can just run this command
> > wherever we want.  Ansible presumably can too.
> Imho this is the way to go.
> DeepSea was actually about to start working on
> this, so great timing :)
> One other detail: while I'm sure libstoragemgmt is getting better with time,
> I'm equally sure there will always be hardware that does not play along. We
> were going to make the actual command configurable so users can drop in
> whatever they need for this. Going the operator route, this might not be
> ceph's concern anymore, just thought I'd mention it.

Great timing!  And yeah, I agree that deferring to the orchestrator is
probably the way to go here.

Sadly that means it won't work for my ceph-deploy-based (non-ansible,
non-rook) home cluster.  But none of the other new orchestrator commands
will either, so I think this is an argument for a minimal 'ssh' orchestrator
that simply gives the mgr a root ssh key for all nodes in the cluster...

sage

> > 4) Depend on the libstoragemgmt network service.  lsmcli is just one part
> > of the suite... there's also a REST API that lets you do stuff.  There are
> > presumably certificates to configure and such to make it all work, though.
> >
> > Also, there are some implementation oddities.  The on/off state source
> > of truth is the enclosure itself.  So if you turn the light off in ceph,
> > we need to be certain we turned it off with the device before we clear out
> > our state.  Maybe we have states like off, pending-on, on, pending-off,
> > and we don't transition from pending-foo to foo until we get a success
> > from the command that is supposed to toggle the light state.
> >
> > Thoughts?  I think this is within striking distance (finally) and it would
> > be sweet to land it in nautilus!
> >
> > sage
> >
>
> --
> Jan Fajerski
> Engineer Enterprise Storage
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> HRB 21284 (AG Nürnberg)
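For reference, the off / pending-on / on / pending-off idea quoted above could be sketched roughly like this. It is only an illustration under stated assumptions: LedTracker and lsmcli_command are hypothetical names, the runner is injected so that an orchestrator (or a site-specific replacement command, as Jan suggests) decides how and where the command actually runs, and only 'local-disk-fault-led-off --path' is taken from the thread; the other three subcommand names are assumed to follow the same pattern.

```python
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple


class LedState(Enum):
    OFF = "off"
    PENDING_ON = "pending-on"
    ON = "on"
    PENDING_OFF = "pending-off"


def lsmcli_command(kind: str, on: bool, path: str) -> List[str]:
    """Build an argv such as: lsmcli local-disk-fault-led-off --path /dev/sdd.

    ASSUMPTION: the ident/fault on/off subcommands all share this naming.
    """
    if kind not in ("ident", "fault"):
        raise ValueError(f"unknown LED kind: {kind!r}")
    action = "on" if on else "off"
    return ["lsmcli", f"local-disk-{kind}-led-{action}", "--path", path]


class LedTracker:
    """Tracks desired LED state; only leaves pending-* after a success."""

    def __init__(self, run: Callable[[Sequence[str]], bool]) -> None:
        # 'run' executes the argv on the right host and returns True on success.
        self._run = run
        self._state: Dict[Tuple[str, str], LedState] = {}

    def state(self, kind: str, path: str) -> LedState:
        return self._state.get((kind, path), LedState.OFF)

    def request(self, kind: str, path: str, on: bool) -> LedState:
        key = (kind, path)
        # Record the intent first, so a failed command leaves a visible
        # pending-* state that can be retried or surfaced as a health alert.
        self._state[key] = LedState.PENDING_ON if on else LedState.PENDING_OFF
        if self._run(lsmcli_command(kind, on, path)):
            self._state[key] = LedState.ON if on else LedState.OFF
        return self._state[key]
```

A real runner might wrap subprocess.run on the target host or hand the argv to the orchestrator; persistence of the state map and retry of pending entries are left out of the sketch.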