Re: Advice for implementation of LED behavior in Ceph ecosystem

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 1 Apr 2015 16:17:32 -0700 (PDT)

On Wed, 1 Apr 2015, John Spray wrote:
> I guess in this interesting case you could either:
>  * Allow other OSDs on the same host to handle the 'tell blink' command for
> the dead OSD's drive
>  * Leave this to calamari/whoever to read the dead OSD's block device path
> from "ceph osd metadata", and go blink the LEDs themselves.

#2 really sounds safer to me.  In particular, you need to be really 
careful not to flash an LED until you're sure you don't need the data on 
the disk (i.e., it's down+out and the cluster state is healthy--no heroic 
measures needed).  I think anything that triggers flashing that doesn't 
have a holistic view of the cluster would be dangerous.
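
A minimal sketch of that gate, to make the condition concrete.  It assumes
the caller has already parsed the output of "ceph osd dump --format json"
into a dict whose "osds" entries carry "osd", "up" and "in" fields, and has
the overall health string (e.g. "HEALTH_OK") in hand.  The function name
safe_to_blink is hypothetical and not part of Ceph or calamari.

```python
def safe_to_blink(osd_id, osd_dump, health_status):
    """Return True only if the OSD is down+out and the cluster is healthy.

    osd_dump: dict parsed from `ceph osd dump --format json` (assumed shape).
    health_status: overall cluster health string, e.g. "HEALTH_OK".
    """
    for osd in osd_dump.get("osds", []):
        if osd.get("osd") == osd_id:
            is_down = osd.get("up") == 0
            is_out = osd.get("in") == 0
            # Require all three: any doubt means the data may still be needed.
            return is_down and is_out and health_status == "HEALTH_OK"
    # Unknown OSD id: refuse rather than guess.
    return False
```

The point of the sketch is that the decision takes cluster-wide inputs, so
it belongs in something that can see the whole cluster rather than in a
single OSD.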

That, combined with the complications around ceph-osd possibly not 
running, makes me think this would be the calamari agent that does the 
flashing.

It also may be necessary for the disk -> last known state mapping to go 
somewhere other than in just osd metadata; if the osd is recreated or the 
id gets reused, that info goes away.  (We could also be careful to avoid 
deallocating the id until the disk is removed, I guess, but it's another 
constraint to worry about.)
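
One possible shape for such a mapping, as a sketch only: key the record on
a stable disk identifier (a serial number or /dev/disk/by-id name, say)
instead of the osd id, so that reusing the id does not overwrite the old
disk's entry.  DiskRecord, DiskStateMap and their fields are all
hypothetical names; where this would actually live (mon, calamari,
elsewhere) is left open.

```python
from dataclasses import dataclass


@dataclass
class DiskRecord:
    host: str          # host the disk was last seen on
    device_path: str   # block device path last reported for the disk
    last_osd_id: int   # osd id that last used the disk
    last_state: str    # e.g. "up+in" or "down+out"


class DiskStateMap:
    """Last known state per physical disk, keyed by a stable disk id."""

    def __init__(self):
        self._by_disk = {}

    def record(self, disk_key, host, device_path, osd_id, state):
        # Overwrites only the entry for this physical disk.
        self._by_disk[disk_key] = DiskRecord(host, device_path, osd_id, state)

    def lookup(self, disk_key):
        return self._by_disk.get(disk_key)

    def forget(self, disk_key):
        # Call once the disk has physically been removed.
        self._by_disk.pop(disk_key, None)
```

With this layout, a new osd that reuses id 7 on a replacement disk gets its
own entry, and the failed disk's record stays around until forget() is
called for it.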

sage