Well, keep in mind that this isn't just for identification of failed disks. I can conceive of use cases where a user flips on one or more drive LEDs for identification or debugging purposes. That would be the distinction between the identify and fail LEDs. We can give the user the ability to distinguish between the two and figure out which they'd want to use at any given time (also, keep in mind that the failure LED is not customer controllable behind some storage controllers anyway...).

I was wondering if I'd need to carry along the last known disk state... guess I'll figure that nuance out as I go.

Joe

> On Apr 1, 2015, at 6:17 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>
> #2 really sounds safer to me. In particular, you need to be really
> careful not to flash an LED until you're sure you don't need the data on
> the disk (i.e., it's down+out and the cluster state is healthy--no heroic
> measures needed). I think anything that triggers flashing that doesn't
> have a holistic view of the cluster would be dangerous.
>
> That, combined with the complications around ceph-osd possibly not
> running, makes me think this would be the calamari agent that does the
> flashing.
>
> It also may be necessary for the disk -> last known state mapping to go
> somewhere other than in just osd metadata; if the osd is recreated or the
> id gets reused that info goes away. (We could also be careful to avoid
> deallocating the id until the disk is removed, I guess, but it's another
> constraint to worry about.)
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
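
A rough sketch of the LED rule discussed in this thread, for illustration only. It assumes the caller (for example a management agent with a cluster-wide view) already knows whether the OSD is up/in and whether the cluster is healthy. The names `Led`, `OsdState`, and `may_flash` are hypothetical and are not part of Ceph or calamari.

```python
from dataclasses import dataclass
from enum import Enum


class Led(Enum):
    IDENTIFY = "identify"  # user-requested locate/debug light
    FAIL = "fail"          # marks a disk whose data is no longer needed


@dataclass(frozen=True)
class OsdState:
    up: bool   # the OSD daemon is running and reachable
    in_: bool  # the OSD is still part of data placement


def may_flash(led: Led, osd: OsdState, cluster_healthy: bool) -> bool:
    """Return True if lighting `led` for this OSD's disk looks safe."""
    if led is Led.IDENTIFY:
        # Identification does not imply the disk can be pulled,
        # so it is allowed regardless of OSD or cluster state.
        return True
    # Fail LED: require down+out and a healthy cluster, as in the quoted message.
    return (not osd.up) and (not osd.in_) and cluster_healthy
```

With this rule, a disk whose OSD is down but still "in" would not get its fail LED lit, and neither would a down+out disk while the cluster is unhealthy.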