On Monday, March 26, 2012 at 12:36 PM, Sage Weil wrote:
> Currently when you shutdown/kill a ceph-osd it is no different from it
> crashing: you have to wait N seconds for its peers to conclude the process
> is down before the OSD is deemed 'failed' and the osd map is updated.
>
> This would be pretty easy to improve on:
>
> - on a clean shutdown (e.g., due to SIGTERM), we could execv a call to
>   the ceph tool to tell the monitors the osd stopped (maybe with a
>   'reason' and nice log message).
>
> - on an unclean shutdown (e.g., failed assert, segfault) we can
>   do the same, with an appropriate message in the system log
>
> Basically it means that if ceph-osd crashes or shuts down then it
> will normally get instantly marked down without waiting for the normal osd
> timeout to expire.
>
> execv() is kind of ugly, but seems safer in the failure cases, where you
> can't trust the existing MonClient to be operational.
>
> Alternatively, some external wrapper could watch for the process to
> terminate and notify the cluster, but this would be a bit more difficult
> to implement, because that notification needs to uniquely identify the
> process instance (e.g., via the cluster addr), and we'd need some way for
> it to wait for the osd to join and then extract that id, etc.
>
> Thoughts?

execve to an external binary seems like the wrong tool for this job. On clean shutdown the OSD can send off a notification itself; handling failures seems like a job for the monitoring service, not Ceph itself.

Doing it this way would also complicate cephx key management, since you either need an extra "osd-notifier" key added to each OSD node, or to give each OSD key modification privileges on the monitor.
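
To make the clean-shutdown case concrete, here is a minimal sketch of the idea that the daemon can announce its own shutdown without exec'ing anything. It is Python for brevity, not ceph-osd code (which is C++). Everything named here is hypothetical: the ShutdownNotifier class, the send callable standing in for whatever monitor connection the daemon already holds, and the message fields. It only covers SIGTERM and says nothing about the crash case.

```python
import signal


class ShutdownNotifier:
    """Hypothetical helper: report a clean shutdown over an existing connection."""

    def __init__(self, osd_id, send):
        self.osd_id = osd_id
        self.send = send              # assumed: callable that delivers a message
        self.stop_requested = False

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum, frame):
        # Only set a flag here; the main loop does the actual notification.
        self.stop_requested = True

    def run(self, do_work):
        # Keep working until a shutdown has been requested.
        while not self.stop_requested:
            do_work()
        # Clean exit path: tell the monitors before the process goes away.
        message = {"osd": self.osd_id, "state": "down", "reason": "clean shutdown"}
        self.send(message)
        return message
```

A daemon would call install() once at startup and then run() with its normal work function. The handler only flips a flag, so the notification itself is sent from ordinary code rather than from signal context.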