Re: osd shutdown notification

Greg Farnum <gregory.farnum@xxxxxxxxxxxxx> · Mon, 26 Mar 2012 12:45:14 -0700



On Monday, March 26, 2012 at 12:36 PM, Sage Weil wrote:
> Currently when you shut down or kill a ceph-osd it is no different from it 
> crashing: you have to wait N seconds for its peers to conclude the process 
> is down before the OSD is deemed 'failed' and the osd map is updated.
> 
> This would be pretty easy to improve on:
> 
> - on a clean shutdown (e.g., due to SIGTERM), we could execv a call to 
> the ceph tool to tell the monitors the osd stopped (maybe with a 
> 'reason' and nice log message).
> 
> - on an unclean shutdown (e.g., failed assert, segfault) we can
> do the same, with an appropriate message in the system log
> 
> Basically it means that if ceph-osd crashes or shuts down then it 
> will normally get instantly marked down without waiting for the normal osd 
> timeout to expire.
> 
> execv() is kind of ugly, but seems safer in the failure cases, where you 
> can't trust the existing MonClient to be operational. 
> 
> Alternatively, some external wrapper could watch for the process to 
> terminate and notify the cluster, but this would be a bit more difficult 
> to implement, because that notification needs to uniquely identify the 
> process instance (e.g., via the cluster addr), and we'd need some way for 
> it to wait for the osd to join and then extract that id, etc.
> 
> Thoughts?

execve to an external binary seems like the wrong tool for this job. On a clean shutdown the OSD can send off a notification itself; handling failures seems like a job for the monitoring service, not Ceph itself.
Doing it this way would also complicate cephx key management, since you would either need to add an extra "osd-notifier" key to each OSD node or give each OSD key modification privileges on the monitor.
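
For concreteness, the exec-on-SIGTERM idea under discussion might look roughly like the sketch below. It is Python for brevity rather than the OSD's C++, and it is only an illustration under assumptions: the tool path, the helper names, and the choice of `ceph osd down <id>` as the notification command are mine, not anything that has been settled in this thread.

```python
import os
import signal

# Assumed install path for the ceph CLI; adjust for the actual system.
CEPH_TOOL = "/usr/bin/ceph"


def build_notify_argv(osd_id, tool=CEPH_TOOL):
    """Build the argv that asks the monitors to mark this OSD down."""
    return [tool, "osd", "down", str(osd_id)]


def make_sigterm_handler(osd_id, exec_fn=os.execv):
    """Return a signal handler that replaces the process with the ceph tool.

    exec_fn is injectable so the handler can be exercised without
    actually replacing the running process.
    """
    def on_sigterm(signum, frame):
        argv = build_notify_argv(osd_id)
        # os.execv never returns on success: the daemon's process image
        # is replaced by the ceph tool, which sends the notification.
        exec_fn(argv[0], argv)
    return on_sigterm


# In the daemon's startup path (hypothetical osd_id):
#   signal.signal(signal.SIGTERM, make_sigterm_handler(osd_id))
```

Note that whatever key the exec'd tool authenticates with must be allowed to mark OSDs down, which is exactly the cephx complication mentioned above.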

