On Mon, 26 Mar 2012, Tommi Virtanen wrote:
> On Mon, Mar 26, 2012 at 12:36, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > Currently when you shutdown/kill a ceph-osd it is no different from it
> > crashing: you have to wait N seconds for its peers to conclude the process
> > is down before the OSD is deemed 'failed' and the osd map is updated.
> >
> > This would be pretty easy to improve on:
> >
> >  - on a clean shutdown (e.g., due to SIGTERM), we could execv a call to
> >    the ceph tool to tell the monitors the osd stopped (maybe with a
> >    'reason' and nice log message).
> >
> >  - on an unclean shutdown (e.g., failed assert, segfault) we can
> >    do the same, with an appropriate message in the system log
>
> A clean shutdown can send the "This location is now defunct" message
> itself, execve seems to be extra just complications for it.

Yeah

> For daemon crashes, perhaps the next run, after upstart/etc restarts
> the daemon, can somehow convince others proactively that the new
> osd.42 is better than the old osd.42. That sounds like a good feature
> to have..

That much we already have, but startup/restart can take a while.
sysvinit doesn't do auto-restart, though, and it would be nice not to rely
on it in upstart/whatever.  I can also imagine a scenario where we don't
have auto-restart but do want fast failure notification...

sage
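
For illustration only, here is a rough Python sketch of the clean-shutdown
case discussed above, where the daemon itself reports that it is stopping
instead of exec'ing an external tool. This is not Ceph code (ceph-osd is
C++): the names ShutdownNotifier, notify and do_work are hypothetical, and
how the message actually reaches the monitors is deliberately left abstract
behind the notify callback.

```python
import signal
import time


class ShutdownNotifier:
    """Run a work loop and report a clean stop when SIGTERM arrives.

    `notify` is a hypothetical callback (osd_id, reason) standing in for
    "tell the monitors this osd stopped"; it is an assumption, not a
    real Ceph API.
    """

    def __init__(self, osd_id, notify):
        self.osd_id = osd_id
        self.notify = notify
        self.stop_requested = False

    def install(self):
        # Must be called from the main thread; Python only allows
        # signal handlers to be registered there.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def handle_sigterm(self, signum, frame):
        # Keep the handler trivial: only record the request. The actual
        # notification happens in the main loop, outside signal context.
        self.stop_requested = True

    def run(self, do_work, poll_interval=0.1):
        while not self.stop_requested:
            do_work()
            time.sleep(poll_interval)
        # Clean shutdown path: announce it before exiting, so peers do
        # not have to wait for a timeout to conclude we are down.
        self.notify(self.osd_id, "clean shutdown (SIGTERM)")
```

A caller would construct ShutdownNotifier(42, some_callback), call
install(), and then run(some_work_function). The unclean case (failed
assert, segfault) is not covered here, since the process may not be in a
state to run normal code at that point.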