Re: osd shutdown notification

Sage Weil <sage@xxxxxxxxxxxx> · Mon, 26 Mar 2012 12:52:51 -0700 (PDT)

On Mon, 26 Mar 2012, Tommi Virtanen wrote:
> On Mon, Mar 26, 2012 at 12:36, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > Currently when you shut down or kill a ceph-osd it is no different from it
> > crashing: you have to wait N seconds for its peers to conclude the process
> > is down before the OSD is deemed 'failed' and the osd map is updated.
> >
> > This would be pretty easy to improve on:
> >
> >  - on a clean shutdown (e.g., due to SIGTERM), we could execv a call to
> >   the ceph tool to tell the monitors the osd stopped (maybe with a
> >   'reason' and nice log message).
> >
> >  - on an unclean shutdown (e.g., failed assert, segfault) we can
> >   do the same, with an appropriate message in the system log
> 
> A clean shutdown can send the "This location is now defunct" message
> itself; execve seems like just an extra complication for it.

Yeah
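
As a rough sketch of that in-process path (Python for brevity, since
ceph-osd itself is C++): the SIGTERM handler only sets a flag, and the main
loop sends the "I'm going down" message itself before exiting, with no exec
involved. FakeMonitor, Osd, and mark_down are hypothetical names made up
for illustration, not Ceph APIs.

```python
import signal
import time


class FakeMonitor:
    """Stand-in for the monitor cluster (hypothetical, not a Ceph API)."""

    def __init__(self):
        self.down_reports = []

    def mark_down(self, osd_id, reason):
        # A real daemon would send a message to the monitors here.
        self.down_reports.append((osd_id, reason))


class Osd:
    def __init__(self, osd_id, monitor):
        self.osd_id = osd_id
        self.monitor = monitor
        self.stop_requested = False

    def install_handler(self):
        # Must be called from the main thread at daemon startup.
        signal.signal(signal.SIGTERM, self.handle_sigterm)

    def handle_sigterm(self, signum, frame):
        # Keep the handler trivial: only record the request.
        self.stop_requested = True

    def run(self, max_iterations=100):
        """Return True if we stopped cleanly and notified the monitor."""
        for _ in range(max_iterations):
            if self.stop_requested:
                # Clean shutdown: report it ourselves before exiting.
                self.monitor.mark_down(self.osd_id, "clean shutdown (SIGTERM)")
                return True
            time.sleep(0.01)  # placeholder for real work
        return False
```

With install_handler() called at startup, a SIGTERM makes run() report the
shutdown on its next pass through the loop, so peers would not have to wait
out the heartbeat timeout.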

> For daemon crashes, perhaps the next run, after upstart/etc restarts
> the daemon, can somehow convince others proactively that the new
> osd.42 is better than the old osd.42. That sounds like a good feature
> to have.

That much we already have, but startup/restart can take a while.
sysvinit doesn't do auto-restart, though, and it would be nice not to rely
on auto-restart under upstart/whatever either.

I can also imagine a scenario where we don't have auto-restart but do want 
fast failure notification...
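
One way that scenario could look, again only as a sketch in Python: a thin
wrapper runs the daemon once, and if it dies uncleanly the wrapper reports
why, without restarting it. supervise, describe_exit, and the report
callback are hypothetical; in practice the callback would be whatever
mechanism tells the monitors the osd is down.

```python
import subprocess


def describe_exit(returncode):
    """Return None for a clean exit, else a short failure reason."""
    if returncode == 0:
        return None
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def supervise(argv, report):
    """Run argv once; call report(reason) if it fails. No restart."""
    returncode = subprocess.run(argv).returncode
    reason = describe_exit(returncode)
    if reason is not None:
        # Hypothetical hook: notify the monitors right away.
        report(reason)
    return reason
```

A failed assert (SIGABRT) or segfault shows up here as a negative return
code, so the report can go out as soon as the process is gone.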

sage