Re: osd shutdown notification

On Mon, Mar 26, 2012 at 13:16, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> Perhaps a separate executable that sends "osd.42 is now definitely
>> down" will be good enough? Hopefully you don't have two osd.42's
>> around, anyway. And if you want that, instead of execing ceph-osd, you
>> do a fork & exec, wait in the parent, then exec that thing that marks
>> it down. For upstart (and often for others too), there's a "after the
>> service exits" hook where we could also plug that in, if we wanted to.
>
> ...except that the way to reliably mark down a particular osd.42 requires
> data that's private to the ceph-osd instance, and unknown until it starts
> up and joins the cluster.  That makes it awkward to implement any kind of
> wrapper because you have to pass it a cookie using some side-channel.

Why do you need to know where osd.42 was last seen just to be able to
authoritatively claim that osd.42 is 1) down, 2) at a new location?

Think about power supply failure and hotswapping the disk to a new server.

> execv() in the signal handler, OTOH, is easy.  Is it that offensive?
>
> The other nice thing about that is that the failure notification can be
> informative for free: "osd.42 stopped: got SIGTERM", "osd.42 stopped:
> failed assert at foo.cc:1234", etc.

I'm worried about all the things that want us to exit(3): code
coverage, valgrind, and whatnot.
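
For comparison, the wrapper idea quoted at the top (fork & exec, wait in
the parent, then exec the thing that marks the osd down) lets the daemon
exit normally. A minimal sketch, assuming a POSIX system; Python is used
only for brevity, and the function names, the "--" argument layout, and
the notifier command are all hypothetical, not anything Ceph ships:

```python
import os
import sys


def describe_exit(status):
    """Turn a waitpid() status into a short, human-readable reason."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with status %d" % os.WEXITSTATUS(status)
    return "ended for an unknown reason"


def run_and_wait(daemon_argv):
    """Fork, exec the daemon in the child, and wait for it in the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: become the daemon. Code after execvp() runs only on failure.
        try:
            os.execvp(daemon_argv[0], daemon_argv)
        except OSError:
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return status


def main(argv):
    """argv: DAEMON [ARGS...] -- NOTIFIER [ARGS...] (hypothetical layout)."""
    split = argv.index("--")
    daemon_argv, notify_argv = argv[:split], argv[split + 1:]
    status = run_and_wait(daemon_argv)
    # Replace the wrapper with the notifier, passing the reason along.
    os.execvp(notify_argv[0], notify_argv + [describe_exit(status)])
```

Calling main() with the daemon command, a "--" separator, and a notifier
command would run the daemon and then hand the exit reason to the
notifier as its last argument. This only sees what waitpid() reports
(exit status or signal), so it cannot say which assert failed, and it
does nothing about the per-instance cookie mentioned above.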

If you're thinking of shipping the crash reason to monitors, I think
you're once again trying to replace a bunch of a sysadmin's toolkit
with Ceph-internal features. Ones that they can't use with all the
non-Ceph things they run on their storage cluster anyway, like statsd,
sshd, etc. I feel confident in saying Ceph will lose that race, in
stability, functionality and familiarity to the target audience.