> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, March 24, 2016 10:48 AM
> To: Podoski, Igor
> Cc: ceph-devel
> Subject: Re: Ceph watchdog-like thing to reduce IO block during process goes down by abort()
>
> On Thu, Mar 24, 2016 at 3:00 PM, Igor.Podoski@xxxxxxxxxxxxxx <Igor.Podoski@xxxxxxxxxxxxxx> wrote:
> > Hi Cephers!
> >
> > Currently, when we have a disk failure, an assert() and then abort() is triggered and the process is killed (ABRT). Other OSDs will eventually mark the dead one as down, but that depends on heartbeat and monitor settings (mon_osd_min_down_reporters/mon_osd_min_down_reports). While an OSD is dead but not yet marked down, you can see blocked IO on writes and reads.
> >
> > Recently I made https://github.com/ceph/ceph/pull/7740, which is about sending a MarkMeDown message to the monitor just before the OSD goes bye-bye. It prevents blocked IO in the above case, and for any other assert that is not on the message sending path, since I need messenger/pipes/connections working for this. I've run some tests and it looks good: when I pull a drive out of my cluster during rados bench, IO blocks for less than 1 second or not at all, whereas previously it was > 10 sec (with my cluster settings).
> >
> > Sage pointed out to me that some time ago there was a similar PR, https://github.com/ceph/ceph/pull/6514, and there was a thought about a ceph-watchdog process that could monitor OSDs and send info directly to the monitor when they disappear. This would cover all assert() cases, and other ones like kill -9 or similar.
> >
> > I have a few ideas about how such functionality could be implemented, so my question is - has any of you already started doing something similar?
> >
> > Let's have a brainstorm about it!
> >
> > Ideas for improving the 7740/6514 MarkMeDown internal mechanism:
> > - I think I could send a message with the MarkMeDown payload, but in a raw way, not through the Messenger path. This could be as good as bad in this case.
> I think we still go through the msgr path or a specified api? urgent or out-of-band flag?

But we need the messenger instance in good shape for sending MarkMeDown; if we get an assert in the core of the msgr, we will fail to send anything.

> > - I could poke an osd-neighbor through a signal and the neighbor would send a Mark(SignalSender)Down message (this won't work if a whole hdd controller goes down - all OSDs on it will be dead within a narrow time window). So it's like an instant bad-health heartbeat message. It still depends on the Messenger send path of the osd-neighbor.
> >
> > External ceph-watchdog:
> > Just like Sage wrote in https://github.com/ceph/ceph/pull/6514#issuecomment-159372845, or similar: each osd, during start, passes its own PID to the ceph-watchdog process through shared memory/socket/named pipe (whatever). Ceph-watchdog checks whether that PID still exists by watching for changes in the /proc/PID directory or the /proc/PID/cmd file (maybe inotify could handle this). When the file or folder changes (goes missing), it sends MarkThisOsdDown to the monitor and that's all. But this wouldn't be a watchdog strictly speaking, rather a process-down notify.
>
> it looks a little complex and redundant, we need to manage the lifecycle of the watchdog itself, and it's much like systemd...

Ok, so back to a slightly modified version of Sage's idea: the OSD, before abort(), could write its ID (from **argv) to a ceph-watchdog named pipe. The only hazard here is the case when all OSDs want to notify the watchdog at the same time. As I wrote before, it would not be a 'watchdog' process, but a 'process down notify', so the question is: do we need a watchdog-like thing for some other stuff in the future, or will a process-down notify be sufficient?

> > Or maybe both ways, PR7740 + external?
> >
> > Regards,
> > Igor.
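The process-down-notify flow described above (each OSD registers its PID with the watchdog, which watches /proc/PID and reports to the monitor when it vanishes) could be sketched roughly like this. This is a minimal illustration, not Ceph code: `pid_alive`, `check_osds`, and the callback are invented names, and a real ceph-watchdog would send an actual MarkThisOsdDown message to the monitor instead of collecting IDs.

```python
import os

def pid_alive(pid):
    """An OSD process counts as alive while /proc/<pid> still exists."""
    return os.path.exists("/proc/%d" % pid)

def check_osds(osd_pids, mark_osd_down):
    """Invoke mark_osd_down(osd_id) for every registered OSD whose PID is gone.

    osd_pids maps osd_id -> pid; in the real thing the map would be built
    from registrations over shared memory/socket/named pipe, as above.
    """
    for osd_id, pid in sorted(osd_pids.items()):
        if not pid_alive(pid):
            mark_osd_down(osd_id)

if __name__ == "__main__":
    # Fork a throwaway child and reap it, so we hold a PID that is
    # guaranteed to be gone - it stands in for a crashed OSD.
    dead_pid = os.fork()
    if dead_pid == 0:
        os._exit(0)
    os.waitpid(dead_pid, 0)

    reported = []
    check_osds({0: os.getpid(), 1: dead_pid}, reported.append)
    print(reported)  # only the "crashed" osd.1 is reported
```

Polling /proc like this is the naive version; as noted, an inotify watch on the directory would avoid the polling loop, though either way it is a process-down notify rather than a true watchdog.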
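For the first idea above (sending the MarkMeDown payload in a raw way, not through the Messenger path), one possible shape is to pre-open a dead-simple channel at startup, while everything is still healthy, and touch it only just before abort(). A hedged sketch, with invented names and wire format - nothing here is an actual Ceph or Messenger API:

```python
import socket

def open_markmedown_channel():
    """Pre-open a plain datagram socket pair at startup; the write end is
    kept aside and used only at death time, bypassing the normal msgr stack."""
    return socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

def send_markmedown_raw(tx, osd_id):
    """One best-effort write just before abort(); if even this fails,
    we are dying anyway and the heartbeat timeout remains the fallback."""
    try:
        tx.send(b"markmedown osd.%d" % osd_id)
    except OSError:
        pass

if __name__ == "__main__":
    rx, tx = open_markmedown_channel()
    send_markmedown_raw(tx, 3)       # what a dying osd.3 would do
    print(rx.recv(64).decode())      # what the receiving side would see
```

The point of the sketch is only the ordering: the channel is set up before any assert can fire, so the dying process does not depend on the Messenger being in good shape.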
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Best Regards,
> Wheat

Regards,
Igor.