RE: Ceph watchdog-like thing to reduce IO block during process goes down by abort()

"Igor.Podoski@xxxxxxxxxxxxxx" <Igor.Podoski@xxxxxxxxxxxxxx> · Thu, 24 Mar 2016 11:47:20 +0000

> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang@xxxxxxxxx]
> Sent: Thursday, March 24, 2016 10:48 AM
> To: Podoski, Igor
> Cc: ceph-devel
> Subject: Re: Ceph watchdog-like thing to reduce IO block during process goes
> down by abort()
> 
> 
> 
> On Thu, Mar 24, 2016 at 3:00 PM, Igor.Podoski@xxxxxxxxxxxxxx
> <Igor.Podoski@xxxxxxxxxxxxxx> wrote:
> > Hi Cephers!
> >
> > Currently, when we have a disk failure, assert() and then abort() are
> > triggered and the process is killed (SIGABRT). Other OSDs will
> > eventually mark the dead one as down, but how soon depends on the
> > heartbeat settings and monitor settings
> > (mon_osd_min_down_reporters/mon_osd_min_down_reports). While the dead
> > OSD is not yet marked down, you can see blocked IO on both writes and
> > reads.
> >
> > Recently I opened https://github.com/ceph/ceph/pull/7740, which sends
> > a MarkMeDown message to the monitor just before the OSD goes away. It
> > prevents blocked IO in the above case and for any other assert that is
> > not on the message sending path, because I need the
> > messenger/pipes/connections working for this. I ran some tests and it
> > looks good: when I pull a drive out of my cluster during rados bench,
> > IO blocks for less than 1 second or not at all; previously it blocked
> > for more than 10 sec (with my cluster settings).
> >
> > Sage pointed out that there was a similar PR some time ago,
> > https://github.com/ceph/ceph/pull/6514, along with the idea of a
> > ceph-watchdog process that could monitor OSDs and notify the monitor
> > directly when they disappear. This would cover all assert() cases, as
> > well as others such as kill -9.
> >
> > I have a few ideas for how such functionality could be implemented, so
> > my question is: has any of you already started working on something
> > similar?
> >
> > Let's brainstorm about it!
> >
> > Ideas for improving the internal MarkMeDown mechanism of 7740/6514:
> > - I think I could send a message with the MarkMeDown payload in a raw
> > way, bypassing the Messenger path. This could turn out either well or
> > badly in this case.
> 
> I think we should still go through the msgr path, or a specified API?
> Maybe with an urgent or out-of-band flag?

But we need the messenger instance to be in good shape to send MarkMeDown; if the assert fires in the core of msgr, we will fail to send anything.

> 
> > - I could poke a neighboring OSD with a signal, and the neighbor would
> > send a Mark(SignalSender)Down message (this won't work if the whole
> > HDD controller goes down, because all OSDs will die within a narrow
> > time window). So it's like an instant bad-health heartbeat message. It
> > still depends on the Messenger send path of the neighboring OSD.
> 
> >
> > External ceph-watchdog:
> > Just like Sage wrote in
> > https://github.com/ceph/ceph/pull/6514#issuecomment-159372845, or
> > something similar: each OSD, during startup, passes its own PID to the
> > ceph-watchdog process through shared memory/socket/named pipe
> > (whatever). Ceph-watchdog checks whether that PID still exists by
> > watching for changes in the /proc/PID or /proc/PID/cmd directory/file
> > (maybe inotify could handle this). When the file or directory changes
> > (goes missing), it sends MarkThisOsdDown to the monitor, and that's
> > all. Strictly speaking this would not be a watchdog, but rather a
> > process-down notifier.
> 
> It looks a little complex and redundant: we would need to manage the
> lifecycle of the watchdog itself, and it's much like systemd...
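
For reference, the "does this PID still exist" part of the external watchdog idea is small. Below is a minimal sketch in Python, not Ceph code: the names pid_alive, wait_until_gone and demo are hypothetical, and it swaps the /proc/PID check described above for kill(pid, 0), which performs only the existence and permission check without delivering a signal. It assumes the watched process is not a child of the watcher, because an unreaped child (zombie) still counts as existing, and it does nothing about PID reuse.

```python
import os
import subprocess
import sys
import time


def pid_alive(pid):
    """Return True if some process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_until_gone(pid, poll_interval=0.05, timeout=5.0):
    """Poll until the PID disappears; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_alive(pid):
            return True  # this is where a MarkThisOsdDown would be sent
        time.sleep(poll_interval)
    return False


def demo():
    """Start a stand-in 'OSD', kill it hard, and report (alive_before, gone_after)."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    alive_before = pid_alive(child.pid)
    child.kill()   # comparable to kill -9
    child.wait()   # reap it; a real watchdog would not be the parent
    gone_after = wait_until_gone(child.pid, timeout=2.0)
    return alive_before, gone_after
```

Polling like this adds up to one poll_interval of latency per OSD, which is part of why the notify-before-abort variant below looks simpler.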

OK, so back to a slightly modified version of Sage's idea:

Before calling abort(), the OSD could write its ID (from **argv) to a ceph-watchdog named pipe. The only hazard I see here is the case where all OSDs want to notify the watchdog at the same time. As I wrote before, this would not be a 'watchdog' process but a 'process down notify', so the question is: do we need a watchdog-like thing for some other purposes (in the future), or will process-down notification be sufficient?
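
On the "everyone notifies at once" hazard: POSIX guarantees that a write of at most PIPE_BUF bytes to a pipe or FIFO is atomic, so short records from many writers are not interleaved. Below is a minimal sketch of the named-pipe notify in Python, only to illustrate the FIFO semantics; it is not Ceph code. The function names and the FIFO file name are hypothetical, and threads stand in for the OSD processes. In a real OSD the writer side would live in the C++ abort path and would have to restrict itself to plain open()/write()/close().

```python
import os
import select
import tempfile
import threading


def open_watchdog_fifo(path):
    """Create the FIFO and return (read_fd, keepalive_fd) for the watchdog side."""
    os.mkfifo(path)
    # Non-blocking open, so we do not wait for the first writer to show up.
    read_fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    # Holding our own write end means reads never hit EOF between writers.
    keepalive_fd = os.open(path, os.O_WRONLY)
    os.set_blocking(read_fd, True)
    return read_fd, keepalive_fd


def notify_down(path, osd_id):
    """OSD side: write one short record, e.g. b'12\\n', in a single write()."""
    record = f"{osd_id}\n".encode()
    assert len(record) <= select.PIPE_BUF  # small enough to be atomic
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)


def collect(read_fd, expected):
    """Watchdog side: read until `expected` records arrived; return sorted IDs."""
    buf = b""
    while buf.count(b"\n") < expected:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        buf += chunk
    return sorted(int(line) for line in buf.splitlines())


def demo(n_osds=16):
    """Let n_osds writers notify concurrently and return the IDs received."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ceph-watchdog.fifo")  # hypothetical name
        read_fd, keepalive_fd = open_watchdog_fifo(path)
        try:
            writers = [threading.Thread(target=notify_down, args=(path, i))
                       for i in range(n_osds)]
            for t in writers:
                t.start()
            ids = collect(read_fd, n_osds)
            for t in writers:
                t.join()
        finally:
            os.close(keepalive_fd)
            os.close(read_fd)
        return ids
```

Each record is a few bytes, far below PIPE_BUF, so simultaneous notifications arrive as whole lines. What this does not cover is a dying OSD blocking in open() when no watchdog has the FIFO open; opening with O_NONBLOCK on the writer side would be one way to avoid that.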

> 
> >
> > Or maybe both: PR 7740 + an external watchdog?
> >
> > Regards,
> > Igor.
> 
> 
> 
> --
> 
> 
> Best Regards,
> 
> Wheat

Regards,
Igor.