Re: 0.56.3 OSDs wrongly marked down and cluster unresponsiveness

On Thu, Feb 28, 2013 at 01:44:28PM -0800, Nick Bartos wrote:
> When a single high-I/O event occurs (in this case a cp of a 10G file
> on a filesystem mounted on an rbd), the two OSDs that reside on the
> same system where the rbd is mounted get marked down when it appears
> they shouldn't be.  Additionally, other cluster services start timing
> out just after the OSDs are marked down (it appears to be the
> rbd-backed mysql becoming unresponsive).  Before the OSDs are marked
> down things run slowly, but nothing appears to actually fail until
> they are marked down.
> 
> Here is an interesting snip in the logs:
> 
> Feb 28 21:12:11 172.17.0.13 ceph-mon: 2013-02-28 21:12:11.081003
> 7f377687d700  1 mon.0@0(leader).osd e14  we have enough
> reports/reporters to mark osd.2 down
> Feb 28 21:12:11 172.17.0.13 [  663.241832] libceph: osd2 down
> Feb 28 21:12:11 172.17.0.14 [  655.577185] libceph: osd2 down
> Feb 28 21:12:11 172.17.0.13 [  663.242064] libceph: osd5 down
> Feb 28 21:12:11 172.17.0.13 kernel: [  663.241832] libceph: osd2 down
> Feb 28 21:12:11 172.17.0.13 kernel: [  663.242064] libceph: osd5 down
> Feb 28 21:12:11 172.17.0.14 [  655.577434] libceph: osd5 down
> Feb 28 21:12:11 172.17.0.14 kernel: [  655.577185] libceph: osd2 down
> Feb 28 21:12:11 172.17.0.14 kernel: [  655.577434] libceph: osd5 down
> Feb 28 21:12:12 172.17.0.13 ceph-osd: 2013-02-28 21:12:12.423178 osd.5
> 172.17.0.13:6803/2015 126 : [WRN] map e16 wrongly marked me down
> Feb 28 21:12:12 172.17.0.13 ceph-osd: 2013-02-28 21:12:12.423177
> 7f4c10a0e700  0 log [WRN] : map e16 wrongly marked me down
> Feb 28 21:12:17 172.17.0.13 ceph-osd: 2013-02-28 21:12:17.208466
> 7f01aa894700  0 log [WRN] : map e16 wrongly marked me down
> Feb 28 21:12:17 172.17.0.13 ceph-osd: 2013-02-28 21:12:17.208468 osd.2
> 172.17.0.13:6800/1924 187 : [WRN] map e16 wrongly marked me down
> 
> The full log is available here:  http://download.pistoncloud.com/p/ceph-2.log.xz
> Note: The compressed log is only about 8MB, but uncompressed it's
> about 160MB.  I've added libceph and rbd kernel debugging in as well.

That looks like the same problem I'm having:

http://thread.gmane.org/gmane.comp.file-systems.ceph.devel/13136

tl;dr: sorry, no solutions yet, still trying to track this down.

Two possible hints...

Regarding the 'slow requests', check whether your OSD disks are
running at or close to their bandwidth or IOPS capacity during the
I/O spike. You could try reducing the I/O load on them, e.g. by
putting the journals on separate media if you haven't already.
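A rough sketch of what I mean (device names, paths and OSD ids here
are just placeholders, adjust for your setup):

    # Watch per-device utilization while reproducing the cp;
    # %util near 100 and a high await suggest the disks are saturated.
    iostat -x 1 /dev/sdb /dev/sdc

    # ceph.conf sketch: point each OSD's journal at a partition on a
    # separate (ideally faster) device instead of the data disk.
    [osd.2]
        osd journal = /dev/sdd1
    [osd.5]
        osd journal = /dev/sdd2

Note that moving an existing journal isn't just a config change; you
have to flush the old journal and recreate it (ceph-osd
--flush-journal / --mkjournal) with the OSD stopped.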

See if the problem goes away if you do a "service ceph reload"
every 2 hours. Once I turned on "debug ms = 20" per the thread
above, I started running logrotate for ceph (which does a ceph
reload) every 2 hours to avoid running out of disk space for the
logs. In the week since then I haven't seen the problem occur.
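Roughly what that looks like here (the cron schedule and paths are
just what I happen to use; the postrotate reload is the part that
matters, and is what makes logrotate do the ceph reload mentioned
above):

    # /etc/cron.d/ceph-logrotate-2h -- force a rotation every 2 hours
    0 */2 * * * root /usr/sbin/logrotate -f /etc/logrotate.d/ceph

    # /etc/logrotate.d/ceph (abridged) -- postrotate makes the daemons
    # reopen their logs via "service ceph reload"
    /var/log/ceph/*.log {
        rotate 7
        compress
        missingok
        notifempty
        sharedscripts
        postrotate
            service ceph reload >/dev/null 2>&1 || true
        endscript
    }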

Cheers,

Chris
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

