Re: Monitor Restart triggers half of our OSDs marked down


 



On Thu, 5 Feb 2015, Dan van der Ster wrote:
> Hi,
> We also have seen this once after upgrading to 0.80.8 (from dumpling).
> Last week we had a network outage which marked out around 1/3rd of our
> OSDs. The outage lasted less than a minute -- all the OSDs were
> brought up once the network was restored.
> 
> Then 30 minutes later I restarted one monitor to roll out a small
> config change (changing leveldb log path). Surprisingly that resulted
> in many OSDs (but seemingly fewer than before) being marked out again
> then quickly marked in again.

Did the 'wrongly marked down' messages appear in ceph.log?
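
(For reference: on a default install the cluster log is /var/log/ceph/ceph.log
on the monitor hosts, so something like

    grep 'wrongly marked me down' /var/log/ceph/ceph.log

should show them -- adjust the path if your logs live elsewhere.)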

> I only have the lowest level logs from this incident -- but I think
> it's easily reproducible.

Logs with debug ms = 1 and debug mon = 20 would be best if someone is able 
to reproduce this.  I think our QA wouldn't capture this case because we 
are either thrashing monitors and not OSDs (so OSDs are never marked 
down) or thrashing OSDs and not mons (and ignoring wrongly-marked-down 
events).
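
For anyone trying this, one way to get those levels (a sketch; mon.a is just
a placeholder for whichever monitor you restart) is either a ceph.conf snippet
on the monitor hosts,

    [mon]
        debug mon = 20
        debug ms = 1

followed by a daemon restart, or injecting the settings into the running
monitor:

    ceph tell mon.a injectargs '--debug-mon 20 --debug-ms 1'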

sage

> 
> Cheers, Dan
> 
> 
> On Wed, Feb 4, 2015 at 12:06 PM, Christian Eichelmann
> <christian.eichelmann@xxxxxxxx> wrote:
> > Hi Greg,
> >
> > the behaviour is indeed strange. Today I was trying to reproduce the
> > problem, but no matter which monitor I restarted, and no matter how many
> > times, the behaviour was as expected: a new monitor election was called
> > and everything continued normally.
> >
> > Then I continued my failover tests and simulated the failure of two
> > racks with iptables (for us: 2 MON servers and 6 OSD servers with 360 OSDs in total).
> >
> > Afterwards I again tried to restart one monitor, and again about 240 OSDs
> > got marked down.
> >
> > There was no load on our monitor servers in that period. On one of the
> > OSDs which got marked down I found lots of messages like these:
> >
> > 2015-02-04 11:55:22.788245 7fc48fa48700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.8:6806/3303244 pipe(0x7a1b600 sd=198 :59766 s=2 pgs=1353
> > cs=1 l=0 c=0x4e562c0).fault with nothing to send, going to standby
> > 2015-02-04 11:55:22.788371 7fc48be0c700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.8:6842/12012876 pipe(0x895e840 sd=188 :49283 s=2 pgs=36873
> > cs=1 l=0 c=0x13226f20).fault with nothing to send, going to standby
> > 2015-02-04 11:55:22.788458 7fc494e9c700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.13:6870/13021609 pipe(0x13ace2c0 sd=117 :64130 s=2 pgs=38011
> > cs=1 l=0 c=0x52b4840).fault with nothing to send, going to standby
> > 2015-02-04 11:55:22.797107 7fc46459d700  0 -- 10.76.70.4:0/94790 >>
> > 10.76.70.11:6980/37144571 pipe(0xba0c580 sd=30 :0 s=1 pgs=0 cs=0 l=1
> > c=0x4e51600).fault
> > 2015-02-04 11:55:22.799350 7fc482d7d700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.10:6887/30410592 pipe(0x6a0cb00 sd=271 :53090 s=2 pgs=15372
> > cs=1 l=0 c=0xf3a6f20).fault with nothing to send, going to standby
> > 2015-02-04 11:55:22.800018 7fc46429a700  0 -- 10.76.70.4:0/94790 >>
> > 10.76.28.41:7076/37144571 pipe(0xba0c840 sd=59 :0 s=1 pgs=0 cs=0 l=1
> > c=0xf339760).fault
> > 2015-02-04 11:55:22.803086 7fc482272700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.5:6867/17011547 pipe(0x12f998c0 sd=294 :6997 s=2 pgs=46095
> > cs=1 l=0 c=0x8382000).fault with nothing to send, going to standby
> > 2015-02-04 11:55:22.804736 7fc4892e1700  0 -- 10.76.70.4:6997/17094790
> >>> 10.76.70.13:6852/9142109 pipe(0x12fa5b80 sd=163 :57056 s=2 pgs=45269
> > cs=1 l=0 c=0x189d1600).fault with nothing to send, going to standby
> >
> > The IPs mentioned there all belong to OSD servers.
> >
> > For me it feels like the monitors still have some "memory" of the
> > failed OSDs, and something happens when one of them goes down. If I
> > can provide any more information to clarify the issue, just tell me
> > what you need.
> >
> > Regards,
> > Christian
> >
> > On 03.02.2015 18:10, Gregory Farnum wrote:
> >> On Tue, Feb 3, 2015 at 3:38 AM, Christian Eichelmann
> >> <christian.eichelmann@xxxxxxxx> wrote:
> >>> Hi all,
> >>>
> >>> during some failover and configuration tests, we are currently
> >>> observing a strange phenomenon:
> >>>
> >>> Restarting one of our monitors (5 in total) triggers about 300 of the
> >>> following events:
> >>>
> >>> osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after
> >>> 22.005858 >= grace 20.000000)
> >>>
> >>> The OSDs come back up shortly after they have been marked down. What I
> >>> don't understand is: how can the restart of one monitor prevent the OSDs
> >>> from talking to each other, so that they get marked down?
> >>>
> >>> FYI:
> >>> We are currently using the following settings:
> >>> mon osd adjust heartbeat grace = false
> >>> mon osd min down reporters = 20
> >>> mon osd adjust down out interval = false
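
(Side note, in case it is useful: the values the monitor is actually running
with can be checked over the admin socket, e.g.

    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show | grep mon_osd

where ceph-mon.a.asok is the default socket path and 'a' is a placeholder for
the local monitor's ID.)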
> >>
> >> That's really strange. I think maybe you're seeing some kind of
> >> secondary effect; what kind of CPU usage are you seeing on the
> >> monitors during this time? Have you checked the log on any OSDs which
> >> have been marked down?
> >>
> >> I have a suspicion that maybe the OSDs are detecting their failed
> >> monitor connection and are not able to reconnect to another monitor
> >> quickly enough, but I'm not certain what the overlaps are there.
> >> -Greg
> >>
> >
> >
> > --
> > Christian Eichelmann
> > Systemadministrator
> >
> > 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> > Brauerstraße 48 · DE-76135 Karlsruhe
> > Phone: +49 721 91374-8026
> > christian.eichelmann@xxxxxxxx
> >
> > Amtsgericht Montabaur / HRB 6484
> > Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
> > Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
> > Chairman of the Supervisory Board: Michael Scheeren
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
