Re: Monitor Restart triggers half of our OSDs marked down

Christian Eichelmann <christian.eichelmann@xxxxxxxx> · Thu, 05 Feb 2015 10:28:54 +0100

On 05.02.2015 10:10, Dan van der Ster wrote:
> 
> But then when I restarted the (peon) monitor:
> 
> 2015-01-29 11:29:18.250750 mon.0 128.142.35.220:6789/0 10570 : [INF]
> pgmap v35847068: 24608 pgs: 1 active+clean+scrubbing+deep, 24602
> active+clean, 5 active+clean+scrubbing; 125 TB data, 377 TB used, 2021
> TB / 2399 TB avail; 193 MB/s rd, 238 MB/s wr, 7410 op/s
> 2015-01-29 11:29:28.844678 mon.3 128.142.39.77:6789/0 1 : [INF] mon.2
> calling new monitor election
> 2015-01-29 11:29:33.846946 mon.2 128.142.36.229:6789/0 9 : [INF] mon.4
> calling new monitor election
> 2015-01-29 11:29:33.847022 mon.4 128.142.39.144:6789/0 7 : [INF] mon.3
> calling new monitor election
> 2015-01-29 11:29:33.847085 mon.1 128.142.36.227:6789/0 24 : [INF]
> mon.1 calling new monitor election
> 2015-01-29 11:29:33.853498 mon.3 128.142.39.77:6789/0 2 : [INF] mon.2
> calling new monitor election
> 2015-01-29 11:29:33.895660 mon.0 128.142.35.220:6789/0 10860 : [INF]
> mon.0 calling new monitor election
> 2015-01-29 11:29:33.901335 mon.0 128.142.35.220:6789/0 10861 : [INF]
> mon.0@0 won leader election with quorum 0,1,2,3,4
> 2015-01-29 11:29:34.004028 mon.0 128.142.35.220:6789/0 10862 : [INF]
> monmap e5: 5 mons at
> {0=128.142.35.220:6789/0,1=128.142.36.227:6789/0,2=128.142.39.77:6789/0,3=128.142.39.144:6789/0,4=128.142.36.229:6789/0}
> 2015-01-29 11:29:34.005808 mon.0 128.142.35.220:6789/0 10863 : [INF]
> pgmap v35847069: 24608 pgs: 1 active+clean+scrubbing+deep, 24602
> active+clean, 5 active+clean+scrubbing; 125 TB data, 377 TB used, 2021
> TB / 2399 TB avail; 54507 kB/s rd, 85412 kB/s wr, 1967 op/s
> 2015-01-29 11:29:34.006111 mon.0 128.142.35.220:6789/0 10864 : [INF]
> mdsmap e157: 1/1/1 up {0=0=up:active}
> 2015-01-29 11:29:34.007165 mon.0 128.142.35.220:6789/0 10865 : [INF]
> osdmap e132055: 880 osds: 880 up, 880 in
> 2015-01-29 11:29:34.037367 mon.0 128.142.35.220:6789/0 11055 : [INF]
> osd.1202 128.142.23.104:6801/98353 failed (4 reports from 3 peers
> after 29.673699 >= grace 28.948726)
> 2015-01-29 11:29:34.050478 mon.0 128.142.35.220:6789/0 11139 : [INF]
> osd.1164 128.142.23.102:6850/22486 failed (3 reports from 2 peers
> after 30.685537 >= grace 28.946983)
> 
> 
> and then just after:
> 
> 2015-01-29 11:29:35.210184 osd.1202 128.142.23.104:6801/98353 59 :
> [WRN] map e132056 wrongly marked me down
> 2015-01-29 11:29:35.441922 osd.1164 128.142.23.102:6850/22486 25 :
> [WRN] map e132056 wrongly marked me down

The behaviour is exactly the same on our system, so it looks like the
same issue.
We are currently running Giant (0.87), by the way.

> plus many other OSDs like that.
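
To compare how many OSDs are affected on each cluster, a small script
along these lines could tally the "failed" reports against the "wrongly
marked me down" warnings. This is only a rough sketch: it assumes the
cluster log has each entry on a single line (the excerpts above are
wrapped by the mail client), the patterns are modelled only on the lines
quoted in this thread, and the function name summarize_flaps is made up
for illustration.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumption:
# each entry is on one line in the real cluster log, not wrapped).
FAILED_RE = re.compile(
    r"(osd\.\d+) \S+ failed \((\d+) reports from (\d+) peers "
    r"after ([\d.]+) >= grace ([\d.]+)\)"
)
WRONGLY_RE = re.compile(
    r"(osd\.\d+) \S+ \d+ : \[WRN\] map e(\d+) wrongly marked me down"
)


def summarize_flaps(lines):
    """Return (failed, wrongly) for an iterable of cluster log lines.

    failed maps an OSD name to the details of its last failure report;
    wrongly is the set of OSDs that logged 'wrongly marked me down'.
    """
    failed = {}
    wrongly = set()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failed[m.group(1)] = {
                "reports": int(m.group(2)),
                "peers": int(m.group(3)),
                "after": float(m.group(4)),
                "grace": float(m.group(5)),
            }
            continue
        m = WRONGLY_RE.search(line)
        if m:
            wrongly.add(m.group(1))
    return failed, wrongly
```

Feeding it the open log file, e.g. summarize_flaps(open(path)) with path
pointing at wherever your cluster log lives, gives the two collections;
OSDs that appear in both were marked down by the monitor and came
straight back, which is the pattern seen here.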

-- 
Christian Eichelmann
System Administrator

1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
Brauerstraße 48 · DE-76135 Karlsruhe
Phone: +49 721 91374-8026
christian.eichelmann@xxxxxxxx

District Court of Montabaur / HRB 6484
Management Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Robert
Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr. Oliver Mauss, Jan Oetjen
Chairman of the Supervisory Board: Michael Scheeren