Hi,
while upgrading my cluster from 10.2.3 to 10.2.6 I ran into a major failure, and I think it could (?) be a bug.
My OS is Ubuntu (Xenial), and the Ceph packages are also from the distro. My cluster has 3 monitors and 96 OSDs.
First I stopped one mon, upgraded the OS packages and rebooted; it came back up as expected with no failures. I did the same with another mon, also OK, but when I stopped the last mon I got HEALTH_ERR, tons of blocked requests, and several minutes (with almost zero client I/O) passed until recovery started...
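For reference, each monitor was upgraded roughly like this (the systemd unit name is my assumption for a stock Jewel install on Xenial; adjust the mon id to your setup):

systemctl stop ceph-mon@mon-node1       # stop only this monitor
apt-get update && apt-get dist-upgrade  # pulls 10.2.6 from the distro repos
reboot
ceph -s                                 # wait for the mon to rejoin quorum before the next one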
Two days later (with an inconvenient performance degradation) the cluster became HEALTH_OK again; only then did I upgrade all my OSDs from 10.2.3 to 10.2.6 (this time, fortunately, without any surprises).
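The OSD upgrade followed the same pattern on each node (again assuming the stock systemd units; <id> is just a placeholder for each OSD number):

apt-get update && apt-get dist-upgrade
systemctl restart ceph-osd@<id>         # or ceph-osd.target to bounce every OSD on the node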
My question is: why did this happen?
In my logs (from the monitor boot process) I can only see things like:
2017-03-27 11:21:13.955155 7f7b24df3700 0 mon.mon-node1@-1(probing).osd e166803 crush map has features 288514051259236352, adjusting msgr requires
2017-03-27 11:21:14.020915 7f7b16a10700 0 -- 10.2.15.20:6789/0 >> 10.2.15.22:6789/0 pipe(0x55eeea485400 sd=12 :49238 s=2 pgs=3041041 cs=1 l=0 c=0x55eee9206c00).reader missed message? skipped from seq 0 to 821720064
2017-03-27 11:21:14.021322 7f7b1690f700 0 -- 10.2.15.20:6789/0 >> 10.2.15.21:6789/0 pipe(0x55eeea484000 sd=11 :44714 s=2 pgs=6749444 cs=1 l=0 c=0x55eee9206a80).reader missed message? skipped from seq 0 to 1708671746
And also (from all my OSDs) a lot of:
2017-03-27 11:21:46.991533 osd.62 10.2.15.37:6812/4072 21935 : cluster [WRN] failed to encode map e167847 with expected crc
When things started to go wrong (when I stopped mon-node1, the last one, to upgrade it) I can see:
2017-03-27 11:05:07.143529 mon.1 10.2.15.21:6789/0 653 : cluster [INF] HEALTH_ERR; 54 pgs are stuck inactive for more than 300 seconds; 2153 pgs backfill_wait; 21 pgs backfilling; 53 pgs degraded; 2166 pgs peering; 3 pgs recovering; 50 pgs recovery_wait; 54 pgs stuck inactive; 118 pgs stuck unclean; 1549 requests are blocked > 32 sec; recovery 28926/57075284 objects degraded (0.051%); recovery 24971455/57075284 objects misplaced (43.752%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set; 1 mons down, quorum 1,2 mon-node2,mon-node3
And when mon-node1 came back (already upgraded):
2017-03-27 11:21:58.987092 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
2017-03-27 11:21:58.987186 7f7b18c16700 1 mon.mon-node1@0(electing).elector(162) init, last seen epoch 162
2017-03-27 11:21:59.064957 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1 calling new monitor election
2017-03-27 11:21:59.065029 7f7b18c16700 1 mon.mon-node1@0(electing).elector(165) init, last seen epoch 165
2017-03-27 11:21:59.096933 7f7b18c16700 0 log_channel(cluster) log [INF] : mon.mon-node1@0 won leader election with quorum 0,1,2
2017-03-27 11:21:59.114194 7f7b18c16700 0 log_channel(cluster) log [INF] : HEALTH_ERR; 2167 pgs are stuck inactive for more than 300 seconds; 2121 pgs backfill_wait; 25 pgs backfilling; 25 pgs degraded; 2147 pgs peering; 25 pgs recovery_wait; 25 pgs stuck degraded; 2167 pgs stuck inactive; 4338 pgs stuck unclean; 5082 requests are blocked > 32 sec; recovery 11846/55732755 objects degraded (0.021%); recovery 24595033/55732755 objects misplaced (44.130%); all OSDs are running jewel or later but the 'require_jewel_osds' osdmap flag is not set
The CRC errors disappeared once all monitors were upgraded and the require_jewel_osds flag was set.
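For the record, the flag was set and verified with the standard commands (as I understand them from the Jewel release notes):

ceph osd set require_jewel_osds
ceph osd dump | grep flags              # 'require_jewel_osds' should now be listed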
It seems that the entire cluster was rebuilt; fortunately I didn't lose any data.
So is this a bug, expected behavior, or did I do something wrong? I've upgraded Ceph several times before and never had problems.
Herbert