Our Ceph cluster entered the HEALTH_ERR state last week. We’re running Infernalis, and this is the first time I’ve seen the cluster in that state; even when OSD instances have dropped off, we’ve only ever seen HEALTH_WARN. The output of `ceph status` looks like this:

[root@r01u02-b ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 900, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e744: 1/1/1 up {0=r01u01-a=up:active}, 2 up:standby
     osdmap e533858: 48 osds: 48 up, 48 in
            flags sortbitwise
      pgmap v47571404: 3456 pgs, 14 pools, 16470 GB data, 18207 kobjects
            33056 GB used, 56324 GB / 89381 GB avail
                3444 active+clean
                   8 active+clean+scrubbing+deep
                   3 active+clean+scrubbing
                   1 active+clean+inconsistent
  client io 1535 kB/s wr, 23 op/s

I tracked down the inconsistent PG (roughly as sketched after my questions below) and found that one of the pair of OSDs backing it had kernel log messages like these:

[1773723.509386] sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1773723.509390] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
[1773723.509394] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
[1773723.509398] sd 5:0:0:0: [sdb] CDB: Read(10) 28 00 01 4c 1b a0 00 00 08 00
[1773723.509401] blk_update_request: I/O error, dev sdb, sector 21765025

Replacing the disk on that OSD server eventually fixed the problem, but it took a long time for the cluster to get out of the error state:

[root@r01u01-a ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            61 pgs backfill
            2 pgs backfilling
            1 pgs inconsistent
            1 pgs repair
            63 pgs stuck unclean
            recovery 5/37908099 objects degraded (0.000%)
            recovery 1244055/37908099 objects misplaced (3.282%)
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 920, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e759: 1/1/1 up {0=r01u02-b=up:active}, 2 up:standby
     osdmap e534536: 48 osds: 48 up, 48 in; 63 remapped pgs
            flags sortbitwise
      pgmap v47590337: 3456 pgs, 14 pools, 16466 GB data, 18205 kobjects
            33085 GB used, 56295 GB / 89381 GB avail
            5/37908099 objects degraded (0.000%)
            1244055/37908099 objects misplaced (3.282%)
                3385 active+clean
                  61 active+remapped+wait_backfill
                   6 active+clean+scrubbing+deep
                   2 active+remapped+backfilling
                   1 active+clean+scrubbing+deep+inconsistent+repair
                   1 active+clean+scrubbing
  client io 2720 kB/s wr, 16 op/s

Here’s what I’m curious about:

* How did a single bad sector result in more damage to the Ceph cluster than a few downed OSD servers?
* Is this issue addressed in later releases? I’m in the middle of setting up a Jewel instance.
* What can be done to avoid the `HEALTH_ERR` state in similar failure scenarios? Increasing the default pool size from 2 to 3 (see the second sketch below)?
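
For reference, here is roughly how I went about locating the inconsistent PG and asking Ceph to repair it. The PG ID (4.7b) and device name below are placeholders rather than the actual values from our cluster:

    # list the unhealthy PGs; the inconsistent one is reported with its acting OSDs
    ceph health detail

    # map that PG to the OSDs that hold it (PG ID is a placeholder)
    ceph pg map 4.7b

    # on the suspect OSD host, check the drive that logged the medium errors
    smartctl -a /dev/sdb

    # ask Ceph to repair the PG once the bad disk has been dealt with
    ceph pg repair 4.7b

That’s more or less the trail that led me to the kernel messages quoted above.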
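
To make the last question concrete, this is the sort of per-pool change I have in mind (the pool name here is just an example; we have 14 pools):

    # check the current replication settings for a pool
    ceph osd lspools
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # keep three copies of each object, and keep serving I/O with two
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

I realize that bumping size will trigger a round of backfill and cost us a third of the usable capacity, so I’d like to understand whether it actually helps here before rolling it out across all the pools.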

Many thanks for any input/insight you may have.

-kc

K.C. Wong
kcwong@xxxxxxxxxxx
M: +1 (408) 769-8235