Our Ceph cluster entered the HEALTH_ERR state last week. We’re running Infernalis, and this is the first time I’ve seen the cluster in that state; even when OSD instances have dropped off, we’ve only ever seen HEALTH_WARN. The output of `ceph status` looks like this:

[root@r01u02-b ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            1 pgs inconsistent
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 900, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e744: 1/1/1 up {0=r01u01-a=up:active}, 2 up:standby
     osdmap e533858: 48 osds: 48 up, 48 in
            flags sortbitwise
      pgmap v47571404: 3456 pgs, 14 pools, 16470 GB data, 18207 kobjects
            33056 GB used, 56324 GB / 89381 GB avail
                3444 active+clean
                   8 active+clean+scrubbing+deep
                   3 active+clean+scrubbing
                   1 active+clean+inconsistent
  client io 1535 kB/s wr, 23 op/s

I tracked down the inconsistent PG (roughly as sketched after my questions below) and found that one of the pair of OSDs backing it had kernel log messages like these:

[1773723.509386] sd 5:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[1773723.509390] sd 5:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
[1773723.509394] sd 5:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
[1773723.509398] sd 5:0:0:0: [sdb] CDB: Read(10) 28 00 01 4c 1b a0 00 00 08 00
[1773723.509401] blk_update_request: I/O error, dev sdb, sector 21765025

Replacing the disk on that OSD server eventually fixed the problem, but it took a long time for the cluster to get out of the error state:

[root@r01u01-a ~]# ceph status
    cluster ed62b3b9-be4a-4ce2-8cd3-34854aa8d6c2
     health HEALTH_ERR
            61 pgs backfill
            2 pgs backfilling
            1 pgs inconsistent
            1 pgs repair
            63 pgs stuck unclean
            recovery 5/37908099 objects degraded (0.000%)
            recovery 1244055/37908099 objects misplaced (3.282%)
            1 scrub errors
     monmap e1: 3 mons at {r01u01-a=192.168.111.11:6789/0,r01u02-b=192.168.111.16:6789/0,r01u03-c=192.168.111.21:6789/0}
            election epoch 920, quorum 0,1,2 r01u01-a,r01u02-b,r01u03-c
     mdsmap e759: 1/1/1 up {0=r01u02-b=up:active}, 2 up:standby
     osdmap e534536: 48 osds: 48 up, 48 in; 63 remapped pgs
            flags sortbitwise
      pgmap v47590337: 3456 pgs, 14 pools, 16466 GB data, 18205 kobjects
            33085 GB used, 56295 GB / 89381 GB avail
            5/37908099 objects degraded (0.000%)
            1244055/37908099 objects misplaced (3.282%)
                3385 active+clean
                  61 active+remapped+wait_backfill
                   6 active+clean+scrubbing+deep
                   2 active+remapped+backfilling
                   1 active+clean+scrubbing+deep+inconsistent+repair
                   1 active+clean+scrubbing
  client io 2720 kB/s wr, 16 op/s

Here’s what I’m curious about:

* How did a single bad sector result in more damage to the Ceph cluster than a few downed OSD servers?
* Is this issue addressed in later releases? I’m in the middle of setting up a Jewel instance.
* What can be done to avoid the `HEALTH_ERR` state in similar failure scenarios? Increasing the default pool size from 2 to 3 (see the second sketch below)?
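
For reference, here is roughly how I went about locating the inconsistent PG and asking Ceph to repair it. The PG ID (4.7b) and device name below are placeholders rather than the actual values from our cluster:

    # list the unhealthy PGs; the inconsistent one is reported with its acting OSDs
    ceph health detail

    # map that PG to the OSDs that hold it (PG ID is a placeholder)
    ceph pg map 4.7b

    # on the suspect OSD host, check the drive that logged the medium errors
    smartctl -a /dev/sdb

    # ask Ceph to repair the PG once the bad disk has been dealt with
    ceph pg repair 4.7b

That’s more or less the trail that led me to the kernel messages quoted above.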
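
To make the last question concrete, this is the sort of per-pool change I have in mind (the pool name here is just an example; we have 14 pools):

    # check the current replication settings for a pool
    ceph osd lspools
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # keep three copies of each object, and keep serving I/O with two
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2

I realize that bumping size will trigger a round of backfill and cost us a third of the usable capacity, so I’d like to understand whether it actually helps here before rolling it out across all the pools.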

Many thanks for any input/insight you may have.

-kc

K.C. Wong
kcwong@xxxxxxxxxxx
M: +1 (408) 769-8235