Hello
I have a little Ceph cluster with 3 nodes,
each with 3x 1TB HDDs and 1x 240GB SSD. I created
the cluster after the Luminous release, so all OSDs
are Bluestore. In my CRUSH map I have two rules,
one targeting the SSDs and one targeting the HDDs.
I have 4 pools: one uses the SSD rule, the other
three use the HDD rule. Three pools are size=3
min_size=2; one is size=2 min_size=1 (that one
holds content it's OK to lose).
For the last 3 months I have been having a
strange, random problem. I scheduled my OSD scrubs
during the night (osd scrub begin hour = 20,
osd scrub end hour = 7), when the office is closed,
so there is low impact on the users. Some mornings,
when I check the cluster health, I find:
HEALTH_ERR X scrub errors; Possible data damage: Y pgs inconsistent
OSD_SCRUB_ERRORS X scrub errors
PG_DAMAGED Possible data damage: Y pg inconsistent
X and Y are sometimes 1, sometimes 2.
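For completeness, the scrub window can be confirmed
per OSD through its admin socket on the hosting
node; it should report the 20 / 7 / 0.3 values from
my ceph.conf below (osd.4 is just an example):
> ceph daemon osd.4 config show | grep -E 'osd_scrub_(begin_hour|end_hour|load_threshold)'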
I run ceph health detail, check which PGs are
damaged, and issue a ceph pg repair for each of
them; I get
instructing pg PG on osd.N to repair
The PG is different every time, the OSD that has to
perform the repair is different, and even the node
hosting that OSD is different, so I made a list of
all the PGs and OSDs involved (further below). This
morning's case is the most recent one:
> ceph health detail
HEALTH_ERR 2 scrub errors; Possible data damage: 2 pgs inconsistent
OSD_SCRUB_ERRORS 2 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 13.65 is active+clean+inconsistent, acting [4,2,6]
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
> ceph pg repair 13.65
instructing pg 13.65 on osd.4 to repair
(node-2)> tail /var/log/ceph/ceph-osd.4.log
2018-02-28 08:38:47.593447 7f112cf76700 0 log_channel(cluster) log [DBG] : 13.65 repair starts
2018-02-28 08:39:37.573342 7f112cf76700 0 log_channel(cluster) log [DBG] : 13.65 repair ok, 0 fixed
> ceph pg repair 14.31
instructing pg 14.31 on osd.8 to repair
(node-3)> tail /var/log/ceph/ceph-osd.8.log
2018-02-28 08:52:37.297490 7f4dd0816700 0 log_channel(cluster) log [DBG] : 14.31 repair starts
2018-02-28 08:53:00.704020 7f4dd0816700 0 log_channel(cluster) log [DBG] : 14.31 repair ok, 0 fixed
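Side note: the repair logs above end with "0 fixed".
Next time, before issuing the repair, I can dump
what the scrub actually flagged; as far as I
understand these details are cleared once the PG is
repaired or deep-scrubbed again, so it has to be
run first. The pool and PG below are just this
morning's examples:
> rados list-inconsistent-pg cephwin
> rados list-inconsistent-obj 13.65 --format=json-pretty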
I made a list of when I got
OSD_SCRUB_ERRORS, which PG was affected and which
OSD was instructed to repair it. Dates are
dd/mm/yyyy:
21/12/2017 -- pg 14.29 is active+clean+inconsistent, acting [6,2,4]
18/01/2018 -- pg 14.5a is active+clean+inconsistent, acting [6,4,1]
22/01/2018 -- pg 9.3a is active+clean+inconsistent, acting [2,7]
29/01/2018 -- pg 13.3e is active+clean+inconsistent, acting [4,6,1]
instructing pg 13.3e on osd.4 to repair
07/02/2018 -- pg 13.7e is active+clean+inconsistent, acting [8,2,5]
instructing pg 13.7e on osd.8 to repair
09/02/2018 -- pg 13.30 is active+clean+inconsistent, acting [7,3,2]
instructing pg 13.30 on osd.7 to repair
15/02/2018 -- pg 9.35 is active+clean+inconsistent, acting [1,8]
instructing pg 9.35 on osd.1 to repair
pg 13.3e is active+clean+inconsistent, acting [4,6,1]
instructing pg 13.3e on osd.4 to repair
17/02/2018 -- pg 9.2d is active+clean+inconsistent, acting [7,5]
instructing pg 9.2d on osd.7 to repair
22/02/2018 -- pg 9.24 is active+clean+inconsistent, acting [5,8]
instructing pg 9.24 on osd.5 to repair
28/02/2018 -- pg 13.65 is active+clean+inconsistent, acting [4,2,6]
instructing pg 13.65 on osd.4 to repair
pg 14.31 is active+clean+inconsistent, acting [8,3,1]
instructing pg 14.31 on osd.8 to repair
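When it happens again I can also grep the primary
OSD's log on the affected node for the original
scrub error; osd.4 and the pattern below are just
an example, the real message wording may differ:
> grep -iE '(deep-)?scrub.*(error|shard|candidate)' /var/log/ceph/ceph-osd.4.log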
If it can be useful, my ceph.conf is here:
[global]
auth client required = none
auth cluster required = none
auth service required = none
fsid = 24d5d6bc-0943-4345-b44e-46c19099004b
cluster network = 10.10.10.0/24
public network = 10.10.10.0/24
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
bluestore_block_db_size = 64424509440
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
osd scrub begin hour = 20
osd scrub end hour = 7
osd scrub during recovery = false
osd scrub load threshold = 0.3
[client]
rbd cache = true
rbd cache size = 268435456 # 256MB
rbd cache max dirty = 201326592 # 192MB
rbd cache max dirty age = 2
rbd cache target dirty = 33554432 # 32MB
rbd cache writethrough until flush = true
#[mgr]
#debug_mgr = 20
[mon.pve-hs-main]
host = pve-hs-main
mon addr = 10.10.10.251:6789
[mon.pve-hs-2]
host = pve-hs-2
mon addr = 10.10.10.252:6789
[mon.pve-hs-3]
host = pve-hs-3
mon addr = 10.10.10.253:6789
My ceph versions:
{
    "mon": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "mgr": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 3
    },
    "osd": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 12
    },
    "mds": {},
    "overall": {
        "ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)": 18
    }
}
My ceph osd tree:
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF
-1       8.93686 root default
-6       2.94696     host pve-hs-2
 3   hdd 0.90959         osd.3            up  1.00000 1.00000
 4   hdd 0.90959         osd.4            up  1.00000 1.00000
 5   hdd 0.90959         osd.5            up  1.00000 1.00000
10   ssd 0.21819         osd.10           up  1.00000 1.00000
-3       2.86716     host pve-hs-3
 6   hdd 0.85599         osd.6            up  1.00000 1.00000
 7   hdd 0.85599         osd.7            up  1.00000 1.00000
 8   hdd 0.93700         osd.8            up  1.00000 1.00000
11   ssd 0.21819         osd.11           up  1.00000 1.00000
-7       3.12274     host pve-hs-main
 0   hdd 0.96819         osd.0            up  1.00000 1.00000
 1   hdd 0.96819         osd.1            up  1.00000 1.00000
 2   hdd 0.96819         osd.2            up  1.00000 1.00000
 9   ssd 0.21819         osd.9            up  1.00000 1.00000
My pools:
pool 9 'cephbackup' replicated size 2 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 5665 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~3]
pool 13 'cephwin' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16454 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~5]
pool 14 'cephnix' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 16482 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~227]
pool 17 'cephssd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 8601 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~3]
I can't understand where the problem comes
from. I don't think it's hardware: if I had a
failing disk, I would expect the errors to always
hit the same OSD. Any ideas?
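Meanwhile, to rule hardware out more firmly, I can
check SMART and the kernel log on the nodes hosting
the OSDs that were instructed to repair, along
these lines (/dev/sdX is only a placeholder for the
disk backing the OSD):
> smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrectable|crc'
> dmesg -T | grep -iE 'ata[0-9]|medium error|i/o error'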
Thanks