Hello Ceph-Users,
For about 3 weeks now I have been seeing batches of scrub errors on a
4-node Octopus cluster:
# ceph health detail
HEALTH_ERR 7 scrub errors; Possible data damage: 6 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 7 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 6 pgs inconsistent
    pg 5.3 is active+clean+inconsistent, acting [9,12,6]
    pg 5.4 is active+clean+inconsistent, acting [15,17,18]
    pg 7.2 is active+clean+inconsistent, acting [13,15,10]
    pg 7.9 is active+clean+inconsistent, acting [5,19,4]
    pg 7.e is active+clean+inconsistent, acting [1,15,20]
    pg 7.18 is active+clean+inconsistent, acting [5,10,0]
This cluster only serves RADOSGW and is a multisite master.
I already found another thread
(https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/LXMQSRNSCPS5YJMFXIS3K5NMROHZKDJU/),
but it has no recent comments about such an issue.
In my case I am still seeing new scrub errors every few days. All of these
inconsistencies are "omap_digest_mismatch" errors in the "zone.rgw.log" or
"zone.rgw.buckets.index" pools, and they are spread across all nodes and OSDs.
I already raised a bug ticket (https://tracker.ceph.com/issues/53663),
but I am wondering whether any of you have ever observed something similar?
Traffic to and from the object storage looks completely fine, and I can even
run a manual deep-scrub that reports no errors, only to get 3-4 new errors
the next day.
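Just to be clear what I mean by a manual deep-scrub, it is roughly this per
affected PG:

# ceph pg deep-scrub 5.3
# ceph pg 5.3 query | grep -E 'last_deep_scrub_stamp|num_scrub_errors'   (check once the scrub has finished)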
Is there anything I could look into / collect when the next
inconsistency occurs?
Could there be any misconfiguration causing this?
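For the first question, what I had in mind so far is roughly the following
for the next time a PG turns up inconsistent (the debug level and log path
are just my assumptions, I am happy to collect something else instead):

# ceph tell osd.<id> config set debug_osd 10/10                         (raise OSD logging beforehand)
# rados list-inconsistent-obj <pgid> --format=json-pretty > inconsistent-<pgid>.json
# grep -i 'scrub' /var/log/ceph/ceph-osd.<id>.log                       (entries around the failing deep-scrub)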
Thanks and with kind regards
Christian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx