Hi,
you wrote that this cluster was initially installed with Octopus, so
there has been no Ceph upgrade so far? Are all RGW daemons on exactly
the same Ceph (minor) version?
I remember one of our customers reporting inconsistent objects on a
regular basis although no hardware issues were detectable. They
replicate between two sites, too. A couple of months ago both sites
were updated to the exact same Ceph minor version (also Octopus), and they
haven't faced any inconsistencies since then. I don't have details about
the ceph version(s) though, only that both sites were initially
installed with Octopus. Maybe it's worth checking your versions?
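Something along these lines should show whether all daemons report
the same build (the rgw section only appears if the gateways register
in the service map, so comparing the installed package versions on the
RGW hosts directly is a reasonable fallback):

# ceph versions
# ceph tell osd.* version

The first aggregates the running versions per daemon type, the second
queries each OSD individually.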
Regards,
Eugen
Quoting Christian Rohmann <christian.rohmann@xxxxxxxxx>:
Hello Ceph-Users,
for about 3 weeks now I have been seeing batches of scrub errors on a
4-node Octopus cluster:
# ceph health detail
HEALTH_ERR 7 scrub errors; Possible data damage: 6 pgs inconsistent
[ERR] OSD_SCRUB_ERRORS: 7 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 6 pgs inconsistent
    pg 5.3 is active+clean+inconsistent, acting [9,12,6]
    pg 5.4 is active+clean+inconsistent, acting [15,17,18]
    pg 7.2 is active+clean+inconsistent, acting [13,15,10]
    pg 7.9 is active+clean+inconsistent, acting [5,19,4]
    pg 7.e is active+clean+inconsistent, acting [1,15,20]
    pg 7.18 is active+clean+inconsistent, acting [5,10,0]
This cluster only serves RADOSGW and it is the multisite master.
I already found another thread
(https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/LXMQSRNSCPS5YJMFXIS3K5NMROHZKDJU/), but there are no recent comments about such an
issue.
In my case I am still seeing more scrub errors every few days. All
those inconsistencies are "omap_digest_mismatch" in the
"zone.rgw.log" or "zone.rgw.buckets.index" pool and are spread all
across nodes and OSDs.
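For reference, the per-object details come from something along these
lines (pg 5.3 just as one of the affected PGs, pool name as an example):

# rados list-inconsistent-pg zone.rgw.buckets.index
# rados list-inconsistent-obj 5.3 --format=json-pretty

The second command is what shows the omap_digest_mismatch on the
individual shards.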
I already raised a bug ticket
(https://tracker.ceph.com/issues/53663), but am wondering if any of
you have ever observed something similar?
Traffic to and from the object storage seems totally fine, and I can
even run a manual deep-scrub with no errors and then receive 3-4
errors the next day.
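(The manual deep-scrubs are simply triggered per PG, e.g.:

# ceph pg deep-scrub 5.3

for one of the affected PGs.)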
Is there anything I could look into / collect when the next
inconsistency occurs?
Could there be any misconfiguration causing this?
Thanks and with kind regards
Christian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx