Hi all,
I am currently battling inconsistent PGs after a far-reaching mistake
during the upgrade from Octopus to Pacific.
While otherwise following the upgrade guide, I restarted the Ceph MDS daemons
(which brought up the Pacific daemons) without first reducing the number of
ranks to 1 (from 2).
As a result, the MDS daemons did not come up and reported inconsistencies.
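For reference, the step I skipped is the rank reduction that the upgrade notes
prescribe before restarting the MDS daemons, i.e. roughly (the file system name
is a placeholder here):

    # reduce to a single active MDS before restarting the daemons
    ceph fs set <fs_name> max_mds 1
    # and wait until only rank 0 remains active
    ceph status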
After later reducing the ranks and bringing the MDS back up (I did not
record every step as this was an emergency situation), we started seeing
health errors on every scrub.
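The errors surface as OSD_SCRUB_ERRORS / PG_DAMAGED in the health output and
can be listed with, for example:

    ceph health detail
    # list all PGs currently flagged inconsistent
    ceph pg ls inconsistent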
Now, three weeks later, while our CephFS is still working fine and we haven't
noticed any data damage, we have realized that every single PG of the CephFS
metadata pool is affected.
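This can be checked per pool with something like the following (the pool name
is a placeholder for our metadata pool):

    # PGs of the pool that currently have scrub errors
    rados list-inconsistent-pg <metadata_pool>
    # compare against the pool's total PG count
    ceph osd pool get <metadata_pool> pg_num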
Below you can find some information on the current status and a detailed
inspection of one of the affected PGs. I am of course happy to provide any
other information that could be useful.
A repair of the affected PGs does not resolve the issue.
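By "repair" I mean the usual sequence of issuing a repair and re-scrubbing,
roughly:

    ceph pg repair <pgid>
    # verify afterwards with another deep scrub
    ceph pg deep-scrub <pgid>

Some PGs even end up in failed_repair, as shown in the status below.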
Does anyone here have an idea what else we could try, apart from copying all
the data to a new CephFS pool?
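(By copying I mean roughly: creating a second file system with fresh pools and
migrating the contents at the file level, along these lines, with all names
being placeholders:

    # allow a second file system if not already enabled
    ceph fs flag set enable_multiple true
    ceph osd pool create cephfs2_metadata
    ceph osd pool create cephfs2_data
    ceph fs new cephfs2 cephfs2_metadata cephfs2_data
    # then mount both file systems and copy the data across (rsync)

which we would obviously prefer to avoid.)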
Thank you!
Pascal
root@srv02:~# ceph status
  cluster:
    id:     f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
    health: HEALTH_ERR
            insufficient standby MDS daemons available
            69262 scrub errors
            Too many repaired reads on 2 OSDs
            Possible data damage: 64 pgs inconsistent

  services:
    mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
    mgr: srv03(active, since 3w), standbys: srv01, srv02
    mds: 2/2 daemons up, 1 hot standby
    osd: 44 osds: 44 up (since 3w), 44 in (since 10M)

  data:
    volumes: 2/2 healthy
    pools:   13 pools, 1217 pgs
    objects: 75.72M objects, 26 TiB
    usage:   80 TiB used, 42 TiB / 122 TiB avail
    pgs:     1153 active+clean
             55   active+clean+inconsistent
             9    active+clean+inconsistent+failed_repair

  io:
    client: 2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr
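The detailed inspection below is an excerpt of the output of
rados list-inconsistent-obj for one of the affected PGs:

    rados list-inconsistent-obj <pgid> --format=json-pretty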
{
    "epoch": 4962617,
    "inconsistents": [
        {
            "object": {
                "name": "1000000cc8e.00000000",
                "nspace": "",
                "locator": "",
                "snap": 1,
                "version": 4253817
            },
            "errors": [],
            "union_shard_errors": [
                "omap_digest_mismatch_info"
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "1000000cc8e.00000000",
                    "key": "",
                    "snapid": 1,
                    "hash": 1369745244,
                    "max": 0,
                    "pool": 7,
                    "namespace": ""
                },
                "version": "4962847'6209730",
                "prior_version": "3916665'4306116",
                "last_reqid": "osd.27.0:757107407",
                "user_version": 4253817,
                "size": 0,
                "mtime": "2022-02-26T12:56:55.612420+0100",
                "local_mtime": "2022-02-26T12:56:55.614429+0100",
                "lost": 0,
                "flags": [
                    "dirty",
                    "omap",
                    "data_digest",
                    "omap_digest"
                ],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xffffffff",
                "omap_digest": "0xe5211a9e",
                "expected_object_size": 0,
                "expected_write_size": 0,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 20,
                    "primary": false,
                    "errors": [
                        "omap_digest_mismatch_info"
                    ],
                    "size": 0,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xffffffff"
                },
                {
                    "osd": 27,
                    "primary": true,
                    "errors": [
                        "omap_digest_mismatch_info"
                    ],
                    "size": 0,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xffffffff"
                },
                {
                    "osd": 43,
                    "primary": false,
                    "errors": [
                        "omap_digest_mismatch_info"
                    ],
                    "size": 0,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xffffffff"
                }
            ]
        },