Hi, I can't comment on the CephFS side but "Too many repaired reads on 2 OSDs" makes me suggest you check the hardware -- when I've seen that recently it was due to failing HDDs. I say "failing" not "failed" because the disks were giving errors on a few sectors but most I/O was working OK, so neither Linux nor Ceph ejected the disk; and repeated PG repair attempts were unsuccessful. Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** dh3@xxxxxxxxxxxx ** Wellcome Sanger Institute, Hinxton, UK ** On Wed, Jun 22, 2022 at 07:41:12PM +0200, Pascal Ehlert wrote: > Hi all, > > I am currently battling inconsistent PGs after a far-reaching mistake during > the upgrade from Octopus to Pacific. > While otherwise following the guide, I restarted the Ceph MDS daemons (and > this started the Pacific daemons) without previously reducing the ranks to 1 > (from 2). > > This resulted in daemons not coming up and reporting inconsistencies. > After later reducing the ranks and bringing the MDS back up (I did not > record every step as this was an emergency situation), we started seeing > health errors on every scrub. > > Now after three weeks, while our CephFS is still working fine and we haven't > noticed any data damage, we realized that every single PG of the cephfs > metadata pool is affected. > Below you can find some information on the actual status and a detailed > inspection of one of the affected pgs. I am happy to provide any other > information that could be useful of course. > > A repair of the affected PGs does not resolve the issue. > Does anyone else here have an idea what we could try apart from copying all > the data to a new CephFS pool? > > > > Thank you! > > Pascal > > > > > root@srv02:~# ceph status > cluster: > id: f0d6d4d0-8c17-471a-9f95-ebc80f1fee78 > health: HEALTH_ERR > insufficient standby MDS daemons available > 69262 scrub errors > Too many repaired reads on 2 OSDs > Possible data damage: 64 pgs inconsistent > > services: > mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w) > mgr: srv03(active, since 3w), standbys: srv01, srv02 > mds: 2/2 daemons up, 1 hot standby > osd: 44 osds: 44 up (since 3w), 44 in (since 10M) > > data: > volumes: 2/2 healthy > pools: 13 pools, 1217 pgs > objects: 75.72M objects, 26 TiB > usage: 80 TiB used, 42 TiB / 122 TiB avail > pgs: 1153 active+clean > 55 active+clean+inconsistent > 9 active+clean+inconsistent+failed_repair > > io: > client: 2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr > > > { > "epoch": 4962617, > "inconsistents": [ > { > "object": { > "name": "1000000cc8e.00000000", > "nspace": "", > "locator": "", > "snap": 1, > "version": 4253817 > }, > "errors": [], > "union_shard_errors": [ > "omap_digest_mismatch_info" > ], > "selected_object_info": { > "oid": { > "oid": "1000000cc8e.00000000", > "key": "", > "snapid": 1, > "hash": 1369745244, > "max": 0, > "pool": 7, > "namespace": "" > }, > "version": "4962847'6209730", > "prior_version": "3916665'4306116", > "last_reqid": "osd.27.0:757107407", > "user_version": 4253817, > "size": 0, > "mtime": "2022-02-26T12:56:55.612420+0100", > "local_mtime": "2022-02-26T12:56:55.614429+0100", > "lost": 0, > "flags": [ > "dirty", > "omap", > "data_digest", > "omap_digest" > ], > "truncate_seq": 0, > "truncate_size": 0, > "data_digest": "0xffffffff", > "omap_digest": "0xe5211a9e", > "expected_object_size": 0, > "expected_write_size": 0, > "alloc_hint_flags": 0, > "manifest": { > "type": 0 > }, > "watchers": {} > }, > "shards": [ > { > "osd": 20, > "primary": false, > "errors": [ > "omap_digest_mismatch_info" > ], > "size": 0, > "omap_digest": "0xffffffff", > "data_digest": "0xffffffff" > }, > { > "osd": 27, > "primary": true, > "errors": [ > "omap_digest_mismatch_info" > ], > "size": 0, > "omap_digest": "0xffffffff", > "data_digest": "0xffffffff" > }, > { > "osd": 43, > "primary": false, > "errors": [ > "omap_digest_mismatch_info" > ], > "size": 0, > "omap_digest": "0xffffffff", > "data_digest": "0xffffffff" > } > ] > }, > > > > > _______________________________________________ > ceph-users mailing list -- ceph-users@xxxxxxx > To unsubscribe send an email to ceph-users-leave@xxxxxxx -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx