I recently started getting inconsistent PGs in my Octopus (15.2.14) Ceph
cluster. I was able to determine that they are all coming from the same
OSD: osd.143. The host behind this OSD recently suffered an unplanned
power loss, so I'm not surprised that there may be some corruption. The
PG discussed below (23.1fa) is part of an EC 8+2 pool.
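For what it's worth, I enumerated the affected PGs with roughly the
following (<pool_name> here is just a placeholder for the pool's actual
name):
# ceph health detail | grep -i inconsistent
# rados list-inconsistent-pg <pool_name>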
The log on the PG's primary OSD shows this and similar errors from the
PG's most recent deep scrub:
2021-10-03T03:25:25.969-0500 7f6e6801f700 -1 log_channel(cluster) log [ERR] : 23.1fa shard 143(1) soid 23:5f8c3d4e:::10000179969.00000168:head : candidate had a read error
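The same information can also be pulled out via rados; something like
the following should print the OSD(s) whose shards report errors (I'm
reciting the JSON field names from memory, so treat the jq filter as a
sketch):
# rados list-inconsistent-obj 23.1fa --format=json-pretty \
      | jq -r '.inconsistents[].shards[] | select((.errors | length) > 0) | .osd'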
In attempting to fix it, I first ran 'ceph pg repair 23.1fa' on the PG.
This accomplished nothing. Next I ran a shallow fsck on the OSD:
# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-143
fsck success
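(For reference, by "deep fsck" below I mean something along these
lines; I believe --deep wants an explicit value on this release, but I
haven't double-checked the exact spelling:)
# ceph-bluestore-tool fsck --deep 1 --path /var/lib/ceph/osd/ceph-143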
I estimated that a deep fsck would take ~24 hours to run on this mostly
full 16TB HDD. Before committing to that, I wanted to see if I could
simply remove the offending object and let Ceph recover it on its own.
Unfortunately, ceph-objectstore-tool core dumps when I try to remove the
object:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 \
      --pgid 23.1fa \
      '{"oid":"10000179969.00000168","key":"","snapid":-2,"hash":1924936186,"max":0,"pool":23,"namespace":"","shard_id":1,"max":0}' \
      remove
*** Caught signal (Segmentation fault) **
in thread 7fdc491a88c0 thread_name:ceph-objectstor
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
1: (()+0xf630) [0x7fdc3e62a630]
2: (__pthread_rwlock_rdlock()+0xb) [0x7fdc3e62614b]
3: (BlueStore::collection_bits(boost::intrusive_ptr<ObjectStore::CollectionImpl>&)+0x148) [0x5583c8fa7878]
4: (main()+0x4b50) [0x5583c8a85270]
5: (__libc_start_main()+0xf5) [0x7fdc3cfe7555]
6: (()+0x39d3a0) [0x5583c8ab03a0]
Segmentation fault (core dumped)
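One thing I haven't ruled out is operator error on my part: since this
is an EC pool, I believe ceph-objectstore-tool wants the shard-qualified
PG id (23.1fas1 for shard 1) rather than plain 23.1fa, and the exact
object JSON can be taken verbatim from the tool's own listing. Something
like (with the OSD stopped):
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 --op list 10000179969.00000168
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 --pgid 23.1fas1 \
      '<JSON printed by the list above>' remove
I haven't yet confirmed whether that avoids the segfault.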
As a last resort, I know that I can map this OID back to the CephFS file
and simply remove/restore the offending file to fix the object. But
before I do that, I'm running a deep fsck to see whether it can fix this
and the other inconsistent objects. In the meantime, is there anything
else I could do to clean up this inconsistent PG?
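(For reference, the OID-to-file mapping I mean is the usual one: the hex
string before the dot in the object name is the file's inode number, so
roughly:)
# printf '%d\n' 0x10000179969
1099513174377
# find /mnt/cephfs -inum 1099513174377
with /mnt/cephfs standing in for wherever the filesystem is actually
mounted.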
--Mike