ceph-objectstore-tool core dump


I recently started getting inconsistent PGs in my Octopus (15.2.14) Ceph cluster. I was able to determine that they all involve the same OSD: osd.143. Its host recently suffered an unplanned power loss, so I'm not surprised that there may be some corruption. The PG discussed below, 23.1fa, is part of an EC 8+2 pool.

The OSD logs from the PG's primary OSD show this and similar errors from the PG's most recent deep scrub:

2021-10-03T03:25:25.969-0500 7f6e6801f700 -1 log_channel(cluster) log [ERR] : 23.1fa shard 143(1) soid 23:5f8c3d4e:::10000179969.00000168:head : candidate had a read error
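Since several PGs report similar errors, it may help to pull the fields out of these log lines mechanically. The following is a minimal Python sketch, not an official Ceph parser: the regular expression is inferred only from the single line quoted above, the function name parse_scrub_error is hypothetical, and it assumes the object name contains no escaped colons.

```python
import re

# Pattern inferred from the one log line quoted above (an assumption,
# not a documented Ceph log format).
SCRUB_ERR = re.compile(
    r"log \[ERR\] : (?P<pgid>\d+\.[0-9a-f]+) "
    r"shard (?P<osd>\d+)\((?P<shard>\d+)\) "
    r"soid (?P<soid>\S+) : (?P<reason>.+)$"
)


def parse_scrub_error(line):
    """Return the PG, OSD, shard and object fields of a scrub error line, or None."""
    match = SCRUB_ERR.search(line)
    if match is None:
        return None
    # soid layout as seen above: pool:hash:namespace:key:name:snap
    parts = match["soid"].split(":")
    if len(parts) != 6:
        return None  # name or key probably contains escaped colons
    pool, hash_hex, namespace, key, name, snap = parts
    return {
        "pgid": match["pgid"],
        "osd": int(match["osd"]),
        "shard": int(match["shard"]),
        "pool": int(pool),
        "hash_hex": hash_hex,
        "namespace": namespace,
        "key": key,
        "name": name,
        "snap": snap,
        "reason": match["reason"],
    }


line = (
    "2021-10-03T03:25:25.969-0500 7f6e6801f700 -1 log_channel(cluster) "
    "log [ERR] : 23.1fa shard 143(1) soid "
    "23:5f8c3d4e:::10000179969.00000168:head : candidate had a read error"
)
info = parse_scrub_error(line)
print(info["pgid"], info["osd"], info["shard"], info["name"])
```

Running this over the primary OSD's log and grouping by the osd field would confirm whether every read error really points at osd.143.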

In attempting to fix it, I first ran 'ceph pg repair 23.1fa'. This accomplished nothing. Next, I ran a shallow fsck on the OSD:

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-143
fsck success

I estimate that a deep fsck would take ~24 hours to run on this mostly full 16TB HDD. Before doing that, I wanted to see if I could simply remove the offending object and let Ceph recover it. Unfortunately, ceph-objectstore-tool core dumps when I try to remove this object:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-143 --pgid 23.1fa '{"oid":"10000179969.00000168","key":"","snapid":-2,"hash":1924936186,"max":0,"pool":23,"namespace":"","shard_id":1,"max":0}' remove
*** Caught signal (Segmentation fault) **
 in thread 7fdc491a88c0 thread_name:ceph-objectstor
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
 1: (()+0xf630) [0x7fdc3e62a630]
 2: (__pthread_rwlock_rdlock()+0xb) [0x7fdc3e62614b]
 3: (BlueStore::collection_bits(boost::intrusive_ptr<ObjectStore::CollectionImpl>&)+0x148) [0x5583c8fa7878]
 4: (main()+0x4b50) [0x5583c8a85270]
 5: (__libc_start_main()+0xf5) [0x7fdc3cfe7555]
 6: (()+0x39d3a0) [0x5583c8ab03a0]
Segmentation fault (core dumped)
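As a sanity check that the JSON object spec in the command above names the same object as the scrub error, the two hash values can be compared. This sketch assumes that the hex field in the soid string (5f8c3d4e) is the 32-bit bit-reversed form of the "hash" value in the JSON; that relationship holds for the numbers in this post, but treat it as an assumption rather than a documented guarantee. The helper name reverse_bits32 is hypothetical.

```python
def reverse_bits32(value):
    """Reverse the bit order of a 32-bit unsigned integer."""
    bits = format(value & 0xFFFFFFFF, "032b")  # zero-padded binary string
    return int(bits[::-1], 2)


soid_hash = 0x5F8C3D4E   # hash field from the scrub error's soid
json_hash = 1924936186   # "hash" value passed to ceph-objectstore-tool

# True when the two values are bit-reversals of each other.
print(reverse_bits32(soid_hash) == json_hash)
# Hex form of the JSON hash, to compare its low digits with the PG id.
print(format(json_hash, "x"))
```

The hex form of the JSON hash ends in 1fa, which is consistent with the object living in PG 23.1fa.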

As a last resort, I know that I can map this OID back to the CephFS file and simply remove and restore that file to fix the object. But before I do that, I'm running a deep fsck to see if it can fix this and the other inconsistent objects. In the meantime, is there anything else I can do to clean up this inconsistent PG?
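For reference, the OID-to-file mapping mentioned above can be sketched as follows. CephFS data objects are named with the file's inode number in hex, a dot, and the object index in hex. The byte offset computed here additionally assumes the default file layout (4 MiB objects, stripe count 1), which may not match this pool; the function name cephfs_object_to_inode is hypothetical.

```python
def cephfs_object_to_inode(oid, object_size=4 * 1024 * 1024):
    """Split a CephFS data object name into (inode, object index, byte offset).

    object_size is an assumption (default CephFS layout); adjust it if the
    file uses a custom layout.
    """
    ino_hex, index_hex = oid.split(".")
    inode = int(ino_hex, 16)    # decimal inode number, usable with find -inum
    index = int(index_hex, 16)  # which object of the file this is
    return inode, index, index * object_size


inode, index, offset = cephfs_object_to_inode("10000179969.00000168")
print(inode, index, offset)
```

The decimal inode number can then be looked up on a mounted filesystem with something like 'find <mountpoint> -inum <inode>', where <mountpoint> is wherever CephFS is mounted.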

--Mike
