Hi All,
Is it possible to safely identify objects that should be purged from a CephFS pool, and can we purge them manually?
Background:
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
We were running 2 MDS, 1 active & 1 standby-replay.
A couple of months ago, after triggering an MDS failover, we hit a purgequeue bug [1] which prevented either MDS from becoming active.
We followed the steps in [2] to delete the metadata objects in the purge queue and bring both MDS daemons back online.
Today it became clear that the usage of a CephFS data pool was much higher than the usage shown by clients.
e.g. ls on a client shows ~5.2 TB used, while 'ceph fs status' shows 146 TB used for the data pool.
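For reference, the comparison above comes from checks roughly like the following (mount point and pool name are placeholders for ours):

  # client-side view of the filesystem (kernel or FUSE mount)
  df -h /mnt/cephfs
  getfattr -n ceph.dir.rbytes /mnt/cephfs   # recursive byte count as CephFS accounts it

  # cluster-side view of the data pool
  ceph fs status
  ceph df detail
  rados -p <data_pool> df

The gap between the two views is what prompted this mail.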
After reading bug report [3] (which appears to be related to bug reports [1] & [4]), we set 'mds standby replay = false' and restarted both MDS.
This appears to have stopped the persistent climb in usage, but several OSDs remain critically full (~89%).
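Concretely, the change was made in ceph.conf on both MDS hosts (this is Luminous, so no centralised config store):

  [mds]
      mds standby replay = false

followed by restarting each MDS daemon in turn with 'systemctl restart ceph-mds@<id>'.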
So it looks like we have a problem with CephFS not reclaiming space, and therefore a large number of data objects that should have been purged but were not. Is there a safe way to identify and remove them manually?
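To be clear about what "identify" might look like: the kind of check we have been considering is sketched below, using the backtrace xattr on the first object of each inode. This is only a sketch (pool name, object name and paths are placeholders) and we have not deleted anything:

  # distinct inode prefixes currently holding objects in the data pool
  rados -p <data_pool> ls | cut -d. -f1 | sort -u > inodes_in_pool.txt

  # for a given inode, dump the backtrace stored on its first object
  rados -p <data_pool> getxattr 10000000000.00000000 parent > bt.bin
  ceph-dencoder type inode_backtrace_t import bt.bin decode dump_json

  # if the decoded path no longer exists (or resolves to a different inode)
  # on a client mount, that inode's objects look like purge candidates

Is that a reasonable approach, or is there a supported way to do this?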
Also possibly relevant:
I've been periodically running the following command throughout today:
rados -p <metadata_pool> ls | grep "^500\."
That command currently lists ~1670 purge queue objects (500.XXXXXXXX), and the list is very stable between runs, i.e. about 1669 of the objects are the same each time.
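For completeness, "very stable" was checked by diffing successive runs (pool and file names are placeholders):

  rados -p <metadata_pool> ls | grep '^500\.' | sort > pq_run1.txt
  # ... some hours later ...
  rados -p <metadata_pool> ls | grep '^500\.' | sort > pq_run2.txt
  diff pq_run1.txt pq_run2.txt   # only one or two names differ between runs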
[1] https://tracker.ceph.com/issues/21749
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021386.html
[3] https://tracker.ceph.com/issues/21551
[4] https://tracker.ceph.com/issues/19593
Regards,
Dylan