Dear all;
Up until a few hours ago, I had a seemingly normally behaving cluster
(Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6
nodes. The cluster is only used for CephFS, and the only non-standard
configuration I can think of is that I had 2 active MDSs but only 1
standby. I had also doubled mds_cache_memory_limit to 8 GB (all OSD
hosts have 256 GB of RAM) at some point in the past.
Then I rebooted one of the OSD nodes. The rebooted node held one of the
active MDSs. Now the node is back up: ceph -s says the cluster is
healthy, but all PGs are in an active+clean+remapped state and 166.67%
of the objects are misplaced (dashboard: -66.66% healthy).
The data pool is a threefold replica with 5.4M objects; the number of
misplaced objects is reported as 27087410/16252446. The denominator in
the ratio makes sense to me (16.2M / 3 = 5.4M), but the numerator does
not. I also note that the ratio is *exactly* 5 / 3. The filesystem is
still mounted and appears to be usable, but df reports it as 100% full;
I suspect it would say 167% if that were not capped somewhere.
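In case it helps, here is the arithmetic as I see it; a minimal check
in plain Python, using only the two numbers quoted above (nothing else
is taken from the cluster):

    # Sanity check on the misplaced-object ratio reported by ceph -s,
    # using only the 27087410/16252446 figure and the 3x replication.
    objects = 16252446 // 3          # 5,417,482 objects in the data pool
    print(27087410 / 16252446)       # 1.666..., i.e. exactly 5/3
    print(27087410 == 5 * objects)   # True: numerator is 5 copies per object
    print(16252446 == 3 * objects)   # True: denominator is 3 copies per object

So the denominator counts three copies per object, while the numerator
counts exactly five per object.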
Any ideas about what is going on? Any suggestions for recovery?
// Best wishes; Johan