Hi all,
We have a CephFS with its data_pool in erasure coding (3+2) and 1024 PGs
(Nautilus 14.2.8).
One of the PGs is partially destroyed (we lost 3 OSDs, thus 3 shards); it
has 143 unfound objects and is stuck in the state
"active+recovery_unfound+undersized+degraded+remapped".
We have therefore lost some data (we are using cephfs-data-scan pg_files ... to
identify the files with data on the bad PG).
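In case it helps, this is roughly how we are listing the unfound objects and
mapping them back to files; the PG id (2.1ab) and the path are just
placeholders for our real values:

    # list the unfound objects of the broken PG
    ceph pg 2.1ab list_unfound

    # find which files have at least one object on that PG
    # (path within the broken filesystem)
    cephfs-data-scan pg_files some/directory 2.1ab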
We then created a new filesystem (this time with the data_pool in replica 3)
and we are copying all the data from the broken FS to the new one.
But we need to remove the files from the broken FS after they are copied, to
free space (otherwise there will not be enough space on the cluster). To
avoid problems with strays, we removed the snapshots on the broken FS before
deleting the files.
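For reference, this is more or less what we did and how we are watching the
strays (the mount point, snapshot name and MDS name below are placeholders):

    # remove the snapshots on the broken FS, e.g.
    rmdir /mnt/brokenfs/somedir/.snap/snapname

    # watch the number of strays on the MDS of the broken FS
    ceph daemon mds.broken-fs-mds perf dump | grep num_strays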
The problem is that the MDS managing the broken FS is now reporting "Behind on
trimming (123036/128) max_segments: 128, num_segments: 123036"
and has "1 slow metadata IOs are blocked > 30 secs, oldest blocked for
83645 secs".
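This is how we are watching the trimming backlog (the MDS name is a
placeholder; I may not be reading the right counters):

    ceph health detail

    # number of journal segments / events on the MDS of the broken FS
    ceph daemon mds.broken-fs-mds perf dump mds_log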
The slow IO corresponds to OSD 27, which is the acting_primary for the broken
PG, and the broken PG has a long "snap_trimq":
"[1e0c~1,1e0e~1,1e12~1,1e16~1,1e18~1,1e1a~1,........" and
"snap_trimq_len": 460.
It therefore seems that CephFS is not able to trim the log entries
corresponding to the deletion of objects and snapshots that have data on the
broken PG, probably because the PG is not healthy.
Is there a way to tell Ceph/CephFS to flush or forget about (only) the lost
objects on the broken PG and get this PG healthy enough for trimming to
proceed?
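For example, would something like the following be reasonable in this
situation, given that the pool is erasure coded (so I guess only "delete" is
possible, not "revert")? The PG id is a placeholder:

    ceph pg 2.1ab mark_unfound_lost delete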
Thanks for your help,
F.