Sounds similar to this one: https://tracker.ceph.com/issues/46847

If you have or can reconstruct the crush map from before adding the OSDs, you might be able to discover everything with the temporary reversal of the crush map method.

Not sure if there is another method; I never got a reply to my question in the tracker.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Michael Thomas <wart@xxxxxxxxxxx>
Sent: 16 September 2020 01:27:19
To: ceph-users@xxxxxxx
Subject: multiple OSD crash, unfound objects

Over the weekend I had multiple OSD servers in my Octopus cluster (15.2.4) crash and reboot at nearly the same time. The OSDs are part of an erasure coded pool. At the time the cluster had been busy with a long-running (~week) remapping of a large number of PGs after I incrementally added more OSDs to the cluster. After bringing all of the OSDs back up, I have 25 unfound objects and 75 degraded objects. There are other problems reported, but I'm primarily concerned with these unfound/degraded objects.

The pool with the missing objects is a cephfs pool. The files stored in the pool are backed up on tape, so I can easily restore individual files as needed (though I would not want to restore the entire filesystem).

I tried following the guide at https://docs.ceph.com/docs/octopus/rados/troubleshooting/troubleshooting-pg/#unfound-objects. I found a number of OSDs that are still 'not queried'. Restarting a sampling of these OSDs changed the state from 'not queried' to 'already probed', but that did not recover any of the unfound or degraded objects.

I have also tried 'ceph pg deep-scrub' on the affected PGs, but never saw them get scrubbed. I also tried doing a 'ceph pg force-recovery' on the affected PGs, but only one seems to have been tagged accordingly (see ceph -s output below).

The guide also says "Sometimes it simply takes some time for the cluster to query possible locations." I'm not sure how long "some time" might take, but it hasn't changed after several hours.

My questions are:

* Is there a way to force the cluster to query the possible locations sooner?

* Is it possible to identify the files in cephfs that are affected, so that I could delete only the affected files and restore them from backup tapes?

--Mike

ceph -s:

  cluster:
    id:     066f558c-6789-4a93-aaf1-5af1ba01a3ad
    health: HEALTH_ERR
            1 clients failing to respond to capability release
            1 MDSs report slow requests
            25/78520351 objects unfound (0.000%)
            2 nearfull osd(s)
            Reduced data availability: 1 pg inactive
            Possible data damage: 9 pgs recovery_unfound
            Degraded data redundancy: 75/626645098 objects degraded (0.000%), 9 pgs degraded
            1013 pgs not deep-scrubbed in time
            1013 pgs not scrubbed in time
            2 pool(s) nearfull
            1 daemons have recently crashed
            4 slow ops, oldest one blocked for 77939 sec, daemons [osd.0,osd.41] have slow ops.

  services:
    mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4 (age 9d)
    mgr: ceph3(active, since 11d), standbys: ceph2, ceph4, ceph1
    mds: archive:1 {0=ceph4=up:active} 3 up:standby
    osd: 121 osds: 121 up (since 6m), 121 in (since 101m); 4 remapped pgs

  task status:
    scrub status:
        mds.ceph4: idle

  data:
    pools:   9 pools, 2433 pgs
    objects: 78.52M objects, 298 TiB
    usage:   412 TiB used, 545 TiB / 956 TiB avail
    pgs:     0.041% pgs unknown
             75/626645098 objects degraded (0.000%)
             135224/626645098 objects misplaced (0.022%)
             25/78520351 objects unfound (0.000%)
             2421 active+clean
             5    active+recovery_unfound+degraded
             3    active+recovery_unfound+degraded+remapped
             2    active+clean+scrubbing+deep
             1    unknown
             1    active+forced_recovery+recovery_unfound+degraded

  progress:
    PG autoscaler decreasing pool 7 PGs from 1024 to 512 (5d)
      [............................]
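
On the second question above (identifying the affected cephfs files), here is a minimal sketch of one possible approach, not a tested recovery procedure. It relies on the fact that CephFS data objects are named `<inode in hex>.<block index in hex>`, so the part before the dot can be converted to a decimal inode number and looked up on a mounted filesystem with `find -inum`. The JSON shape assumed for `ceph pg <pgid> list_unfound` (an `objects` list whose entries carry `oid.oid`) follows the example in the troubleshooting guide and should be checked against real output. The object names in `SAMPLE` and the mount point `/mnt/cephfs` are hypothetical.

```python
import json
from typing import List


def inode_from_object_name(oid: str) -> int:
    """Return the inode number encoded in a CephFS data object name.

    Data objects are named '<inode hex>.<block index hex>'.
    """
    inode_hex, _, _block = oid.partition(".")
    return int(inode_hex, 16)


def unfound_inodes(list_unfound_json: str) -> List[int]:
    """Collect the distinct inode numbers from list_unfound-style JSON.

    Assumes each entry under 'objects' has the name at obj['oid']['oid'].
    """
    data = json.loads(list_unfound_json)
    inodes = {
        inode_from_object_name(obj["oid"]["oid"])
        for obj in data.get("objects", [])
    }
    return sorted(inodes)


# Hypothetical sample, trimmed to the fields this sketch reads.
SAMPLE = json.dumps({
    "num_unfound": 3,
    "objects": [
        {"oid": {"oid": "10000000abc.00000000"}},
        {"oid": {"oid": "10000000abc.00000001"}},  # same file, second block
        {"oid": {"oid": "10000000def.00000000"}},
    ],
})

# Print one lookup command per affected file; /mnt/cephfs is a placeholder.
for ino in unfound_inodes(SAMPLE):
    print(f"find /mnt/cephfs -inum {ino}")
```

Several unfound objects can belong to the same file, which is why the inode numbers are de-duplicated before printing. On a large filesystem `find -inum` walks the whole tree, so it may be worth running it once with several `-inum` tests joined by `-o` rather than once per inode.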
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx