Hello,

I work for BELNET, the Belgian National Research Network. We currently manage a Luminous Ceph cluster on Ubuntu 16.04 with 144 HDD OSDs spread across two data centers, with 6 OSD nodes in each. The OSDs are 4 TB SATA disks.

Last week we had a network incident: the link between our two DCs began to flap due to STP flapping. This left our Ceph cluster in a very bad state, with many PGs stuck in various states. I gave the cluster time to recover, but some OSDs would not restart. I read and tried several suggestions found on this mailing list, but that only made things worse: all my OSDs began going down because of some bad PGs.

I then tried the approach described by our Greek colleagues: https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/

So I set the noout, noscrub, and nodeep-scrub flags, which seems to have frozen the situation.

The cluster is only used to provide RBD disks to our cloud-compute and cloud-storage solutions and to our internal KVM VMs. It seems that only some pools are affected by unclean/unknown/unfound objects, and everything is working well on the other pools (perhaps with some speed issues). I can confirm that the data on the affected pools is completely corrupted.

You can find here https://filesender.belnet.be/?s=download&token=1fac6b04-dd35-46f7-b4a8-c851cfa06379 a tgz file with as much information as I could dump to give an overview of the current state of the cluster.

So I have two questions:

1. Will removing the affected pools, together with their stuck PGs, also remove the defective PGs?

2. If not, I am completely lost and would like to know whether some experts could assist us, even on a paid basis. If so, you can contact me by mail at philippe@xxxxxxxxx.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
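
P.S. For completeness, the flags mentioned above were set with the standard Ceph CLI, roughly as follows (a sketch only; run against the admin node of your own cluster):

```shell
# Freeze recovery churn: prevent OSDs from being marked "out"
# and pause scrubbing / deep-scrubbing while investigating.
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub

# Verify the flags are in place and inspect the current PG/OSD state.
ceph status
ceph health detail

# Once the cluster is stable again, the flags should be cleared:
# ceph osd unset noout
# ceph osd unset noscrub
# ceph osd unset nodeep-scrub
```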