Good news... :) After trying everything else, I decided to re-create my MONs from the OSDs, using this script: https://paste.ubuntu.com/p/rNMPdMPhT5/
And it worked!!! (I put a rough sketch of the general procedure at the end of this mail.)

I think that when the 2 servers crashed and came back at the same time, the MONs somehow got confused and the maps just got corrupted. After the re-creation all the MONs had the same map, so it worked. But I still don't know how on earth the MONs can cause endless 95% I/O??? This is a bug anyway, and if you don't want to run into this problem, do not "enable" your MONs; just start them manually! Another tough lesson.

ceph -s: https://paste.ubuntu.com/p/m3hFF22jM9/

As you can see in the ceph -s output, some of the OSDs are still down, and when I start them they don't come up.
Startup log: https://paste.ubuntu.com/p/ZJQG4khdbx/
Debug log: https://paste.ubuntu.com/p/J3JyGShHym/

What can we do about this problem? What is causing it?

Thank you everyone. You helped me a lot! :)

>
> I think I might have found something.
> When I start an OSD, it generates high I/O (around 95%), and the other OSDs
> are also triggered and all together they generate the same I/O. This is true
> even when I set the noup flag. So all the OSDs generate high I/O whenever
> an OSD starts.
>
> I think this is too much. I have 168 OSDs, and when I start them the OSD I/O
> never finishes. I left the cluster alone for 70 hours and the high I/O
> never finished at all.
>
> We're trying to start the OSDs host by host and wait for things to settle,
> but it takes too much time.
> An OSD cannot even answer "ceph tell osd.158 version", it becomes that
> busy, and this seems to be a loop, since one OSD starting up triggers
> I/O on the other OSDs.
>
> So I collected debug output and I hope it can be examined.
>
> This is the debug=20 OSD log:
> Full log: https://www.dropbox.com/s/pwzqeajlsdwaoi1/ceph-osd.90.log?dl=0
> Shorter log, only the last part before the high I/O finished:
> https://paste.ubuntu.com/p/7ZfwH8CBC5/
> strace -f -p <osd pid>:
> - When I start the OSD: https://paste.ubuntu.com/p/8n2kTvwnG6/
> - After the I/O finished: https://paste.ubuntu.com/p/4sGfj7Bf4c/
>
> Now some people on IRC say this is a bug and suggest trying Ubuntu and the
> new Ceph repo, maybe it will help. I agree with them and I will give it a
> shot. What do you think?
> On Thu, 27 Sep 2018 at 16:27, by morphin <morphinwithyou@xxxxxxxxx> wrote:
> >
> > I should not have any client I/O right now. All of my VMs are down right
> > now. There is only a single pool.
> >
> > Here is my crush map: https://paste.ubuntu.com/p/Z9G5hSdqCR/
> >
> > The cluster does not recover. After starting the OSDs with the specified
> > flags, the OSD up count drops from 168 to 50 within 24 hours.
> > On Thu, 27 Sep 2018 at 16:10, Stefan Kooman <stefan@xxxxxx> wrote:
> > >
> > > Quoting by morphin (morphinwithyou@xxxxxxxxx):
> > > > After 72 hours I believe we may have hit a bug. Any help would be
> > > > greatly appreciated.
> > >
> > > Is it feasible for you to stop all client I/O to the Ceph cluster, at
> > > least until it stabilizes again? "ceph osd pause" would do the trick
> > > ("ceph osd unpause" would unset it).
> > >
> > > What kind of workload are you running on the cluster? What does your
> > > crush map look like (ceph osd getcrushmap -o /tmp/crush_raw;
> > > crushtool -d /tmp/crush_raw -o /tmp/crush_edit)?
> > >
> > > I have seen a (test) Ceph cluster "healing" itself to the point there was
> > > nothing left to recover on. In *that* case the disks were overbooked
> > > (multiple OSDs per physical disk) ... The flags you set (noout, nodown,
> > > nobackfill, norecover, noscrub, etc., etc.) helped to get it to recover
> > > again. I would try to get all OSDs online again (and manually keep them
> > > up / restart them, because you have set nodown).
> > >
> > > Does the cluster recover at all?
> > >
> > > Gr. Stefan
> > >
> > > --
> > > | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> > > | GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx
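
P.S. For anyone who finds this thread later: the general way to rebuild a
monitor store from the OSDs is the "recovery using OSDs" procedure in the
Ceph docs. I can't promise the script I linked is identical to it, but a
minimal sketch of that procedure looks like this (the mon-store path, OSD
data paths, keyring location and mon id are placeholders, and with more than
one OSD host the mon-store directory has to be carried from host to host so
it accumulates the maps from every OSD):

    # with all ceph-mon and ceph-osd daemons stopped, on each OSD host:
    ms=/tmp/mon-store
    mkdir -p $ms
    for osd in /var/lib/ceph/osd/ceph-*; do
        # extract the cluster maps from this OSD into the temporary mon store
        ceph-objectstore-tool --data-path "$osd" --op update-mon-db --mon-store-path "$ms"
    done

    # then, on one monitor node, rebuild the store from the collected maps
    # (the keyring needs the mon. and client.admin keys with full caps):
    ceph-monstore-tool $ms rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring

    # back up the corrupted store and move the rebuilt one into place
    # (repeat for every monitor; "foo" here stands for the mon id):
    mv /var/lib/ceph/mon/ceph-foo/store.db /var/lib/ceph/mon/ceph-foo/store.db.corrupted
    cp -r $ms/store.db /var/lib/ceph/mon/ceph-foo/store.db
    chown -R ceph:ceph /var/lib/ceph/mon/ceph-foo/store.db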
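
For completeness, this is roughly how we are trying to bring the remaining
OSDs back, host by host, while the flags are set. Only a sketch; the OSD ids
and the sleep/timeout values are made up, and it assumes systemd-managed OSDs:

    # keep recovery and backfill quiet while the OSDs come up
    ceph osd set noout
    ceph osd set norecover
    ceph osd set nobackfill

    # start this host's OSDs one by one and wait until each of them
    # is responsive before starting the next one
    for id in 90 91 92; do                      # example ids only
        systemctl start ceph-osd@$id
        until timeout 10 ceph tell osd.$id version >/dev/null 2>&1; do
            echo "waiting for osd.$id ..."
            sleep 30
        done
    done

    # once everything is up and peered, let recovery run again
    ceph osd unset nobackfill
    ceph osd unset norecover
    ceph osd unset noout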